The Wellcome Sanger Institute has one of the largest DNA sequencing facilities in the world: in 2021, its Sequencing Centre produced almost 40,000bn DNA bases a day (the human genome is approximately 3bn bases long). Also in 2021, our researchers read the equivalent of one gold-standard (30x) human genome every 3.2 minutes, and the Sequencing Centre read the genomes of 1,551 species.
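These figures can be cross-checked with simple arithmetic. The sketch below assumes that a "30x" genome means roughly 30 × 3bn = 90bn bases of sequence. That is a simplification, since real coverage and read yields vary. It uses only the rounded numbers from the text.

```python
# Back-of-envelope consistency check of the Sequencing Centre figures.
# Assumption: a 30x genome = 30 x ~3bn bases = ~90bn bases of raw sequence.
GENOME_BASES = 3e9          # approximate human genome length (bases)
COVERAGE = 30               # "gold-standard" sequencing depth
MINUTES_PER_GENOME = 3.2    # one 30x genome every 3.2 minutes
MINUTES_PER_DAY = 24 * 60   # 1,440 minutes

bases_per_30x_genome = GENOME_BASES * COVERAGE           # ~9.0e10 bases
genomes_per_day = MINUTES_PER_DAY / MINUTES_PER_GENOME   # ~450 genomes
bases_per_day = genomes_per_day * bases_per_30x_genome   # ~4.05e13 bases

print(f"{genomes_per_day:.0f} genomes/day -> {bases_per_day / 1e9:,.0f} bn bases/day")
```

Running it prints `450 genomes/day -> 40,500 bn bases/day`. This agrees with the "almost 40,000bn bases a day" figure to within the rounding of the inputs.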
Thanks to the latest Illumina hardware and bespoke software developed in-house, the Sequencing Centre is one of the most accurate and efficient sequencing facilities in the world.
Big data processing and analysis: EMBL-EBI
EMBL-EBI makes open-access biological research datasets available. These datasets are used extensively around the world by more than two million researchers in academia and industry, and EMBL-EBI's websites receive some 107 million data requests every day. Analysing data at this scale has become a bottleneck for life-science research, and EMBL-EBI provides facilities to support this work.
EMBL-EBI’s open data resources include:
- AlphaFold Protein Structure Database – Protein structure predictions for most known proteins
- ChEMBL – Bioactive molecules with drug-like properties
- Ensembl – Vertebrate and plant genome browsers
- Expression Atlas – Gene expression across species and biological conditions
- UniProt – Universal resource for protein sequence and function information
- Protein Data Bank in Europe (PDBe) – 3D protein structures determined experimentally
The Embassy Cloud provides private, secure, virtual-machine-based workspaces within the EMBL-EBI infrastructure, where clients can make full use of their own customised workflows, applications and datasets.
Embassy Cloud partners have access to EMBL-EBI data, services and compute resources, providing a practical and cost-effective alternative to replicating services and downloading vast public datasets locally. The Cloud’s partner companies can access their workspace from anywhere in the world, reducing the need for capital investments in hardware and related operational costs.
Big data processing and analysis: Wellcome Sanger Institute
The data output from the Wellcome Sanger Institute is increasing all the time, and the Institute has developed new technologies for storing and accessing it. Sequence data are managed and distributed using iRODS (integrated Rule-Oriented Data System), an open-source data-management tool that is accessible to all.
The Institute has also developed more efficient data-storage formats that, like all the Institute’s software tools, are made available to the research community on an open-access basis.
Almost 40,000bn DNA bases are read by the Sequencing Centre every day
combined petabytes of storage between Wellcome Sanger Institute and EMBL-EBI
1,551 species were sequenced in 2021
total number of compute cores in the Data Centre