What is a pangenome?
A pangenome is a collection of many different genome sequences that capture the genetic diversity in a population. Pangenomes could potentially be produced for all species.
Experts in quantum computing and genomics to develop new methods and algorithms to process biological data
Quantum computing has the potential to overhaul how information is processed and to offer computational powers beyond current computing capabilities. In the life sciences, quantum computing could be useful for many applications, from drug discovery and protein 3D structure prediction to genome analysis and beyond.
Now, a new collaboration brings together a world-leading interdisciplinary team with skills across quantum computing, genomics, and advanced algorithms. They aim to tackle one of the most challenging computational problems in genomic science: building, augmenting, and analysing pangenomic datasets for large population samples. Their project sits at the frontiers of research in both biomedical science and quantum computing.
The project, which involves researchers based at the University of Cambridge, the Wellcome Sanger Institute and EMBL’s European Bioinformatics Institute (EMBL-EBI), has been awarded up to US $3.5 million to explore the potential of quantum computing for improvements in human health.
The team aims to develop quantum computing algorithms with the potential to speed up the production and analysis of pangenomes – new representations of DNA sequences that capture population diversity. Their methods will be designed to run on emerging quantum computers. The project is one of 12 selected worldwide for the Wellcome Leap Quantum for Bio (Q4Bio) Supported Challenge Program.
Since the initial sequencing of the human genome over 20 years ago, genomics has revolutionised science and medicine. Less than one per cent of the 6.4 billion letters of DNA code differs from one human to the next, but those genetic differences are what make us unique. Our genetic code can provide insights into our health, help to diagnose disease, or guide medical treatments.
However, the reference human genome sequence, against which most subsequently sequenced human DNA is compared, is based on data from only a few people and doesn’t represent human diversity. Scientists have been working to address this problem since the publication of the original human genome, and in 2023, the first human pangenome reference was produced.
The human pangenome data are freely accessible on the Ensembl human pangenome project page and through Ensembl Rapid Release.
Pangenomics, a new domain of science, demands high levels of computational power. While the existing human reference genome is a single linear sequence, pangenome data can be represented and analysed as a network, called a sequence graph. This graph stores sequence that many genomes share only once, with branches where they differ, so it captures the genetic relationships between them. Comparing a newly sequenced individual genome to the pangenome then involves matching its sequences to parts of the graph to map a route through it.
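To make the idea of a sequence graph more concrete, here is a minimal Python sketch of a toy graph in which each genome is stored as a route through shared nodes. It is an illustration only, not the project’s data model: the class, method and node names are hypothetical, and real pangenome graphs (for example those exchanged in the GFA format) carry much more information, such as the orientation of each segment.

```python
from dataclasses import dataclass, field


@dataclass
class SequenceGraph:
    """Toy sequence graph: nodes hold DNA segments; edges say which segment may follow which."""
    nodes: dict[str, str] = field(default_factory=dict)        # node id -> DNA sequence
    edges: dict[str, set[str]] = field(default_factory=dict)   # node id -> successor node ids
    paths: dict[str, list[str]] = field(default_factory=dict)  # genome name -> route of node ids

    def add_node(self, node_id: str, seq: str) -> None:
        self.nodes[node_id] = seq
        self.edges.setdefault(node_id, set())

    def add_edge(self, src: str, dst: str) -> None:
        self.edges[src].add(dst)

    def add_path(self, name: str, route: list[str]) -> None:
        # A genome is a route through the graph: each consecutive pair must be joined by an edge.
        for a, b in zip(route, route[1:]):
            if b not in self.edges[a]:
                raise ValueError(f"no edge {a} -> {b}")
        self.paths[name] = route

    def spell(self, name: str) -> str:
        """Recover a genome's linear sequence by concatenating the nodes along its route."""
        return "".join(self.nodes[n] for n in self.paths[name])


# Hypothetical example: three genomes share flanking sequence but differ at one site.
g = SequenceGraph()
for node_id, seq in [("s1", "ACGT"), ("snpA", "A"), ("snpG", "G"), ("s2", "TTCA")]:
    g.add_node(node_id, seq)
for src, dst in [("s1", "snpA"), ("s1", "snpG"), ("s1", "s2"), ("snpA", "s2"), ("snpG", "s2")]:
    g.add_edge(src, dst)
g.add_path("genome1", ["s1", "snpA", "s2"])  # carries A at the variable site
g.add_path("genome2", ["s1", "snpG", "s2"])  # carries G (a single-letter substitution)
g.add_path("genome3", ["s1", "s2"])          # lacks the site entirely (a deletion)

for name in g.paths:
    print(name, g.spell(name))  # genome1 ACGTATTCA, genome2 ACGTGTTCA, genome3 ACGTTTCA
```

Here the shared flanks are stored once, so the graph holds 10 letters in total rather than the 26 needed for three separate linear copies, while each genome can still be read back exactly by following its route.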
In this new project, the team aims to develop quantum computing approaches with the potential to speed up both key processes: mapping data to graph nodes and finding good routes through the graph.
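As a rough illustration of what these two processes compute, here is a deliberately naive classical sketch in Python. It is not the team’s method and involves no quantum computing; the toy graph, the read, the seed length of 4 and all function names are hypothetical. Step 1 finds graph nodes that share short exact matches (“seeds”) with a read; step 2 picks the route through the graph whose sequence best matches the read.

```python
from collections import defaultdict

# Hypothetical toy graph: node 1, then a variant "bubble" (node 2 = G or node 3 = T), then node 4.
NODES = {1: "ACGTAC", 2: "G", 3: "T", 4: "CATTGA"}
EDGES = {1: [2, 3], 2: [4], 3: [4], 4: []}
K = 4  # hypothetical seed length


def build_kmer_index(nodes, k):
    """Map every k-mer that lies wholly inside a node to the ids of the nodes containing it."""
    index = defaultdict(set)
    for node_id, seq in nodes.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(node_id)
    return index


def seed_nodes(read, index, k):
    """Step 1, mapping data to graph nodes: collect nodes sharing an exact k-mer with the read."""
    hits = set()
    for i in range(len(read) - k + 1):
        hits |= index.get(read[i:i + k], set())
    return hits


def all_routes(edges, start, end):
    """Enumerate every route from start to end; fine for a toy DAG, exponential in general."""
    if start == end:
        return [[end]]
    return [[start] + rest for nxt in edges[start] for rest in all_routes(edges, nxt, end)]


def mismatches(read, text):
    """Fewest mismatches over all ungapped placements of the read within text."""
    if len(text) < len(read):
        return len(read)
    return min(
        sum(a != b for a, b in zip(read, text[i:i + len(read)]))
        for i in range(len(text) - len(read) + 1)
    )


def best_route(read, nodes, edges, start, end):
    """Step 2, finding a good route: pick the route whose spelled sequence best matches the read."""
    return min(
        all_routes(edges, start, end),
        key=lambda route: mismatches(read, "".join(nodes[n] for n in route)),
    )


read = "GTACTCATT"  # hypothetical read carrying T at the variable site
index = build_kmer_index(NODES, K)
print(seed_nodes(read, index, K))            # {1, 4}
print(best_route(read, NODES, EDGES, 1, 4))  # [1, 3, 4]
```

The seeds land only on the long shared nodes 1 and 4, because the single-letter variant nodes are shorter than the seed length, so seeding alone cannot tell which branch the read follows; the route-finding step resolves this by choosing node 3. Exhaustively enumerating routes doubles the work with every additional bubble, which is why practical graph mappers rely on far more efficient heuristics and dynamic programming instead.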
Quantum technologies are poised to revolutionise high-performance computing. Classical computers store information as bits, which are binary, with a value of either 0 or 1. A quantum computer, however, works with particles that can be in a superposition of different states simultaneously. Rather than bits, information in a quantum computer is represented by qubits (quantum bits), which can take the value 0 or 1, or be in a superposition of both; measuring a qubit always returns 0 or 1, with probabilities determined by that superposition. Quantum computing takes advantage of quantum mechanics to enable solutions to problems that are not practical to solve using classical computers.
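A classical computer can track the state of a single qubit exactly as two complex numbers. The minimal Python sketch below, an illustration rather than anything used in the project, shows superposition and measurement probabilities for one qubit; the tuple representation and the function names are illustrative choices, and real quantum software toolkits provide their own interfaces.

```python
import math

# A qubit's state is a pair of complex amplitudes (alpha, beta) for the basis states |0> and |1>,
# normalised so that |alpha|^2 + |beta|^2 == 1.
ZERO = (1 + 0j, 0j)


def hadamard(state):
    """Apply the Hadamard gate: |0> becomes an equal superposition of |0> and |1>."""
    alpha, beta = state
    s = 1 / math.sqrt(2)
    return (s * (alpha + beta), s * (alpha - beta))


def measurement_probabilities(state):
    """Born rule: measuring gives 0 with probability |alpha|^2 and 1 with probability |beta|^2."""
    alpha, beta = state
    return (abs(alpha) ** 2, abs(beta) ** 2)


plus = hadamard(ZERO)
print([round(p, 3) for p in measurement_probabilities(plus)])  # [0.5, 0.5]

# Applying the gate again makes the amplitudes interfere, returning the qubit to |0> with certainty.
print([round(p, 3) for p in measurement_probabilities(hadamard(plus))])  # [1.0, 0.0]
```

The second print shows why a superposition is more than a coin toss: the amplitudes can cancel or reinforce each other, and quantum algorithms are designed to exploit that interference.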
However, current quantum computer hardware is inherently sensitive to noise and decoherence, so scaling it up presents an immense technological challenge. While there have been exciting proof-of-concept experiments and demonstrations, today’s quantum computers remain limited in size and computational power, which restricts their practical application. But significant quantum hardware advances are expected to emerge in the next three to five years.
The Wellcome Leap Q4Bio Challenge is based on the premise that, in its early days, any new computational method will advance and benefit most from the co-development of applications, software, and hardware, which allows optimisations for early systems that are not yet generalisable.
Building on state-of-the-art computational genomics methods, the team will develop, simulate, and then implement new quantum algorithms using real data. The algorithms and methods will initially be tested and refined in existing, powerful High Performance Compute (HPC) environments, which will be used to simulate the expected quantum computing hardware. The team will first test the algorithms on small stretches of DNA sequence, then work up to relatively small genome sequences such as that of SARS-CoV-2, before moving on to the much larger human genome.
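The article does not say exactly how the quantum hardware will be simulated. One widely used approach, full state-vector simulation, stores one complex amplitude for each of the 2^n basis states of n qubits, so the memory needed doubles with every qubit added. Assuming double-precision complex amplitudes (16 bytes each), the short Python calculation below, a back-of-the-envelope sketch rather than the project’s method, shows why such simulations quickly call for HPC-scale memory.

```python
def statevector_bytes(n_qubits: int, bytes_per_amplitude: int = 16) -> int:
    """Memory to store a full n-qubit state: 2**n amplitudes, 16 bytes each as complex128."""
    return (2 ** n_qubits) * bytes_per_amplitude


GIB = 2 ** 30
for n in (20, 30, 40):
    print(f"{n} qubits: {statevector_bytes(n) / GIB:g} GiB")
# 20 qubits: 0.015625 GiB
# 30 qubits: 16 GiB
# 40 qubits: 16384 GiB
```

A 30-qubit state fits in a workstation’s memory, but 40 qubits already needs about 16 TiB, which is why classical simulation can stand in only for relatively small quantum machines.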
The project is a first step in exploring and conceptualising what quantum computing could bring to pangenomics. Expressing such scientific questions using quantum computing frameworks could itself yield benefits and new insights for researchers, even if running the resulting methods on quantum computers proves infeasible.
“On the one hand, we’re starting from scratch because we don’t even know yet how to represent a pangenome in a quantum computing environment,” explained David Yuan, Project Lead at EMBL-EBI. “If you compare it to the first moon landings, this project is the equivalent of designing a rocket and training the astronauts. On the other hand, we’ve got solid foundations, building on decades of systematically annotated genomic data generated by researchers worldwide and made available by EMBL-EBI. The fact that we’re using this knowledge to develop the next generation of tools for the life sciences is a testament to the importance of open data and collaborative science.”
EMBL-EBI is contributing data wrangling expertise to the project, as well as some of the technical infrastructure that will allow the project to run simulations of how quantum computing could work in the future, using existing technologies.
“Currently it’s routine for genome sequencing from an individual to be compared to the linear reference genome to call variants and predict impact on functional elements,” explained Sarah Hunt, Variation Resources Coordinator at EMBL-EBI, who is not involved in the project. “As more and more full individual genomes are sequenced we want to be able to analyse them against the human pangenome reference. Being able to map data as quickly and efficiently as possible is critical. The hope is that projects like Wellcome Leap Q4Bio will one day help us leverage the information held in the pangenome and translate it into better clinical outcomes.”
“We’ve only just scratched the surface of both quantum computing and pangenomics,” said David Holland, Principal Systems Administrator at the Wellcome Sanger Institute, who is working to create a High Performance Compute environment to simulate a quantum computer. “So to bring these two worlds together is incredibly exciting. We don’t know exactly what’s coming, but we expect that all of a sudden, the heights of what is possible will come so much closer. We are doing things today that we hope will make tomorrow better.”
The potential benefits of this work are huge. Comparing a specific human genome against the human pangenome – instead of the existing human reference genome – gives better insights into its unique composition. This will be important in driving forwards personalised medicine. Similar approaches for bacterial and viral genomes will underpin the tracking and management of pathogen outbreaks.
This article is based on a Wellcome Sanger Institute press release.