Computational protein design and protein structure prediction win Nobel Prize in Chemistry
2024 Nobel Prize for Chemistry acknowledges advancements in protein science
David Baker, Demis Hassabis and John Jumper have been awarded the 2024 Nobel Prize for Chemistry. One half of the prize has been awarded to David Baker “for computational protein design” and the other half jointly to Demis Hassabis and John M. Jumper “for protein structure prediction.”
Protein design
The remarkable work by David Baker and colleagues in computational protein design, a field recently revolutionised by AI, has made it possible to custom-build proteins to a researcher's specifications, showing how far our understanding of proteins has come.
This is opening whole new avenues in fundamental and biomedical research. David Baker has made fundamental contributions to protein structure prediction and design over the last three decades, and has engaged the community through the development of numerous algorithms, tools and techniques.
Protein structure prediction
AlphaFold is a revolutionary artificial intelligence (AI) system that can accurately predict the structure of proteins.
“AlphaFold is the first AI system to send such ripples throughout the life sciences,” said Edith Heard, EMBL Director General. “This demonstrates the potential of AI as a tool for biology and a way to unlock new insights to address global challenges such as infectious disease, climate change, and food security.”
EMBL partnered with Google DeepMind to make the AlphaFold 2 predictions freely and openly available to all, through the AlphaFold Protein Structure Database.
The team from EMBL’s European Bioinformatics Institute (EMBL-EBI) have integrated the structure predictions into the existing life sciences data infrastructure, storing, indexing, integrating, and displaying them to ensure that AlphaFold delivers on its potential impact for the life sciences community.
“Public data were essential to the development of AlphaFold,” said John Jumper, Director at Google DeepMind. “The careful curation of such large data resources, representing the collective output of an entire subfield of biology, is exactly what enables our machine learning models to generalise well across such a huge range of proteins, enabling further breakthroughs in machine learning in other scientific areas.”
The AlphaFold Database is a powerful example of the virtuous cycle of open data. AlphaFold was trained on data that structural biologists have shared over many decades using data resources such as the ones managed by EMBL-EBI.
EMBL itself is a leader in structural biology, and enables access to experimental methods such as macromolecular crystallography, cryo-electron microscopy (cryo-EM), and cryo-electron tomography for researchers across the world. For decades, EMBL's beamlines at Hamburg and Grenoble and its cryo-EM facilities in Heidelberg have helped provide the foundational data: the experimental structures that made up the high-quality training dataset used to train AlphaFold. Now, the AlphaFold 2 predictions and database inform a new era of structural biology and life sciences research.
“Huge congratulations to the team at Google DeepMind for this fantastic honour,” said Ewan Birney, Deputy Director General of EMBL. “The future is clearly bright for AI in molecular biology, and I’m certain we’ll see many more research questions answered by leveraging AI and open access to large amounts of high quality curated data.”
“The long-lasting impact of AlphaFold will be defined by how researchers around the world use its predictions to gain new insights into how life works,” said Sameer Velankar, Team Leader at EMBL-EBI, who coordinated the AlphaFold Database project. “Since launch, the database has had over one million users from nearly every country. Thousands of scientific papers that mention AlphaFold have already been published. I’ve never seen anything quite like it.”
Why are protein structures important?
Proteins are tiny molecular machines that underpin the biological processes of every living thing. Scientists have catalogued over 200 million proteins in the UniProt database. Each protein has a unique shape, also called its structure, which is closely linked to what the protein does, its function.
Knowing a protein’s structure offers clues about the protein’s role, how it is linked to health and disease, and which chemical compounds or medicines it interacts with. For example, determining the structures of the SARS-CoV-2 viral proteins enabled scientists to understand how the virus operates, identify treatments and develop new vaccines.
Over the last 60 years, biologists have managed to determine the structures of over 190,000 proteins using experimental methods such as X-ray crystallography or cryo-electron microscopy (cryo-EM). The AlphaFold system instead uses a deep learning algorithm to predict a protein's structure computationally from its amino acid sequence. Because it does not depend on slow, laborious experiments for each protein, it can scale up very well.
The AlphaFold Database launched in July 2021, with just over 360,000 protein structure predictions, including all known human proteins, and has since grown to a staggering 200 million protein structures, from over one million organisms.