Understanding rare diseases with digital twins

New EMBL-EBI project explores the use of a concept developed in aerospace engineering to support rare disease research, diagnosis, and treatment

Ellie McDonagh (left), Translational Informatics Director of Open Targets, and Rahuman Sheriff (right), Senior Project Leader at EMBL-EBI. Credit: Background image EMBL, photography: Jeff Dowling/EMBL-EBI.

Rare diseases are individually uncommon, but together they affect over 300 million people worldwide. There are more than 7,000 conditions that are classed as rare diseases, and many of them don’t have approved treatments. Because each rare disease affects a relatively small number of people, diagnosis is difficult and the data are sparse. 

EMBL-EBI scientists are trying to get a greater understanding of the biological mechanisms driving rare diseases by using artificial intelligence (AI) and a concept used in aerospace engineering – digital twins. The project is funded by the Chan Zuckerberg Initiative and set to run over the course of three years. 

Rahuman Sheriff, Senior Project Leader at EMBL-EBI, and Ellie McDonagh, Translational Informatics Director of Open Targets, are leading this initiative, with support from the Petsalaki research group at EMBL-EBI.

Below, McDonagh and Sheriff explain their objectives, challenges, and next steps. 

What are digital twins?

Rahuman Sheriff (RS):  Digital twins are virtual models of real-world systems that update with live data. The concept first gained traction in industries like aerospace and automotive engineering, where manufacturers used them to monitor and improve complex machines. For example, using the digital twin of an aircraft engine, engineers can run computer simulations to test different conditions and diagnose issues to prevent failures and improve safety. Digital twins are also useful for making predictions or decisions – for example, whether to keep flying an aircraft or to ground it. 

Our project aims to apply the ‘digital twins’ concept to medicine by testing if it’s possible to build digital twins for patients. Developing digital twins of patients will require integration of several cutting-edge approaches from machine learning, advanced mathematical modelling, and bioinformatic analysis of multiomics datasets from patients. 

How could digital twins be used in healthcare?

Ellie McDonagh (EM): Creating a digital twin of a patient could help researchers gain a greater understanding of disease mechanisms and simulate disease trajectories and patient response to therapeutics. 

Our project focuses on developing digital twins for tissues, with the purpose of understanding the differences between healthy and diseased states. We will be looking specifically at applications for rare disease research, but the models and tools that we develop should also be useful for understanding other diseases, such as cancer.

Why rare diseases?

RS:  Rare diseases can be challenging for doctors to diagnose and treat and for scientists to investigate, because each rare disease affects a small number of people. In some cases, there may only be a few known patients in a country, which makes it hard to gather enough data and develop effective treatments.

Because of the lack of data, these conditions aren’t well studied by the pharmaceutical industry. The data are often patchy, meaning that we don’t have all the data types for a patient. Our project aims to use generative AI tools to fill data gaps where possible.  

For all these reasons, we need a different approach when studying rare diseases, and digital twins might be a useful tool.

What are the limitations of digital twins?

EM: Unlike engineering devices, digital twins of patients are difficult to build due to the complexity of biological systems and incomplete understanding of disease conditions. 

Also, digital twins provide computational predictions that require validation in the real world. We hope that such models can unveil new insights and research avenues into the biology of disease, but these insights would always need to be verified through biological and clinical tests. 

How are you going to build the rare disease digital twins?

RS: The first step is data collection, harmonised processing, and curation to ensure we have high-quality, standardised datasets. Based on this, we will begin by modelling one or two tissue types digitally.

Next, we will develop healthy digital twins for these tissues, establishing a baseline for comparison against disease states. From there, we will expand to modelling common diseases, leveraging data from larger patient cohorts to refine and validate our approach.

To build the digital twins for healthy and common disease tissues, we’re planning to use multi-omics data from public databases like the ones managed by EMBL-EBI – for example, Expression Atlas, BioStudies, DECIPHER, and PRIDE. We also hope to use controlled-access data such as those made available through the European Genome-phenome Archive, co-managed by EMBL-EBI and the Centre for Genomic Regulation in Barcelona. To enable more data access, we’re hoping to develop collaborations with rare disease consortia and patient advocacy groups.

We will run comparisons of common diseases against healthy tissues to train machine learning models that can predict dysfunction. Finally, we will use data from rare disease patients to develop the rare disease digital twins. To ensure the quality of the digital twins, during each of these steps, we will check that what the machine learning models tell us corresponds to our existing knowledge of mechanisms underlying disease.

What does success look like for your project?

EM: This is uncharted territory, so to some extent, success is creating a proof of concept – showing that these digital twins are feasible, as well as building useful datasets and models for the community.

Crucially, we want our work to be openly available for anyone to reuse and improve upon. We aim to create a public collection of mechanistic models and machine learning models that will be accessible in the BioModels public repository. We hope that the models will be reused and adapted beyond the scope of this project, by anyone interested in exploring this technology in research and in the clinic. 

What are the next steps?

RS: We’re looking to partner with organisations working in the rare disease space to ensure the outcomes of this project are useful to the community. If you’re able to help, please get in touch; we would love to hear from you.

EM: We’re also currently recruiting a multi-omics data scientist (applications close on 12 March 2025) and a machine learning scientist (applications close on 16 March 2025). We encourage enthusiastic people who are keen to get involved in our project to apply! Reach out with any questions about these roles. More roles will be advertised in the coming months.

About Chan Zuckerberg Initiative

The Chan Zuckerberg Initiative was founded in 2015 to help solve some of society’s toughest challenges — from eradicating disease and improving education, to addressing the needs of our communities. Through collaboration, providing resources and building technology, our mission is to help build a more inclusive, just and healthy future for everyone. For more information, please visit chanzuckerberg.com.


Tags: bioinformatics, data resources, decipher, ega, embl-ebi, gene expression, genomics, pride, rare disease
