Welcome: Mallory Freeberg
The new Team Leader for Human Genomics at EMBL-EBI shares how curiosity and maths shaped her career and what it takes to be a community builder
Genetically, humans are 99% identical, but it’s the final 1% that accounts for differences in the way we look, our susceptibility to diseases, and our response to treatments. To better understand genomic variation, researchers use a variety of tools, including the Ensembl genome browser managed by EMBL’s European Bioinformatics Institute (EMBL-EBI). Ensembl is a reference database that enables clinicians to interpret genomic sequencing results from patients worldwide.
Mallory Freeberg is the new Human Genomics Team Leader, supporting the delivery of open data resources such as Ensembl, which exist to enable a better understanding of human health and disease.
When did you first become interested in science?
I’m the only scientist in my family, but when I was in school, I really enjoyed maths. For a while, I wanted to become an actuarial scientist and use statistics to assess risk in insurance, investments, and so on. I took so many maths courses that the high school didn’t have anything left to offer in my final year. That’s also when I took an advanced biology course and realised that I could use maths to understand complex biological questions. This played really well into the natural curiosity that I have for the world around us.
Tell me a little bit about your background.
When I went to university, there were very few courses for bioinformatics, at least in the United States. In fact, I was the only bioinformatics major for the first three years of my degree. I then did a PhD in bioinformatics at the University of Michigan and joined a genetics lab that was studying mechanisms of post-transcriptional gene regulation in the nematode worm C. elegans.
Next, I did a postdoc at Johns Hopkins University, looking at other organisms like budding yeast, and trialling Oxford Nanopore’s first direct RNA sequencing kits to understand transcriptomes. More importantly, my advisor was James Taylor, one of the founders of the open-source Galaxy platform, an online resource to enable non-computational people to do computational bioinformatics. Around this time, I joined the Galaxy Training Network, and I really enjoyed developing training materials and running workshops. This is when I realised I like working in science as a service, rather than in academic research.
How did you come to work for EMBL-EBI?
I saw a job advertised at EMBL-EBI for a metadata specialist in the Human Cell Atlas project. Having just returned from my honeymoon in Europe, I liked the idea of moving there for a few years.
After some time, I moved to the team that manages the European Genome-phenome Archive (EGA), where I worked on a project to ingest 20 petabytes of data from the UK Biobank. For the past few years, I’ve been a Coordinator for the EGA, and I’ve really enjoyed helping others use the resource. When this new role came up, it seemed like a natural fit.
What does your role as Human Genomics Team Leader entail?
The first task is to understand all the great work that the teams in this group are already doing, and how I can enable them to keep developing. Some of the teams I oversee are part of the wider Ensembl project, producing resources for understanding genome variation and regulation.
Other projects in my team are newer to EMBL-EBI. For example, DECIPHER, which turns 20 this year, moved to EMBL-EBI last year. DECIPHER is an important resource for studying rare diseases. It enables clinicians to interpret genomic variants to enhance clinical diagnosis.
There are also a few popular tools managed by the team, including Ensembl’s Variant Effect Predictor, which determines the effects of genetic variants on genes, transcripts, protein sequence, and regulatory regions.
These teams are already working well, so I see myself not as a disruptor, but as a connector. We have fabulous curators, software engineers, and project leads, and they’re the experts!
I’m also keen to strengthen the links these resources have to other projects, such as the Human Pangenome Reference Consortium, the Global Alliance for Genomics and Health (GA4GH), ELIXIR, and initiatives to federate human data access across Europe and the world. I think it’s important to take the next steps in these areas and bring clinicians into the sphere of users we train and serve. I’m also bringing my experience in dealing with sensitive human data, which I gained during my time at the EGA.
Why does the work of the Human Genomics Team matter to science and more broadly to society?
There is so much human data being generated in the world and work being done to generate knowledge that unless there’s a place to centralise the data, it’s very challenging for people to discover, access, and use them. Ensembl in particular is providing a way to access all kinds of functional data and predictive data generated by AI models. EMBL-EBI experts gather and curate data so only the highest quality information is displayed. This simplifies the lives of researchers no matter what they’re studying.
But as data volumes explode, we also have to find more nuanced ways of storing and sharing, especially since much of the human data is sensitive, and comes with a lot of legal restrictions and privacy challenges. Much of this information is siloed, especially in clinical settings. We’re only now starting to disentangle when data needs to be siloed and when it doesn’t. There are many new techniques and approaches coming out that enable the data to be shared and analysed in a safe way. The DECIPHER project is a great example because it allows researchers and clinicians to put their data in and manage access. This is particularly important in rare disease research, where a clinician may only see one person with a particular condition, so getting access to data from other patients around the world is crucial.
More widely, the data and knowledge held in EMBL-EBI’s data resources are used in different ways in clinical settings, for example, to interpret genetic tests provided through healthcare systems, to guide patient treatment, and to help families of rare disease patients understand and manage their conditions. EMBL-EBI data resources are foundational for genomic and precision medicine.
Are there any projects you’re particularly excited about?
We’ve recently started work on a collaborative project with Open Targets, the Wellcome Sanger Institute, and Human Technopole to build a perturbation catalogue. It will display data from functional assays where different aspects of human cell lines and single cells have been perturbed in some way. So it’s about introducing variations in genes or regulatory regions and seeing how they affect the function of the gene.
These datasets already exist, but they are spread out, and we want to bring them together so they are easy to search, compare and analyse. We will look at CRISPR screens, single-cell expression data, and also newer assay types such as Perturb-seq. We’ll also develop metadata standards to help researchers describe their experiments, making them easy to understand and reuse. This resource will be available in the Open Targets suite of products.
What do you think is the potential of the new generation of AI tools for the human genomics data ecosystem?
There is one application I find particularly interesting: using large language models to help with data curation. This means using AI to digest research papers to identify things like gene and phenotype links, or gene-disease links. This can be a time-consuming process and requires an expert to review the literature, but if we could get the AIs to do the tedious work more quickly, then human experts could double-check what the AI comes up with. There’s still a lot of uncertainty in this area because these models have been known to “hallucinate”, so the need for the human curator is still very much there, but it may be an interesting direction.
In EMBL’s 50th anniversary year, what are some of the things that you’re particularly excited about?
There are lots of cool things happening, but one thing I really like is the work done by the Human Ecosystems transversal theme on the exposome, looking at how environmental exposures affect our health and well-being. This can include things like chemicals or pollutants in our environment, what we eat, how we exercise, and so on. I was involved in developing a course on this topic which ran in February 2024. I’m curious if we can start to bring some of this exposure data into EMBL-EBI data resources, such as Ensembl. I don’t know if this is possible or what it looks like, but it’s an exciting prospect.
Who are your science heroes?
I don’t really have a science hero per se. I am actually inspired pretty regularly by the people around me. People I work with who come up with great ideas for solving complex challenges and demonstrate great resilience during periods of stress. People I collaborate with who share our scientific vision and are so enthusiastic about working together to achieve goals. People who show amazing courage in pursuing their professional dreams, securing fellowships and accepting leadership positions within their communities. People who believe in the power of open science to enable progress and aren’t afraid to challenge the status quo. So many people are doing truly amazing things that I can’t just pick one hero.
What hobbies and interests do you have?
I do powerlifting as a hobby, which means I try to lift as much weight as I can on three lifts: deadlift, squat, and bench press. I find it both meditative and empowering to go to a gym and be able to lift really heavy weights.