Molecules to Ecosystems: Jo McEntyre on data services
EMBL-EBI Associate Director for Services Jo McEntyre talks about data services to support the new EMBL Programme, open data and a new era for research assessment.
Under EMBL’s 2022-26 Programme ‘Molecules to Ecosystems’, and in line with its Open Science policy, EMBL will increase data sharing, reuse, and coordination of bioinformatics resources and tools across molecular biology, particularly for bioimaging, genomic medicine, and multi-omics data resources. It will also develop biology-specific data portals with partners from around the world.
As the Associate Director of EMBL-EBI Services, Jo McEntyre oversees the institute’s 40+ open data resources and the technical infrastructure that underpins them. Millions of researchers around the world rely on these data services, or resources, daily for their work – to access public data and to share the data they have generated. This creates a virtuous circle of data production, sharing, and reuse, which drives new discoveries.
We spoke to McEntyre about how EMBL-EBI data services will support the implementation of EMBL’s new Programme, how EMBL-EBI is managing an increasing amount of data, and why open science is essential if we want research to be more inclusive and of higher quality.
As a result of the Programme, what do you expect to be doing differently at EMBL in 12 months?
EMBL-EBI’s services mission is to serve data to the world to enable research. This will not change, but as part of the new EMBL Programme, we will work more closely with EMBL colleagues to develop new ways to gather research data into public data resources, for example, for bioimaging. We’ll use our data management expertise to develop new data-sharing portals for projects such as TREC, our expedition studying coastal ecosystems. I expect the Programme to inform how we develop our data resources for all users around the globe in the coming years.
Artificial intelligence (AI) will likely be a key driver of innovation in data science in the coming years. We are already seeing the high-quality data that EMBL-EBI provides being used to train AI models, such as AlphaFold. On the flip side, we are also using AI technologies to make the data we manage more findable and to improve annotation.
In addition, I have been involved in the development of EMBL’s open science policy, which aims to ensure that all EMBL scientific outputs can be shared and reused as widely as possible, contributing to a positive open culture in life science research.
Similarly, EMBL’s commitment to the San Francisco Declaration on Research Assessment (DORA) and the Coalition for Advancing Research Assessment (CoARA) will shift the emphasis of research assessment from where research is published to what research is done. Essentially, this means moving away from assessing researchers by using impact factors as a proxy for scientific quality, and focusing on their contributions to science more broadly. This way, we will look not only at the research articles they publish but also at the data they produce and share, the software and tools they develop, the training they deliver, and any other activity that they consider impactful or important.
What do you find most exciting about the ‘Molecules to Ecosystems’ Programme?
For me, it’s exploring how the life sciences engage with other disciplines and how we open up to new, broader collaborations and areas of research. Two big upcoming opportunities for us are making sure open molecular data are available to new audiences, such as environmental scientists or healthcare professionals, in the most useful way, and observing how open molecular data interface with different data types to reveal insights. It’s going to be fascinating to bring our molecular view of the world to new contexts.
How important is collaboration to achieving your goals?
Collaborating on a variety of projects, especially with researchers working in the field, keeps us current as data management experts. Collaborative projects present us with new data types, new scientific questions, and new analysis tools, and they ensure we’re always aware of how the field is changing. This means we can anticipate and respond to user needs.
There are additional facets to collaboration for EMBL-EBI data services. Firstly, there would be no data resources without the scientists who deposit data in an open and FAIR (Findable, Accessible, Interoperable, Reusable) way so anyone can reuse them. This applies not only to the deposition databases that receive the data, but also to the added-value databases – at EMBL-EBI and beyond – that curate and compute on these data to present them in more user-friendly ways to wider audiences.
Secondly, collaboration allows us to deliver our data resources. For example, we are part of the International Nucleotide Sequence Database Collaboration. This is the world’s largest joint effort to collect and disseminate databases containing DNA and RNA sequences. Many of our data resources are run in partnership with others. For example, the Protein Data Bank in Europe (PDBe), the home of 3D biomolecular structures, is part of the Worldwide Protein Data Bank (wwPDB), and Europe PMC, our literature database, collaborates with PMC at the National Library of Medicine in the USA.
Learn more about the new EMBL Molecules to Ecosystems Programme and the Data Services Plans.