Pathogens Portal: The new gateway to public pathogen data
Summary
EMBL-EBI’s new Pathogens Portal enables sharing and analysis of pathogen data from across the world
The Portal makes it easier for scientists, healthcare, and public health professionals to collaborate, enhancing pathogen surveillance worldwide
Being able to share pathogen data across borders is crucial, especially during public health outbreaks and pandemics
EMBL’s European Bioinformatics Institute (EMBL-EBI) has launched the Pathogens Portal – an online platform that enables researchers, clinicians, and policymakers to access the most comprehensive collection of biomolecular data about pathogens. The portal features data spanning over 200,000 pathogen species and strains and is set to become a key tool for infection biology and pathogen surveillance.
The list of pathogens featured in the portal was collated using the UK’s Health and Safety Executive’s list of approved biological agents and the WHO’s global priority pathogens list. It includes well-known pathogens that affect humans, including HIV, influenza, hepatitis B, and the malaria parasite Plasmodium falciparum. It also covers lesser-known pathogens affecting humans, such as Lassa mammarenavirus, the cause of Lassa hemorrhagic fever, which can lead to deafness and even death in severe cases. The portal also contains hundreds of pathogens that affect other animals, which makes it a useful tool for food security and biodiversity.
The Pathogens Portal currently contains nucleotide sequences, raw genomic data, sample metadata, and relevant scientific literature. The intention is to integrate additional data types, including protein sequence and structure and chemistry data from other public data resources.
“The unique feature of the Pathogens Portal is that it brings together different data types, which are currently scattered in lots of different places,” explained Guy Cochrane, Team Leader at EMBL-EBI. “This new approach enables researchers, clinical scientists, and public health agencies to access all publicly-available data about their pathogen of interest with just one quick search. The portal also contains intuitive tools for discovery, which make it easy for users to refine their searches.”
“The Pathogens Portal is an important step in preparing for the next pandemic,” said Marion Koopmans, Head of the Erasmus Medical Centre’s Department of Viroscience. “Pulling together multiple open biological data resources for a breadth of pathogens is a key knowledge base to ready ourselves for future pandemics.”
Pandemic preparedness
“The COVID-19 pandemic demonstrated that having robust and easy-to-use data sharing structures in place can save lives because these enable a quick and informed public health response,” explained Marianna Ventouratou, Data Platform Manager at EMBL-EBI. “Building on the lessons learned from the COVID-19 pandemic, EMBL-EBI and partners have now developed the Pathogens Portal, which researchers and public health authorities around the world can use to enhance global pathogen surveillance efforts.”
Importantly, the data accessible through the Pathogens Portal is open and FAIR (Findable, Accessible, Interoperable, and Reusable), meaning it is available to anyone with an internet connection. This approach is particularly valuable during a public health emergency, when data sharing speed is of the essence.
“It is invaluable to have a data portal like the Pathogens Portal, which represents the pathogen world beyond viruses, and takes a much more holistic and flexible view of where the next threatening pathogen may come from,” explained Frank Møller Aarestrup, Head of Genomic Epidemiology at the Technical University of Denmark.
Private data and cohort data
There is also a key component, called the Data Hubs system, which allows researchers and health agencies to keep their data private in the first instance. This is operated from EMBL-EBI’s existing infrastructure, including the European Nucleotide Archive (ENA). This is an important functionality for countries and researchers who wish to keep their data private before publication, but still want to be able to analyse them alongside other public records available through the portal.
Another exciting feature of the portal is the cohort browser, which contains highly sought-after clinical-epidemiological data from patient cohorts. There is currently only one pilot study focusing on SARS-CoV-2 available in the browser, provided through the ReCoDID project by the Erasmus Medical Centre, with the help of the University Hospital Heidelberg. The Pathogens Portal team is actively encouraging researchers to submit more cohort data.
“The Cohort Browser interoperates genomic data with clinical epidemiological data, which enables deep interrogation of disease data by linking information on the pathogen and the host it directly infected,” said Lauren Maxwell, Group Leader at the Universitätsklinikum Heidelberg.
Building on success
The Pathogens Portal is built on the same framework as the European COVID-19 Data Portal, which EMBL-EBI and collaborators set up during the COVID-19 pandemic to support international data sharing essential for the pandemic response. Since launch, the COVID-19 Data Portal has been accessed by almost 300,000 users in 187 countries and geographical areas.
Already, three EMBL-EBI resources feed data into the Pathogens Portal, with more coming soon:
European Nucleotide Archive (ENA), which provides a comprehensive record of the world’s nucleotide sequencing information, covering raw sequencing data, sequence assembly information, and functional annotation.
BioSamples, which stores and supplies descriptions and metadata about biological samples used in research and development by academia and industry.
Europe PMC, which provides comprehensive access to over 40 million life sciences publications from trusted sources.
What is the difference between raw sequencing data and nucleotide sequences?
Raw reads are the raw data produced by genome sequencing machines – lots and lots of data fragments. Raw reads can be assembled into nucleotide sequences, and provide more depth of information about the sequenced organism or sample.
Analysing raw reads requires more bioinformatics knowledge but allows deeper questions and tailored analyses, whereas nucleotide sequences are more readily applied to downstream applications, which can benefit non-specialist users. Both data types are important and synergistic, offering greater utility to a wider audience of users.
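For readers who want a feel for what "assembling" raw reads means, here is a deliberately simplified sketch in Python. It is a toy greedy overlap merge written for illustration only, not the method used by the Pathogens Portal, ENA, or any production assembler. The function names, the example reads, and the minimum overlap of three bases are all assumptions made for this example, and real assemblers also have to cope with sequencing errors, repeats, and reverse-complement strands, which this sketch ignores.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that is a prefix of `b`.

    Overlaps shorter than `min_len` are ignored and reported as 0.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two fragments with the longest overlap.

    Stops when no remaining pair overlaps by at least `min_len` bases and
    returns whatever contiguous sequences (contigs) are left.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs
        # Join the pair, writing the shared bases only once.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)


# Three made-up short reads that overlap end to end.
reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
print(greedy_assemble(reads))  # ['ATGGCGTACCTGA']
```

The three short fragments collapse into one longer sequence, which is the kind of "nucleotide sequence" a non-specialist can use directly. The original reads retain details, such as how many fragments cover each position, that are lost in the merged result.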
The Pathogens Portal is a community-driven initiative, and users are invited to send feedback and questions to the project team at ena-path-collabs@ebi.ac.uk.
Supporting projects
The Pathogens Portal is supported by European Union funding through the ReCoDID (Horizon 2020 no. 825746), VEO (Horizon 2020 no. 874735) and BY-COVID (Horizon Europe no. 101046203) projects.
It builds on infrastructure developed and funded by ELIXIR-CONVERGE (Horizon 2020 no. 871075), EOSC-Life (Horizon 2020 no. 824087), COMPARE (Horizon 2020 no. 643476), CORBEL (Horizon 2020 no. 654248), and EMBL core funding.