Polygenic Score (PGS) Catalog increases diversity and usability of genetic data
Addition of data from more diverse populations to the Polygenic Score (PGS) Catalog and a new software tool for PGS calculation could help produce more equitable disease risk predictions
Summary
New data from non-European populations have been added to the Polygenic Score (PGS) Catalog, an important step towards ensuring that polygenic scores are equitable across populations.
A new PGS Catalog Calculator simplifies the calculation of polygenic scores, making them easier to compute in different computing environments and across different ancestries.
These updates help to generate more equitable disease risk predictions for populations from a more diverse range of genetic backgrounds.
Researchers have updated the Polygenic Score (PGS) Catalog, adding data from multi-ancestry and non-European populations and introducing the PGS Catalog Calculator. These updates, described in a Nature Genetics paper, could help to increase the accessibility and equity of genetic disease risk predictions.
The PGS Catalog is the largest open database for polygenic scores, with ~27,000 users from over 140 countries in the past year alone. A polygenic score estimates an individual’s genetic predisposition to a specific trait or disease by summarising the effects of many different genetic variants across the genome.
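As a rough illustration of what "summarising the effect of many variants" means, a polygenic score is commonly computed as a weighted sum of effect-allele dosages. The sketch below is a minimal example under stated assumptions: the variant IDs, weights, and genotype dosages are invented for illustration and are not taken from the PGS Catalog.

```python
def polygenic_score(weights, dosages):
    """Return the weighted sum of effect-allele dosages.

    weights: dict mapping variant ID -> effect weight
    dosages: dict mapping variant ID -> number of effect alleles carried (0, 1 or 2)
    Only variants present in both inputs contribute to the score.
    """
    shared = weights.keys() & dosages.keys()
    return sum(weights[variant] * dosages[variant] for variant in shared)


# Hypothetical variants and weights, for illustration only.
example_weights = {"variant_a": 0.25, "variant_b": -0.5, "variant_c": 0.125}
example_dosages = {"variant_a": 2, "variant_b": 1, "variant_c": 1}

example_score = polygenic_score(example_weights, example_dosages)
print(example_score)
```

Real scores typically combine thousands to millions of variants, but the arithmetic is the same.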
Polygenic scores are particularly useful for predicting complex health conditions such as heart disease, diabetes, and certain cancers, where multiple genetic variants contribute to the overall risk. Integrating these scores into clinical practice could help scientists and clinicians understand genetic influences on health, potentially leading to better prevention strategies and tailored treatments.
The PGS Catalog was created to standardise the way these scores are reported and to make them more reliable for clinical applications. The project is a collaborative effort between EMBL’s European Bioinformatics Institute (EMBL-EBI), the University of Cambridge, the GWAS Catalog, and colleagues.
Increasing ancestral diversity
The field of genetics increasingly recognises the importance of diversifying genomic datasets, demonstrated by initiatives like the Human Pangenome Project. Since its inception in 2021, the PGS Catalog has grown to host over 4,735 polygenic risk scores, representing a 721% increase. Much of this increase also expands the ancestral diversity of the Catalog’s data.
Because little genetic data was available from populations of non-European ancestry, early releases of the PGS Catalog consisted mostly of scores developed using data from individuals of European ancestry. More polygenic scores have now been added from studies using African, Asian, and often multi-ancestry data.
“Expanding the ancestral diversity of the PGS Catalog is a first step forward in ensuring that genetic research benefits everyone, regardless of their background,” said Helen Parkinson, Team Leader and Senior Scientist at EMBL-EBI. “By making these tools and data more accessible and representative, we aim to improve the accuracy of genetic predictions and promote equity in how these data are applied in healthcare and research worldwide.”
The PGS Catalog Calculator
The PGS Catalog Calculator is a new addition to the PGS Catalog. This open-source software tool automates the calculation of polygenic scores and simplifies tasks such as genotype data formatting and variant matching, allowing users to apply scores to new genomic data. The Calculator also implements methods for genetic similarity analysis and ancestry adjustment, an important step towards ensuring that calculated polygenic scores are more interpretable across populations. Together, these features could help to streamline the use of polygenic scores in research and clinical studies.
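To give a sense of what variant matching and ancestry adjustment involve, the following is a simplified, hypothetical sketch and not the Calculator's own code or interface. It assumes variants are matched by position and alleles, and it assumes one basic form of adjustment: standardising a score against the scores of a reference group. All function names, data structures, and values are invented for illustration, and the sketch does not handle strand flips or ambiguous alleles.

```python
from statistics import mean, stdev


def match_variants(scoring_rows, genotypes):
    """Match scoring-file variants to genotype records (illustrative only).

    scoring_rows: list of (chrom, pos, effect_allele, other_allele, weight)
    genotypes: dict mapping (chrom, pos) -> (ref, alt, alt_dosage)
    Returns (matched, n_unmatched), where matched is a list of
    (weight, effect_allele_dosage) pairs.
    """
    matched, n_unmatched = [], 0
    for chrom, pos, effect, other, weight in scoring_rows:
        record = genotypes.get((chrom, pos))
        if record is None:
            n_unmatched += 1
            continue
        ref, alt, alt_dosage = record
        if (effect, other) == (alt, ref):
            # Effect allele is the ALT allele: use the dosage as recorded.
            matched.append((weight, alt_dosage))
        elif (effect, other) == (ref, alt):
            # Effect allele is the REF allele: count REF copies instead.
            matched.append((weight, 2 - alt_dosage))
        else:
            # Alleles disagree; a real tool would also consider strand flips.
            n_unmatched += 1
    return matched, n_unmatched


def standardise(score, reference_scores):
    """Express a score as a z-score relative to a reference group."""
    return (score - mean(reference_scores)) / stdev(reference_scores)


# Hypothetical scoring file and genotypes, for illustration only.
scoring_rows = [
    ("1", 1000, "A", "G", 0.5),
    ("2", 2000, "C", "T", 0.25),
    ("3", 3000, "G", "A", 1.0),  # not present in the genotype data
]
genotypes = {
    ("1", 1000): ("G", "A", 1),
    ("2", 2000): ("C", "T", 2),
}

matched, n_unmatched = match_variants(scoring_rows, genotypes)
raw_score = sum(weight * dosage for weight, dosage in matched)

# Hypothetical scores from a genetically similar reference group.
reference_scores = [0.0, 0.25, 0.5, 0.75, 1.0]
adjusted_score = standardise(raw_score, reference_scores)
print(raw_score, adjusted_score, n_unmatched)
```

Reporting the number of unmatched variants matters in practice, because a score computed from only a fraction of its variants may not be comparable with the published score.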
“Already, the Calculator has been used by researchers worldwide and deployed in multiple biobanks and trusted research environments,” said Ben Wingfield, Senior Bioinformatician at EMBL-EBI. “This includes its integration with the INTERVENE project, which leverages AI to advance personalised medicine through genetic score reporting.”
“Our goal with the PGS Catalog and the new Calculator is to lower the barrier to entry for using polygenic scores,” said Sam Lambert, Assistant Professor of Health Data Science at University of Cambridge and Visiting Researcher at EMBL-EBI. “By providing an open-source, user-friendly tool that handles the complexities of genetic data, we’re making it possible for researchers and clinicians to apply these predictors of genetic risk across diverse populations. This is an important step towards ensuring that the benefits of genetic research are shared equitably, regardless of ancestry or background.”
Work on the PGS Catalog and Calculator has been supported by multiple funders, including the National Human Genome Research Institute of the National Institutes of Health, the European Union’s Horizon 2020 research and innovation programme, Health Data Research UK, and core funds from EMBL-EBI.
Human Ecosystems at EMBL
This work was carried out as part of EMBL’s Human Ecosystems Transversal Theme, part of the Molecules to Ecosystems Programme. The Human Ecosystems theme combines molecular biology expertise with genomics, epidemiology, toxicology, and clinical research to tackle some of the most pressing challenges in human health. This interdisciplinary approach helps pave the way for a deeper understanding of how genotype-environment interactions shape human health outcomes.