Machine learning optimized polygenic scores for blood cell traits identify sex-specific trajectories and genetic correlations with disease

Yu Xu*, Dragana Vuckovic, Scott C. Ritchie, Parsa Akbari, Tao Jiang, Jason Grealey, Adam S. Butterworth, Willem H. Ouwehand, David J. Roberts, Emanuele Di Angelantonio, John Danesh, Nicole Soranzo, Michael Inouye*

*Corresponding author for this work

Research output: Contribution to journalArticlepeer-review

9 Citations (Scopus)

Abstract

Genetic association studies for blood cell traits, which are key indicators of health and immune function, have identified several hundred associations and defined a complex polygenic architecture. Polygenic scores (PGSs) for blood cell traits have potential clinical utility in disease risk prediction and prevention, but designing PGS remains challenging and the optimal methods are unclear. To address this, we evaluated the relative performance of 6 methods to develop PGS for 26 blood cell traits, including a standard method of pruning and thresholding (P + T) and 5 learning methods: LDpred2, elastic net (EN), Bayesian ridge (BR), multilayer perceptron (MLP) and convolutional neural network (CNN). We evaluated these optimized PGSs on blood cell trait data from UK Biobank and INTERVAL. We find that PGSs designed using common machine learning methods EN and BR show improved prediction of blood cell traits and consistently outperform other methods. Our analyses suggest EN/BR as the top choices for PGS construction, showing improved performance for 25 blood cell traits in the external validation, with correlations with the directly measured traits increasing by 10%–23%. Ten PGSs showed significant statistical interaction with sex, and sex-specific PGS stratification showed that all of them had substantial variation in the trajectories of blood cell traits with age. Genetic correlations between the PGSs for blood cell traits and common human diseases identified well-known as well as new associations. We develop machine learning-optimized PGS for blood cell traits, demonstrate their relationships with sex, age, and disease, and make these publicly available as a resource.

Original languageEnglish
Article number100086
JournalCell Genomics
Volume2
Issue number1
DOIs
Publication statusPublished - 12 Jan 2022
Externally publishedYes

Bibliographical note

Publisher Copyright:
© 2021 The Author(s)

Keywords

  • Blood cell trait
  • Disease assocations
  • Machine learning
  • Method
  • Polygenic score
  • Population stratification

Fingerprint

Dive into the research topics of 'Machine learning optimized polygenic scores for blood cell traits identify sex-specific trajectories and genetic correlations with disease'. Together they form a unique fingerprint.

Cite this