Leveraging effect size distributions to improve polygenic risk scores derived from summary statistics of genome-wide association studies.

Publication date: Feb 11, 2020

Genetic risk prediction is an important problem in human genetics, and accurate prediction can facilitate disease prevention and treatment. Polygenic risk scores (PRS) have become widely used because of their simplicity and effectiveness: the standard method needs only summary statistics from genome-wide association studies (GWAS). Recently, several methods have been proposed to improve on the standard PRS by using external information, such as linkage disequilibrium (LD) and functional annotations. In this paper, we introduce EB-PRS, a novel method that leverages information on effect sizes across all markers to improve prediction accuracy. Unlike most existing genetic risk prediction methods, our method requires neither parameter tuning nor external information. Real data applications to six diseases (asthma, breast cancer, celiac disease, Crohn’s disease, Parkinson’s disease and type 2 diabetes) show that EB-PRS achieved relative improvements of 307.1%, 42.8%, 25.5%, 3.1%, 74.3% and 49.6%, respectively, in predictive r² over the standard PRS method with optimally tuned parameters. Compared with LDpred, which makes use of LD information, EB-PRS also achieved relative improvements of 37.9%, 33.6%, 8.6%, 36.2%, 40.6% and 10.8%, respectively. We note that our method is not the first to leverage effect size distributions. Here we first justify our method by establishing its theoretical optimality over existing methods in this class, and then substantiate this theoretical result with extensive simulations. The R package EBPRS, which implements our method, is available on CRAN.
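For readers unfamiliar with the baseline that the abstract compares against, the following is a minimal Python sketch of a standard PRS computed from GWAS summary statistics, together with the predictive r² and relative-improvement arithmetic used to report results. It is not EB-PRS and not the EBPRS R package API. The function names, the p-value threshold as the tuning parameter, and the use of the squared Pearson correlation as predictive r² are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch only: a standard (thresholded) PRS, NOT the EB-PRS method.
# All function names and the p-value threshold parameter are hypothetical.

def standard_prs(genotypes, betas, pvalues, p_threshold=1.0):
    """Score each individual as the sum of allele counts weighted by GWAS
    effect sizes, using only markers whose p-value is <= p_threshold."""
    keep = [j for j, p in enumerate(pvalues) if p <= p_threshold]
    return [sum(row[j] * betas[j] for j in keep) for row in genotypes]


def predictive_r2(scores, phenotypes):
    """Squared Pearson correlation between scores and phenotypes
    (assumed here as the definition of predictive r^2)."""
    n = len(scores)
    mean_s = sum(scores) / n
    mean_y = sum(phenotypes) / n
    cov = sum((s - mean_s) * (y - mean_y) for s, y in zip(scores, phenotypes))
    var_s = sum((s - mean_s) ** 2 for s in scores)
    var_y = sum((y - mean_y) ** 2 for y in phenotypes)
    if var_s == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov * cov / (var_s * var_y)


def relative_improvement(r2_new, r2_base):
    """Relative improvement of r2_new over r2_base, in percent."""
    return 100.0 * (r2_new - r2_base) / r2_base


# Toy data: 4 individuals x 3 markers, allele counts in {0, 1, 2}.
genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 0, 0]]
betas = [0.5, -0.2, 0.1]       # per-marker effect sizes from summary statistics
pvalues = [1e-8, 0.03, 0.6]    # per-marker association p-values

all_marker_scores = standard_prs(genotypes, betas, pvalues)
thresholded_scores = standard_prs(genotypes, betas, pvalues, p_threshold=0.05)
```

In this sketch the p-value threshold is the parameter that would need tuning on validation data, which is the kind of step the abstract says EB-PRS avoids.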

Song, S., Jiang, W., Hou, L., and Zhao, H. (2020). Leveraging effect size distributions to improve polygenic risk scores derived from summary statistics of genome-wide association studies. PLoS Comput Biol 16(2).

Concepts: Asthma, Breast Cancer, Celiac Disease, CRAN, Diabetes, Genetic, Genetics, Genome, Linkage Disequilibrium, Parkinson, Polygenic, Simulation, Summary Statistics

Keywords: Diabetes, Statistical genetics, Simulation

Semantics

Type Source Name
disease MESH asthma
pathway KEGG Asthma
disease MESH breast cancer
pathway KEGG Breast cancer
disease MESH celiac disease
disease MESH type 2 diabetes
drug DRUGBANK Trestolone
disease MESH diagnosis
disease MESH healthy diets
disease MESH privacy
disease MESH tic
drug DRUGBANK Saquinavir
disease MESH P+T
drug DRUGBANK Proline
drug DRUGBANK Ranitidine
disease MESH schizophrenia
drug DRUGBANK Pentaerythritol tetranitrate
disease MESH Kidney Diseases
drug DRUGBANK L-Arginine
drug DRUGBANK Ilex paraguariensis leaf
drug DRUGBANK Coenzyme M
drug DRUGBANK Esomeprazole
drug DRUGBANK Aspartame
drug DRUGBANK Chlordiazepoxide
drug DRUGBANK L-Phenylalanine
disease MESH rheumatoid arthritis
pathway KEGG Rheumatoid arthritis
