Proteomic risk scores for predicting common diseases using linear and neural network models in the UK Biobank.

Publication date: Jul 01, 2025

Plasma proteomics provides a unique opportunity to enhance disease prediction by capturing protein expression patterns linked to diverse pathological processes. Leveraging data from 2,923 proteins measured in 53,030 UK Biobank participants, we developed proteomic risk scores for 27 common outcomes over 5- and 15-year follow-up periods using two approaches: a linear ElasticNet regression model and a deep learning neural network (NN) model. Using Cox regression, we assessed the discrimination of proteomic risk scores either in isolation or as incremental improvements over clinical risk factors. We also studied the shared and unique protein predictors across conditions. Proteomic risk scores demonstrated strong discrimination for most outcomes, with a C-index > 0.80 for 12 diseases. NN models outperformed linear models for 11 outcomes, particularly for diseases such as Parkinson’s disease (C-index 0.84) and pulmonary embolism (C-index 0.83), where nonlinear relationships contributed significantly to prediction. Across all outcomes, the addition of proteomic scores to clinical models improved predictive accuracy (ΔC-index 0.03), with the greatest gains observed in 9 diseases (ΔC-index > 0.1), including end-stage renal disease, pulmonary embolism, and Parkinson’s disease. Analysis of protein contributions revealed shared predictors across multiple diseases, such as growth differentiation factor 15 (GDF15), as well as unique predictors like PAEP for endometriosis. While NN models may capture complex relationships, linear models provided value through simplicity and interpretability. These findings underscore the importance of tailoring predictive approaches to specific diseases and demonstrate the pivotal potential of proteomics in advancing risk stratification and early detection.
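The C-index and ΔC-index figures quoted in the abstract can be made concrete with a small worked example. The following is a minimal sketch, assuming Harrell's concordance index computed directly on risk scores. The abstract does not state which C-index variant or software was used, and the five-person dataset and the names `harrell_c_index`, `clinical_risk`, and `combined_risk` are hypothetical illustrations, not UK Biobank data or the authors' code.

```python
def harrell_c_index(times, events, risks):
    """Harrell's C: fraction of comparable pairs ranked correctly by risk.

    A pair (i, j) is treated as comparable when subject i had the event
    and did so strictly before subject j's follow-up time. Pairs with
    tied times are skipped here, which is a simplifying assumption.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0  # earlier event had the higher score
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: follow-up time in years, event flag (1 = diagnosed).
times = [2, 4, 6, 8, 10]
events = [1, 1, 0, 1, 0]

# Hypothetical risk scores from a clinical-only model and from a model
# that also includes a proteomic score.
clinical_risk = [0.9, 0.3, 0.5, 0.4, 0.1]
combined_risk = [0.9, 0.7, 0.5, 0.4, 0.1]

c_clinical = harrell_c_index(times, events, clinical_risk)
c_combined = harrell_c_index(times, events, combined_risk)
delta_c = c_combined - c_clinical

print(f"C-index (clinical): {c_clinical:.2f}")
print(f"C-index (combined): {c_combined:.2f}")
print(f"Delta C-index:      {delta_c:.2f}")
```

In this toy cohort the clinical scores misrank two of the eight comparable pairs, so adding the second score raises the C-index from 0.75 to 1.00. The ΔC-index in the abstract is the same kind of difference, measured on real models and real follow-up data.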

Open Access PDF

Concepts
Parkinson
Pathological
Proteomics
Pulmonary

Keywords
Aged
Biological Specimen Banks
Biomarkers
Female
Humans
Linear Models
Male
Middle Aged
Neural Networks, Computer
Parkinson Disease
Proteomics
Risk Assessment
Risk Factors
UK Biobank
United Kingdom

Semantics

Type Source Name
disease MESH pathological processes
disease MESH Parkinson’s disease
disease MESH pulmonary embolism
disease MESH end-stage renal disease
disease MESH endometriosis
disease MESH Dementia
drug DRUGBANK Huperzine B
drug DRUGBANK Coenzyme M
disease MESH chronic diseases
disease MESH schizophrenia
disease MESH prostate cancer
pathway KEGG Prostate cancer
disease MESH type 2 diabetes
disease MESH depression
disease MESH COPD
disease MESH primary pulmonary hypertension
disease MESH cancer
disease MESH rheumatoid arthritis
pathway KEGG Rheumatoid arthritis
disease MESH asthma
pathway KEGG Asthma
disease MESH heart failure
disease MESH motor neuron disease
drug DRUGBANK Aspartame
disease MESH death
drug DRUGBANK Cholesterol
pathway KEGG Parkinson disease

