Machine learning application for development of a data-driven predictive model able to investigate quality of life scores in a rare disease.

Publication date: Feb 12, 2020

Alkaptonuria (AKU) is an ultra-rare autosomal recessive disease caused by a mutation in the homogentisate 1,2-dioxygenase (HGD) gene. One of the main obstacles in studying AKU, and other ultra-rare diseases, is the lack of a standardized methodology to assess disease severity or response to treatment. Quality of Life (QoL) scores are a reliable way to monitor patients’ clinical condition and health status. They make it possible to follow the evolution of a disease and to assess the suitability of treatments by taking into account patients’ symptoms, general health status and care satisfaction. However, more comprehensive tools are needed to study a complex, multi-systemic disease like AKU. In this study, a Machine Learning (ML) approach was implemented to predict QoL scores from clinical data deposited in ApreciseKUre, an AKU-dedicated database.

Data from 129 AKU patients were first examined through a preliminary statistical analysis (Pearson correlation coefficient) to measure the linear correlation between 11 QoL scores. The importance of 110 ApreciseKUre biomarkers in predicting QoL scores was then calculated using XGBoost with a k-nearest neighbours (k-NN) approach. Because of the limited amount of data available, the model was validated using surrogate data analysis.
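As an illustration of the preliminary step, the Pearson correlation coefficient between two QoL scores can be computed as in the minimal sketch below. The function name `pearson_r` and the two score lists are hypothetical and are not taken from the study or from ApreciseKUre.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Deviations of each observation from its own mean.
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    covariance_term = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if norm == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance_term / norm


# Hypothetical values of two QoL scores for five patients (illustrative only).
score_a = [40.0, 55.0, 62.0, 71.0, 90.0]
score_b = [35.0, 50.0, 66.0, 70.0, 88.0]
print(round(pearson_r(score_a, score_b), 3))
```

Values close to +1 or -1 indicate a strong linear relationship between the two scores, and values near 0 indicate little or no linear relationship.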

We identified a direct correlation of 6 out of 110 biomarkers (age, Serum Amyloid A, Chitotriosidase, Advanced Oxidation Protein Products, S-thiolated proteins and Body Mass Index) with the QoL health status, in particular with the KOOS (Knee injury and Osteoarthritis Outcome Score) symptoms (Relative Absolute Error (RAE) 0.25). The error distribution of the surrogate model (RAE 0.38) was clearly higher than that of the true model (RAE 0.25), confirming the consistency of our dataset. Our data showed that inflammation, oxidative stress, amyloidosis and patients’ lifestyle correlate with the QoL scores for physical status, while no correlation between the biomarkers and patients’ mental health was found (RAE 1.1).
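To make the RAE figures easier to read, the sketch below uses the standard definition of relative absolute error (total absolute error divided by the total absolute error of always predicting the mean), so a value above 1 means the model does worse than the mean. The abstract does not state the exact formula or the surrogate procedure used, so both are assumptions here: surrogates are built by shuffling the target values, and a plain leave-one-out k-NN regressor stands in for the XGBoost/k-NN pipeline. All names and the toy data are hypothetical.

```python
import random


def relative_absolute_error(y_true, y_pred):
    """Sum of absolute errors relative to a predict-the-mean baseline."""
    mean_y = sum(y_true) / len(y_true)
    baseline = sum(abs(y - mean_y) for y in y_true)
    if baseline == 0:
        raise ValueError("RAE is undefined when all targets are equal")
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / baseline


def knn_loo_predictions(X, y, k=3):
    """Leave-one-out k-NN regression: predict each target from its k nearest other samples."""
    predictions = []
    for i, xi in enumerate(X):
        # Squared Euclidean distance to every other sample, paired with its target.
        others = [
            (sum((a - b) ** 2 for a, b in zip(xi, xj)), y[j])
            for j, xj in enumerate(X)
            if j != i
        ]
        others.sort(key=lambda pair: pair[0])
        nearest = others[:k]
        predictions.append(sum(t for _, t in nearest) / len(nearest))
    return predictions


def surrogate_check(X, y, k=3, n_surrogates=100, seed=0):
    """Return the RAE on the real targets and the RAEs on shuffled (surrogate) targets."""
    rng = random.Random(seed)
    true_rae = relative_absolute_error(y, knn_loo_predictions(X, y, k))
    surrogate_raes = []
    for _ in range(n_surrogates):
        shuffled = list(y)
        rng.shuffle(shuffled)  # breaks any real link between features and target
        surrogate_raes.append(
            relative_absolute_error(shuffled, knn_loo_predictions(X, shuffled, k))
        )
    return true_rae, surrogate_raes


# Toy data (not from the study): one feature that determines the target exactly.
X = [[float(i)] for i in range(20)]
y = [2.0 * i for i in range(20)]
true_rae, surrogate_raes = surrogate_check(X, y)
print(true_rae, sum(surrogate_raes) / len(surrogate_raes))
```

If the features carry real information about the target, the RAE on the true data should sit clearly below the RAEs obtained on the surrogates, which is the kind of comparison reported above.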

This proof-of-principle study for rare diseases confirms the importance of databases that allow data management and analysis and that can be used to predict more effective treatments.

Open Access PDF

Spiga, O., Cicaloni, Fiorini, C., Trezza, A., Visibelli, A., Millucci, L., Bernardini, G., Bernini, A., Marzocchi, B., Braconi, D., Prischi, F., and Santucci, A. Machine learning application for development of a data-driven predictive model able to investigate quality of life scores in a rare disease. 06326. 2020 Orphanet J Rare Dis (15):1.

Concepts                           Keywords
Algorithm                          Alkaptonuria
Alkaptonuria                       Biomarkers
Amyloid                            Clinical medicine
Amyloidosis                        Branches of biology
Autosomal Recessive                Medicine
Biomarkers                         Evolution diseases
Body Mass Index                    Homogentisate 1,2-dioxygenase
Correlation                        Aku
Evolution                          Rare disease
Gene                               Quality of life
Linear Correlation
Oxidative Stress
Pearson Correlation Coefficient
Rare Disease


Type Source Name
disease MESH development
disease MESH rare disease
disease MESH Alkaptonuria
disease MESH satisfaction
disease MESH Osteoarthritis
disease MESH inflammation
disease MESH oxidative stress
disease MESH amyloidosis
disease MESH lifestyle
disease MESH Men
drug DRUGBANK L-Tyrosine
drug DRUGBANK Glutamine Hydroxamate
disease MESH Ochronosis
disease MESH stenosis
disease MESH arthropathy
disease MESH sarcoidosis
disease MESH arthritis
disease MESH ankylosing spondylitis
disease MESH uveitis
disease MESH fibrosis
disease MESH obstructive lung diseases
disease MESH visual
drug DRUGBANK Flunarizine
disease MESH anxiety
disease MESH depression
drug DRUGBANK Coenzyme M
disease MESH chronic disease
disease MESH joint pain
drug DRUGBANK Acetylsalicylic acid
disease MESH Aging
drug DRUGBANK Glutathione
disease MESH death
drug DRUGBANK Trestolone
disease MESH immuno
drug DRUGBANK Methotrexate
pathway KEGG Oxidative phosphorylation
disease MESH obesity
drug DRUGBANK Nitisinone
disease MESH Tumor
disease MESH inborn errors metabolism
disease MESH bone disease
pathway REACTOME Metabolism
disease MESH alkaptonuric ochronosis
drug DRUGBANK Ascorbic acid
disease MESH Diagnosis
disease MESH interstitial lung diseases
disease MESH COPD
disease MESH Allergy
disease MESH Asthma
pathway KEGG Asthma
disease MESH juvenile idiopathic arthritis
disease MESH rheumatoid arthritis
pathway KEGG Rheumatoid arthritis
drug DRUGBANK (S)-Des-Me-Ampa


