An unsupervised learning approach to identify novel signatures of health and disease from multimodal data.

Publication date: Jan 10, 2020

Modern medicine is rapidly moving towards a data-driven paradigm based on comprehensive multimodal health assessments. Integrated analysis of data from different modalities has the potential to uncover novel biomarkers and disease signatures.

We collected 1385 data features from diverse modalities, including metabolome, microbiome, genetics, and advanced imaging, from 1253 individuals and from a longitudinal validation cohort of 1083 individuals. We utilized a combination of unsupervised machine learning methods to identify multimodal biomarker signatures of health and disease risk.

Our method identified a set of cardiometabolic biomarkers that goes beyond standard clinical biomarkers. Stratification of individuals based on the signatures of these biomarkers identified distinct subsets of individuals with similar health statuses. Subset membership was a better predictor for diabetes than established clinical biomarkers such as glucose, insulin resistance, and body mass index. The novel biomarkers in the diabetes signature included 1-stearoyl-2-dihomo-linolenoyl-GPC and 1-(1-enyl-palmitoyl)-2-oleoyl-GPC. Another metabolite, cinnamoylglycine, was identified as a potential biomarker for both gut microbiome health and lean mass percentage. We identified potential early signatures for hypertension and a poor metabolic health outcome. Additionally, we found novel associations between a uremic toxin, p-cresol sulfate, and the abundance of the microbiome genera Intestinimonas and an unclassified genus in the Erysipelotrichaceae family.

Our methodology and results demonstrate the potential of multimodal data integration, from the identification of novel biomarker signatures to a data-driven stratification of individuals into disease subtypes and stages, an essential step towards personalized, preventative health risk assessment.


Shomorony, Cirulli, E.T., Huang, L., Napier, L.A., Heister, R.R., Hicks, M., Cohen, Yu, H.C., Swisher, C.L., Schenker-Ahmed, N.M., Li, W., Nelson, K.E., Brar, P., Kahn, A.M., Spector, T.D., Caskey, C.T., Venter, J.C., Karow, D.S., Kirkness, E.F., and Shah, N. An unsupervised learning approach to identify novel signatures of health and disease from multimodal data. 06141. 2020 Genome Med (12):1.

Concepts: Biomarker, Biomarkers, Body Mass, Cohort, Cresol, Insulin Resistance, Mass Percentage, Risk Assessment, Unsupervised Learning, Uremic Toxin

Keywords: Imaging, Biomarkers, Branches of biology, Life sciences, Metabolomics


Type Source Name
drug DRUGBANK Tropicamide
drug DRUGBANK Dextrose unspecified form
disease MESH insulin resistance
pathway KEGG Insulin resistance
drug DRUGBANK Hexadecanal
disease MESH hypertension
drug DRUGBANK Cresol
drug DRUGBANK Sulfate ion
disease MESH syndrome
disease MESH communities
drug DRUGBANK Sulpiride
disease MESH glucose intolerance
disease MESH Multi
disease MESH Atherosclerosis
drug DRUGBANK Esomeprazole
disease MESH multiple
drug DRUGBANK Coenzyme M
drug DRUGBANK Acetylsalicylic acid
disease MESH diagnosis
drug DRUGBANK L-Valine
disease MESH metabolic syndrome
drug DRUGBANK Butyric Acid
disease MESH obesity
drug DRUGBANK Iron
drug DRUGBANK Tocopherol
drug DRUGBANK Methionine
drug DRUGBANK Glycerin
drug DRUGBANK Glutamic Acid
disease MESH strokes
disease MESH transient ischemic attack
drug DRUGBANK Methylergometrine
drug DRUGBANK Cholesterol
drug DRUGBANK L-Lysine
disease MESH comorbidity
disease MESH uncertainty
drug DRUGBANK Calcium
disease MESH Chronic kidney disease
drug DRUGBANK Calusterone
disease MESH growth
disease MESH prediabetes
disease MESH fibrosis
disease MESH nonalcoholic fatty liver disease
drug DRUGBANK Serine
drug DRUGBANK Tioguanine
disease MESH stable angina
disease MESH chronic renal failure
disease MESH aberrant crypt foci
disease MESH weight loss


