A clustering approach for detecting implausible observation values in electronic health records data.

Publication date: Jul 23, 2019

Identifying implausible clinical observations (e.g., laboratory test and vital sign values) in Electronic Health Record (EHR) data using rule-based procedures is challenging. Anomaly/outlier detection methods can be applied as an alternative algorithmic approach to flagging such implausible values in EHRs.

The primary objectives of this research were to develop and test an unsupervised clustering-based anomaly/outlier detection approach for detecting implausible observations in EHR data, as an alternative algorithmic solution to existing procedures. Our approach is built on two underlying hypotheses: (i) when there is a large number of observations, implausible records should be sparse, and therefore (ii) if these data are clustered properly, clusters with sparse populations should represent implausible observations. To test these hypotheses, we applied an unsupervised clustering algorithm to EHR observation data on 50 laboratory tests from Partners HealthCare. We tested different specifications of the clustering approach and computed confusion matrix indices against a set of silver-standard plausibility thresholds. We compared the results of the proposed approach with those of conventional anomaly detection (CAD) approaches, including standard deviation and Mahalanobis distance.
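The two hypotheses above suggest a simple recipe: cluster each laboratory test's values, then flag every value that lands in a sparsely populated cluster. The abstract does not name the clustering algorithm or its settings, so the following is a minimal sketch, assuming one-dimensional k-means (Lloyd's algorithm) with a deterministic farthest-point seeding chosen here for reproducibility. The function names, `k=5`, the 1% `min_share` cutoff, and the simulated potassium data are hypothetical choices, not the authors' specification.

```python
import numpy as np


def farthest_point_init(x, k):
    """Deterministic seeding: start at the median, then repeatedly add the point farthest from all chosen centroids."""
    centroids = [float(np.median(x))]
    while len(centroids) < k:
        dist = np.min(np.abs(x[:, None] - np.array(centroids)[None, :]), axis=1)
        centroids.append(float(x[np.argmax(dist)]))
    return np.array(centroids)


def kmeans_1d(values, k, n_iter=100):
    """Lloyd's k-means on a 1-D array of observation values."""
    x = np.asarray(values, dtype=float)
    centroids = farthest_point_init(x, k)
    for _ in range(n_iter):
        # Assign each value to its nearest centroid.
        labels = np.argmin(np.abs(x[:, None] - centroids[None, :]), axis=1)
        # Move each centroid to the mean of its members (keep it in place if its cluster is empty).
        new_centroids = np.array([
            x[labels == j].mean() if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


def flag_sparse_clusters(values, k=5, min_share=0.01):
    """Hypothetical helper: mark values whose cluster holds less than min_share of all observations."""
    labels, _ = kmeans_1d(values, k)
    counts = np.bincount(labels, minlength=k)
    sparse_cluster = counts / counts.sum() < min_share
    return sparse_cluster[labels]


# Simulated serum potassium values (mmol/L) with two decimal-shift style errors appended.
rng = np.random.default_rng(0)
potassium = np.concatenate([rng.normal(4.2, 0.4, 1000), [42.0, 420.0]])
flags = flag_sparse_clusters(potassium)
print(potassium[flags])
```

Seeding from the farthest points gives isolated extreme values their own centroids, so they end up in tiny clusters. With plain quantile seeding, a lone outlier can instead be absorbed into a large neighboring cluster and escape the sparsity test.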

We found that the clustering approach produced results with exceptional specificity and high sensitivity. Compared with the conventional anomaly detection approaches, our proposed clustering approach produced a significantly smaller number of false positive cases.
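Sensitivity, specificity, and false positive counts all come from a confusion matrix of algorithm flags against reference labels. The abstract does not spell out the scoring procedure, so this is a minimal sketch, assuming each test has a silver-standard plausibility range `[lower, upper]` and that any value outside it counts as implausible. `confusion_indices`, `sd_flags`, and the 3-SD cutoff are illustrative assumptions, not the authors' code. For a single variable the Mahalanobis distance reduces to the absolute z-score, so `sd_flags` also stands in for the univariate Mahalanobis rule; the multivariate form is not shown.

```python
import numpy as np


def confusion_indices(flagged, values, lower, upper):
    """Score algorithm flags against a silver-standard plausibility range [lower, upper]."""
    x = np.asarray(values, dtype=float)
    flagged = np.asarray(flagged, dtype=bool)
    implausible = (x < lower) | (x > upper)  # reference label derived from the thresholds
    tp = int(np.sum(flagged & implausible))
    fp = int(np.sum(flagged & ~implausible))
    fn = int(np.sum(~flagged & implausible))
    tn = int(np.sum(~flagged & ~implausible))
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


def sd_flags(values, n_sd=3.0):
    """Conventional baseline: flag values more than n_sd (population) standard deviations from the mean."""
    x = np.asarray(values, dtype=float)
    return np.abs(x - x.mean()) > n_sd * x.std()


# One false positive (3.0) and one true positive (50.0) against the range [0, 10].
values = [1.0, 2.0, 3.0, 50.0]
scores = confusion_indices([False, False, True, True], values, lower=0.0, upper=10.0)
print(scores)  # tp=1, fp=1, fn=0, tn=2, so sensitivity 1.0 and specificity 2/3

baseline = sd_flags([4.0] * 20 + [40.0])
print(int(baseline.sum()))  # -> 1 (only 40.0 is flagged)
```

Because the mean and standard deviation are themselves pulled by extreme values, SD-based rules tend to need careful cutoffs per test; scoring every candidate flagger with the same `confusion_indices` call keeps the comparison like-for-like.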

Our contributions include (i) a clustering approach for identifying implausible EHR observations, (ii) evidence that implausible observations are sparse in EHR laboratory test results, (iii) a parallel implementation of the clustering approach on the i2b2 star schema, and (iv) a set of silver-standard plausibility thresholds for 50 laboratory tests that can be used for validation in other studies. The proposed algorithmic solution can augment human decisions to improve data quality. Therefore, a workflow is needed to complement the algorithm and initiate the actions necessary to improve data quality.
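Because each laboratory test is clustered on its own, the work parallelizes naturally by test. The abstract does not describe the implementation, so this is a minimal sketch, assuming numeric values are read from the `concept_cd` and `nval_num` columns of the i2b2 `observation_fact` table and that one task runs per `concept_cd`. The concept codes, `ratio_flagger` (a hypothetical stand-in to be replaced by a clustering flagger), and `max_workers=4` are illustrative assumptions.

```python
import statistics
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def group_by_concept(rows):
    """Group (concept_cd, nval_num) pairs from observation_fact by lab test, skipping non-numeric facts."""
    groups = defaultdict(list)
    for concept_cd, nval_num in rows:
        if nval_num is not None:
            groups[concept_cd].append(float(nval_num))
    return dict(groups)


def ratio_flagger(values, factor=10.0):
    """Hypothetical stand-in flagger: mark values more than `factor` times above or below the median."""
    med = statistics.median(values)
    return [v > med * factor or v < med / factor for v in values]


def flag_all_tests(rows, flagger=ratio_flagger, max_workers=4):
    """Run the flagger on each lab test independently, one task per concept_cd."""
    groups = group_by_concept(rows)
    concepts = list(groups)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(flagger, (groups[c] for c in concepts)))
    return dict(zip(concepts, results))


# Hypothetical concept codes; the None row mimics a fact with no numeric value.
rows = [
    ("LAB:POTASSIUM", 4.1), ("LAB:POTASSIUM", 4.3), ("LAB:POTASSIUM", 4.4), ("LAB:POTASSIUM", 44.0),
    ("LAB:GLUCOSE", 95.0), ("LAB:GLUCOSE", 110.0), ("LAB:GLUCOSE", None),
]
print(flag_all_tests(rows))
```

For CPU-heavy clustering, `concurrent.futures.ProcessPoolExecutor` offers the same `map` interface, but it needs a picklable top-level flagger and, on spawn-based platforms, an `if __name__ == "__main__":` guard.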

Estiri, H., Klann, J.G., and Murphy, S.N. (2019). A clustering approach for detecting implausible observation values in electronic health records data. BMC Med Inform Decis Mak, 19(1). 05058.

Concepts
Algorithm, Anomaly Detection, BMC, CAD, Clustering, Complement, EHR, False Positive, Mahalanobis Distance, Outlier Detection, Partners HealthCare, Silver Standard, Standard Deviation, Star Schema, Vital Sign, Workflow

Keywords
Outlier, Information technology management, Data analysis, Cluster analysis, Data security, Anomaly detection, Machine learning, Data mining, Information science, CAD
