An Empirical Test of GRUs and Deep Contextualized Word Representations on De-Identification.

Publication date: Aug 21, 2019

De-identification aims to remove 18 categories of protected health information from electronic health records. Ideally, de-identification systems should be reliable and generalizable. Previous research has focused on improving performance but has not examined generalizability. This paper investigates both performance and generalizability. To improve on the current state of the art, which is based on long short-term memory (LSTM) units, we introduce a system that uses gated recurrent units (GRUs) and deep contextualized word representations, neither of which has previously been applied to de-identification. We measure the performance and generalizability of each system using the 2014 i2b2/UTHealth and 2016 CEGS N-GRID de-identification datasets. We show that deep contextualized word representations improve state-of-the-art performance, while the benefit of replacing LSTM units with GRUs is not significant. The generalizability of de-identification systems improves significantly with deep contextualized word representations; in addition, the LSTM-based system is more generalizable than the GRU-based system.

Concepts Keywords
Short Term Memory GRUs
Test Gated recurrent unit
Technology
Long short-term memory
Academic disciplines
Articles
Artificial neural networks
Identification systems
Generalizability theory
Artificial intelligence

Semantics

Type Source Name
gene UNIPROT SET
gene UNIPROT DES
drug DRUGBANK Diethylstilbestrol
gene UNIPROT NINL
gene UNIPROT ZGPAT
drug DRUGBANK L-Arginine
disease MESH 2*P*R
gene UNIPROT KCNK3
drug DRUGBANK Flunarizine
drug DRUGBANK Spinosad
gene UNIPROT CRH
gene UNIPROT C1QL1
drug DRUGBANK Corticorelin
disease DOID CRF
gene UNIPROT GPI
disease MESH privacy
gene UNIPROT GRAP2
gene UNIPROT AGRP
