Precursor-induced conditional random fields: connecting separate entities by induction for improved clinical named entity recognition.

Publication date: Jul 15, 2019

This paper presents a conditional random fields (CRF) method that captures specific high-order label transition factors to improve clinical named entity recognition performance. Consecutive clinical entities in a sentence are usually separated from each other, and the textual descriptions in clinical narrative documents frequently indicate causal or posterior relationships that can be used to facilitate clinical named entity recognition. However, the CRF that is generally used for named entity recognition is a first-order model, which under the Markov assumption restricts label transition dependencies to adjacent labels.

Based on the first-order structure, our proposed model utilizes non-entity tokens between separated entities as an information transmission medium by applying a label induction method. The model is referred to as precursor-induced CRF because its non-entity state memorizes precursor entity information, and the model’s structure allows the precursor entity information to propagate forward through the label sequence.
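The label induction idea can be illustrated with a minimal sketch. It assumes BIO-style tags and a hypothetical induced label format, `O|TYPE`, for non-entity tokens that follow an entity of type `TYPE`. The abstract does not specify the label scheme, so the function names, the label format, and the example sentence are illustrative assumptions, not the authors' implementation.

```python
from typing import List

OUTSIDE = "O"  # standard non-entity label in a BIO scheme (assumed here)


def induce_precursor_labels(labels: List[str]) -> List[str]:
    """Rewrite each non-entity label so it records the type of the most recent entity.

    The "O|TYPE" format is a hypothetical naming choice for this sketch.
    """
    induced = []
    precursor = None  # type of the last entity seen so far in the sentence
    for label in labels:
        if label == OUTSIDE:
            # Before any entity has appeared, the plain outside label is kept.
            induced.append(OUTSIDE if precursor is None else f"{OUTSIDE}|{precursor}")
        else:
            # BIO tag such as "B-PROBLEM" or "I-PROBLEM": remember its type.
            precursor = label.split("-", 1)[1]
            induced.append(label)
    return induced


def restore_labels(induced: List[str]) -> List[str]:
    """Map induced non-entity labels back to the plain outside label for scoring."""
    return [OUTSIDE if label.startswith(OUTSIDE + "|") else label for label in induced]


# Made-up example sentence with two separated entities.
tokens = ["fever", "resolved", "after", "starting", "antibiotics", "."]
labels = ["B-PROBLEM", "O", "O", "O", "B-TREATMENT", "O"]
induced = induce_precursor_labels(labels)
for token, label in zip(tokens, induced):
    print(f"{token}\t{label}")
```

With labels rewritten this way, an ordinary first-order transition such as `O|PROBLEM` to `B-TREATMENT` carries information about the preceding entity even though several non-entity tokens lie in between. A first-order CRF toolkit could be trained on the induced labels, with predictions mapped back through `restore_labels` before evaluation.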

We compared the proposed model with both first- and second-order CRFs in terms of their F-scores, using two clinical named entity recognition corpora (the i2b2 2012 challenge and the Seoul National University Hospital electronic health record). The proposed model demonstrated better entity recognition performance than both the first- and second-order CRFs and was also more efficient than the higher-order model.

The proposed precursor-induced CRF, which uses non-entity labels to carry label transition information, improves the entity recognition F-score by exploiting long-distance transition factors without exponentially increasing the computation time. In contrast, a conventional second-order CRF model, which uses longer-distance transition factors, performed even worse than the first-order model and required the longest computation time. Thus, the proposed model could offer a considerable performance improvement over current CRF-based clinical named entity recognition methods.
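A rough count of label combinations suggests why enlarging the label set can be cheaper than raising the model order. The sketch below assumes BIO tagging, one induced non-entity label per entity type, and that a linear-chain CRF of order n scores every tuple of n + 1 labels. The entity-type count is hypothetical, and the numbers are illustrative arithmetic, not measurements from the paper.

```python
def bio_label_count(num_types: int) -> int:
    """B- and I- tags for each entity type, plus one outside label."""
    return 2 * num_types + 1


def induced_label_count(num_types: int) -> int:
    """BIO labels plus one extra outside label per entity type (assumed scheme)."""
    return bio_label_count(num_types) + num_types


def transition_combinations(num_labels: int, order: int) -> int:
    """Number of label tuples scored by a linear-chain CRF of the given order."""
    return num_labels ** (order + 1)


num_types = 3  # hypothetical number of entity types
first_order = transition_combinations(bio_label_count(num_types), order=1)
second_order = transition_combinations(bio_label_count(num_types), order=2)
precursor_induced = transition_combinations(induced_label_count(num_types), order=1)

print("first-order CRF:      ", first_order)
print("second-order CRF:     ", second_order)
print("precursor-induced CRF:", precursor_induced)
```

Under these assumptions the induced label set grows only linearly with the number of entity types, so the first-order model over induced labels stays quadratic in the label count, whereas the second-order model is cubic.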

Lee, W. and Choi, J. Precursor-induced conditional random fields: connecting separate entities by induction for improved clinical named entity recognition. 05007. 2019 BMC Med Inform Decis Mak (19):1.

Concepts                   Keywords
BMC                        Natural processing
Computational Time         Conditional random field
Conditional                Machine learning
Conditional Random Fields  Named-entity recognition
Corpora                    CRF
Named Entity Recognition   Clinical data management
Seoul                      Clinical research
Transmission Medium        Academic disciplines

Semantics

Type Source Name
gene UNIPROT IRF1
drug DRUGBANK Altretamine
gene UNIPROT CASP8
gene UNIPROT SPEF1
gene UNIPROT PDZK1
disease MESH multi
gene UNIPROT ADA2
gene UNIPROT HAAO
gene UNIPROT CBLIF
gene UNIPROT POC1A
gene UNIPROT ALPK3
gene UNIPROT MAK
gene UNIPROT RPS16
gene UNIPROT INTU
gene UNIPROT PROC
gene UNIPROT F11R
disease MESH postoperative complications
gene UNIPROT FAM168B
gene UNIPROT COL9A3
gene UNIPROT COMP
gene UNIPROT COL9A1
gene UNIPROT COL9A2
gene UNIPROT SCN8A
gene UNIPROT DUOXA1
gene UNIPROT NKRF
gene UNIPROT PDLIM5
disease MESH growth
gene UNIPROT ARTN
gene UNIPROT AGRP
gene UNIPROT ILVBL
drug DRUGBANK Gold
gene UNIPROT CHL1
gene UNIPROT ARID1A
gene UNIPROT ESPL1
gene UNIPROT SIRPA
drug DRUGBANK Flunarizine
gene UNIPROT FRZB
gene UNIPROT DUSP4
disease MESH typ
gene UNIPROT SON
gene UNIPROT LAT2
disease MESH rheumatism
gene UNIPROT RORC
drug DRUGBANK Aspartame
gene UNIPROT PTPRF
gene UNIPROT AVPR2
drug DRUGBANK Coenzyme M
disease MESH multiple
gene UNIPROT ATM
gene UNIPROT SLC33A1
gene UNIPROT AGTR1
gene UNIPROT CTSE
gene UNIPROT TRIM37
gene UNIPROT JTB
gene UNIPROT NR1I2
gene UNIPROT SYMPK
gene UNIPROT RASA1
gene UNIPROT RGS6
gene UNIPROT SSRP1
gene UNIPROT SERPINB6
gene UNIPROT SORBS1
gene UNIPROT BRD4
gene UNIPROT HACD1
gene UNIPROT LNPEP
gene UNIPROT CAP1
gene UNIPROT SET
gene UNIPROT DHDDS
gene UNIPROT CIT
drug DRUGBANK L-Citrulline
gene UNIPROT ANXA13
gene UNIPROT PSMB5
gene UNIPROT GJB2
disease MESH risk factors
gene UNIPROT LARGE1
gene UNIPROT PARN
gene UNIPROT NBL1
gene UNIPROT PTPN5
gene UNIPROT KCNK3
gene UNIPROT DEPP1
gene UNIPROT GOPC
gene UNIPROT NR1H2
disease MESH diagnosis
gene UNIPROT TBPL1
gene UNIPROT NINL
gene UNIPROT NES
drug DRUGBANK Tropicamide
disease MESH separated
gene UNIPROT CRH
gene UNIPROT C1QL1
drug DRUGBANK Corticorelin
disease DOID CRF
