Precursor-induced conditional random fields: connecting separate entities by induction for improved clinical named entity recognition

Bibliographic Details
Published in	BMC medical informatics and decision making Vol. 19; no. 1; pp. 132 - 13
Main Authors	Lee, Wangjin, Choi, Jinwook
Format	Journal Article
Language	English
Published	London: BioMed Central, 15.07.2019 (BioMed Central Ltd; Springer Nature B.V.; BMC)
Subjects	Active learning Artificial intelligence Automation Clinical named entity recognition Clinical natural language processing Computer applications Computing time Conditional random fields Dependence Electronic health records Electronic medical records Health Informatics Health Information Systems High-order dependency Humans Identification Induction method Informatics Information management Information processing Information Systems and Communication Service Labeling Labels Language Machine learning Management of Computing and Information Systems Markov chains Medical informatics Medical records Medicine Medicine & Public Health Methods modeling Models, Theoretical Natural Language Processing Neural networks Precursors Recognition Records management Technical Advance technology Technology application South Korea
ISSN	1472-6947 1472-6947
DOI	10.1186/s12911-019-0865-1

Summary:	Background This paper presents a conditional random fields (CRF) method that captures specific high-order label transition factors to improve clinical named entity recognition performance. Consecutive clinical entities in a sentence are usually separated from each other, and the textual descriptions in clinical narrative documents frequently indicate causal or posterior relationships that can be used to facilitate clinical named entity recognition. However, the CRF generally used for named entity recognition is a first-order model that, under the Markov assumption, restricts label transition dependencies to adjacent labels. Methods Building on the first-order structure, our proposed model uses the non-entity tokens between separated entities as an information transmission medium by applying a label induction method. The model is called precursor-induced CRF because its non-entity state memorizes precursor entity information, and the model's structure allows this precursor entity information to propagate forward through the label sequence. Results We compared the proposed model with both first- and second-order CRFs in terms of their F1-scores on two clinical named entity recognition corpora (the i2b2 2012 challenge and the Seoul National University Hospital electronic health record). The proposed model achieved better entity recognition performance than both the first- and second-order CRFs and was also more efficient than the higher-order model. Conclusion The proposed precursor-induced CRF, which uses non-entity labels to carry label transition information, improves the entity recognition F1-score by exploiting long-distance transition factors without exponentially increasing the computational time. In contrast, a conventional second-order CRF model that uses longer-distance transition factors showed even worse results than the first-order model and required the longest computation time.
Thus, the proposed model could offer a considerable performance improvement over current clinical named entity recognition methods based on the CRF models.
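The label induction idea in the Methods paragraph can be illustrated with a minimal sketch. It assumes BIO-tagged sentences and a hypothetical naming scheme in which each plain `O` tag is rewritten as `O-<type>`, where `<type>` is the most recent preceding entity type in the sentence. The function names `induce_precursor_labels` and `restore_bio_labels` and the `O-<type>` format are illustrative assumptions, not the authors' exact label set or code.

```python
from typing import List, Optional, Tuple


def induce_precursor_labels(tags: List[str]) -> List[str]:
    """Rewrite each plain 'O' tag as 'O-<type>' (hypothetical scheme), where
    <type> is the most recent entity type seen earlier in the same sentence."""
    induced: List[str] = []
    precursor: Optional[str] = None  # most recent entity type seen so far
    for tag in tags:
        if tag == "O":
            # Non-entity token: carry the precursor entity type forward, if any.
            induced.append(f"O-{precursor}" if precursor else "O")
        else:
            # Entity token such as "B-problem": record its type as the new precursor.
            _, entity_type = tag.split("-", 1)
            precursor = entity_type
            induced.append(tag)
    return induced


def restore_bio_labels(tags: List[str]) -> List[str]:
    """Map induced non-entity labels back to plain 'O' before evaluation."""
    return ["O" if tag == "O" or tag.startswith("O-") else tag for tag in tags]


def first_order_transitions(tags: List[str]) -> List[Tuple[str, str]]:
    """List the adjacent (previous, current) label pairs a first-order CRF scores."""
    return list(zip(tags, tags[1:]))


if __name__ == "__main__":
    tags = ["B-problem", "I-problem", "O", "O", "O", "B-treatment", "I-treatment", "O"]
    induced = induce_precursor_labels(tags)
    print(induced)
    # The adjacent pair ("O-problem", "B-treatment") now links the two separated entities.
    print(("O-problem", "B-treatment") in first_order_transitions(induced))
```

In this sketch the plain first-order pair `("O", "B-treatment")` becomes `("O-problem", "B-treatment")`, so a standard first-order CRF trained on the induced labels can score a problem-to-treatment dependency across the intervening non-entity tokens. The label set grows only by one non-entity label per entity type, so decoding stays first-order (Viterbi cost proportional to the sequence length times the square of the label count) rather than moving to a second-order model whose cost grows with the cube of the label count.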