Using natural language processing to identify opioid use disorder in electronic health record data


Bibliographic Details
Published in International journal of medical informatics (Shannon, Ireland) Vol. 170; p. 104963
Main Authors Singleton, Jade; Li, Chengxi; Akpunonu, Peter D.; Abner, Erin L.; Kucharska-Newton, Anna M.
Format Journal Article
Language English
Published Ireland Elsevier B.V. 01.02.2023
ISSN 1386-5056
1872-8243
DOI 10.1016/j.ijmedinf.2022.104963

Summary:
• NLP methods can identify OUD cases in unstructured EHR data.
• Use of NLP can identify OUD cases that would be missed by ICD-10-CM codes alone.
• NLP should be considered for epidemiological studies involving EHR data.
• NLP methods can be implemented using open source tools such as Python.

As opioid prescriptions have risen, there has also been an increase in opioid use disorder (OUD) and its adverse outcomes. Accurate and complete epidemiologic surveillance of OUD, needed to inform prevention strategies, presents challenges. The objective of this study was to ascertain the prevalence of OUD using two methods of identifying OUD in electronic health records (EHR): natural language processing (NLP) for text mining of unstructured clinical notes, and ICD-10-CM diagnostic codes.

Data were drawn from EHR data for hospital and emergency department patient visits to a large regional academic medical center from 2017 to 2019. International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) discharge codes were extracted for each visit. The rule-based NLP algorithm was developed in a stepwise process. First, a small sample of visits from 2017 was used to develop initial dictionaries. Next, EHR data corresponding to 30,124 visits from 2018 were used to develop and evaluate the rule-based algorithm. A random sample of the results was manually reviewed to identify and address shortcomings in the algorithm, and to estimate the sensitivity and specificity of the two methods of ascertainment. Last, the final algorithm was applied to 29,212 visits from 2019 to estimate OUD prevalence.

Together, the two methods identified 2,332 unique visits, with substantial overlap: 1,381 (59.2 %) were identified by both. Of the total unique visits, 430 (18.4 %) were identified only by ICD-10-CM codes, and 521 (22.3 %) were identified only by NLP.
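The rule-based approach and the both / ICD-only / NLP-only breakdown described above can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the term list, the negation cues, the sentence-level negation rule, the use of the ICD-10-CM F11 code family as the diagnostic-code criterion, and the toy visits are all assumptions made here for illustration, and the function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical term and negation lists, for illustration only; the study's
# actual dictionaries are not given in the abstract.
OUD_TERMS = ["opioid use disorder", "opioid dependence", "opioid abuse"]
NEGATION_CUES = ["no", "denies", "negative for", "without"]

TERM_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(t) for t in OUD_TERMS) + r")\b",
    re.IGNORECASE,
)
# A negation cue with no sentence-ending character between it and the term.
NEG_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(c) for c in NEGATION_CUES) + r")\b[^.;]*$",
    re.IGNORECASE,
)


def nlp_flags_oud(note: str) -> bool:
    """Return True if the note has at least one non-negated OUD mention."""
    for match in TERM_RE.finditer(note):
        preceding = note[:match.start()]
        if not NEG_RE.search(preceding):
            return True
    return False


def icd_flags_oud(codes: list[str]) -> bool:
    """Assumption: any ICD-10-CM code in the F11 family counts as OUD."""
    return any(code.upper().startswith("F11") for code in codes)


def ascertainment_source(note: str, codes: list[str]) -> str:
    """Classify a visit by which method(s) flagged it."""
    by_nlp, by_icd = nlp_flags_oud(note), icd_flags_oud(codes)
    if by_nlp and by_icd:
        return "both"
    if by_nlp:
        return "nlp_only"
    if by_icd:
        return "icd_only"
    return "neither"


# Invented toy visits (not study data): (note text, discharge codes).
visits = [
    ("History of opioid use disorder, on buprenorphine.", ["F11.20"]),
    ("Presents with opioid dependence and withdrawal.", ["R11.2"]),
    ("Patient denies opioid abuse.", ["F11.10"]),
    ("No fever. Ankle sprain.", ["S93.401A"]),
]
counts = Counter(ascertainment_source(note, codes) for note, codes in visits)
print(counts)
```

Taking the union of the two flags is what allows a combined estimate to include visits that either method alone would miss. A real dictionary would need abbreviations, misspellings, and more careful negation and context handling than this sentence-level rule provides.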
The prevalence of visits with evidence of an OUD diagnosis in this sample, ascertained using only ICD-10-CM codes, was 1,811/29,212 (6.1 %). Including the additional 521 visits identified only by NLP, the estimated prevalence of OUD was 2,332/29,212 (7.9 %), an increase of 29.5 % compared to the use of ICD-10-CM codes alone. The estimated sensitivity and specificity of the NLP-based OUD classification were 81.8 % and 97.5 %, respectively, relative to gold-standard manual review by an expert addiction medicine physician.

NLP-based algorithms can automate data extraction and identify evidence of opioid use disorder in unstructured electronic health records. The most complete ascertainment of OUD in EHR data was achieved by combining NLP with ICD-10-CM codes. NLP should be considered for epidemiological studies involving EHR data.
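For readers unfamiliar with how sensitivity and specificity are estimated against a manually reviewed reference standard, the following Python sketch shows the standard calculation. The function name and the toy labels are hypothetical and are not study data; the abstract does not report the underlying review counts.

```python
def sensitivity_specificity(predicted: list[bool], reference: list[bool]) -> tuple[float, float]:
    """Compute sensitivity and specificity of predictions against a reference standard.

    sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")

    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)          # flagged, truly OUD
    fn = sum(1 for p, r in pairs if not p and r)      # missed, truly OUD
    tn = sum(1 for p, r in pairs if not p and not r)  # not flagged, truly not OUD
    fp = sum(1 for p, r in pairs if p and not r)      # flagged, truly not OUD

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference must contain both positive and negative cases")

    return tp / (tp + fn), tn / (tn + fp)


# Invented labels for eight reviewed visits (not study data).
reference = [True, True, True, False, False, False, False, False]
predicted = [True, True, False, False, False, False, False, True]

sens, spec = sensitivity_specificity(predicted, reference)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

Here `reference` stands in for the manual review result and `predicted` for the algorithm's flag on the same visits.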