Developing and validating natural language processing algorithms for radiology reports compared to ICD-10 codes for identifying venous thromboembolism in hospitalized medical patients

Bibliographic Details
Published in Thrombosis research Vol. 209; pp. 51-58
Main Authors Verma, Amol A., Masoom, Hassan, Pou-Prom, Chloe, Shin, Saeha, Guerzhoy, Michael, Fralick, Michael, Mamdani, Muhammad, Razak, Fahad
Format Journal Article
Language English
Published United States Elsevier Ltd 01.01.2022
ISSN 0049-3848, 1879-2472
DOI 10.1016/j.thromres.2021.11.020

Summary: Identifying venous thromboembolism (VTE) from large clinical and administrative databases is important for research and quality improvement. This study aimed to develop and validate natural language processing (NLP) algorithms to identify VTE from radiology reports among general internal medicine (GIM) inpatients. This cross-sectional study included GIM hospitalizations between April 1, 2010 and March 31, 2017 at 5 hospitals in Toronto, Ontario, Canada. We developed NLP algorithms to identify pulmonary embolism (PE) and deep venous thrombosis (DVT) from radiologist reports of thoracic computed tomography (CT), extremity compression ultrasound (US), and nuclear ventilation-perfusion (VQ) scans in a training dataset of 1551 hospitalizations. We compared the accuracy of our NLP algorithms, the previously published "simpleNLP" tool, and administrative discharge diagnosis codes (ICD-10-CA) for PE and DVT against the "gold standard" of manual review in a separate random sample of 4000 GIM hospitalizations. Our NLP algorithms were highly accurate in identifying DVT from US, with sensitivity 0.94, positive predictive value (PPV) 0.90, and area under the receiver operating characteristic curve (AUC) 0.96; and in identifying PE from CT, with sensitivity 0.91, PPV 0.89, and AUC 0.96. Administrative diagnosis codes and the simpleNLP tool were less accurate for DVT (ICD-10-CA sensitivity 0.63, PPV 0.43, AUC 0.81; simpleNLP sensitivity 0.41, PPV 0.36, AUC 0.66) and PE (ICD-10-CA sensitivity 0.83, PPV 0.70, AUC 0.91; simpleNLP sensitivity 0.89, PPV 0.62, AUC 0.92). Administrative diagnosis codes are unreliable in identifying VTE in hospitalized patients. We developed highly accurate NLP algorithms to identify VTE from radiology reports in a multicentre sample and have made the algorithms freely available to the academic community with a user-friendly tool (https://lks-chart.github.io/CHARTextract-docs/08-downloads/rulesets.html#venous-thromboembolism-vte-rulesets).
• ICD-10 codes do not reliably identify venous thromboembolism (VTE) in hospitalized adults.
• We developed algorithms to accurately identify VTE from radiology reports.
• This tool is freely available for researchers: https://lks-chart.github.io/CHARTextract-docs/
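
The accuracy measures reported in the summary (sensitivity, PPV, AUC) compare each method's yes/no classification with manual review. As a rough illustration only, they could be computed as in the Python sketch below. The function name `evaluate`, the `Accuracy` container, and the toy labels are hypothetical and are not taken from the study or from CHARTextract. The sketch also assumes the AUC of a single binary classification, which equals the mean of sensitivity and specificity; the record does not state how the authors computed AUC.

```python
from dataclasses import dataclass


@dataclass
class Accuracy:
    sensitivity: float
    specificity: float
    ppv: float
    auc: float


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN instead of raising when a metric is undefined.
    return numerator / denominator if denominator else float("nan")


def evaluate(predicted: list[bool], gold: list[bool]) -> Accuracy:
    """Compare binary predictions with gold-standard (manual review) labels."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")

    pairs = list(zip(predicted, gold))
    tp = sum(p and g for p, g in pairs)          # flagged, truly VTE
    fp = sum(p and not g for p, g in pairs)      # flagged, no VTE
    fn = sum(not p and g for p, g in pairs)      # missed VTE
    tn = sum(not p and not g for p, g in pairs)  # correctly not flagged

    sensitivity = _ratio(tp, tp + fn)
    specificity = _ratio(tn, tn + fp)
    ppv = _ratio(tp, tp + fp)
    # Assumption: one operating point, so ROC AUC is the mean of the two rates.
    auc = (sensitivity + specificity) / 2
    return Accuracy(sensitivity, specificity, ppv, auc)


# Toy labels for ten hospitalizations (made up, NOT study data).
gold = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False, False, False]

result = evaluate(predicted, gold)
print(result)
```

Running the same function once per method (NLP algorithm, simpleNLP, ICD-10-CA codes) against the same manually reviewed sample would give directly comparable figures.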