Developing and validating natural language processing algorithms for radiology reports compared to ICD-10 codes for identifying venous thromboembolism in hospitalized medical patients

Bibliographic Details
Published in	Thrombosis Research, Vol. 209, pp. 51–58
Main Authors	Verma, Amol A., Masoom, Hassan, Pou-Prom, Chloe, Shin, Saeha, Guerzhoy, Michael, Fralick, Michael, Mamdani, Muhammad, Razak, Fahad
Format	Journal Article
Language	English
Published	United States: Elsevier Ltd, 01.01.2022
Subjects	Algorithms; Cross-Sectional Studies; Deep vein thrombosis; Hematology, Oncology, and Palliative Medicine; Hospitalization; Humans; ICD codes; International Classification of Diseases; Natural Language Processing; Ontario; Pulmonary embolism; Pulmonary Embolism - diagnostic imaging; Radiology; Validity; Venous Thromboembolism - diagnostic imaging
ISSN	0049-3848, 1879-2472
DOI	10.1016/j.thromres.2021.11.020

Summary:	Identifying venous thromboembolism (VTE) from large clinical and administrative databases is important for research and quality improvement. This study aimed to develop and validate natural language processing (NLP) algorithms to identify VTE from radiology reports among general internal medicine (GIM) inpatients.

This cross-sectional study included GIM hospitalizations between April 1, 2010 and March 31, 2017 at 5 hospitals in Toronto, Ontario, Canada. We developed NLP algorithms to identify pulmonary embolism (PE) and deep venous thrombosis (DVT) from radiologist reports of thoracic computed tomography (CT), extremity compression ultrasound (US), and nuclear ventilation-perfusion (VQ) scans in a training dataset of 1551 hospitalizations. In a separate random sample of 4000 GIM hospitalizations, we compared the accuracy of our NLP algorithms, the previously published “simpleNLP” tool, and administrative discharge diagnosis codes (ICD-10-CA) for PE and DVT against the “gold standard” of manual review.

Our NLP algorithms were highly accurate for identifying DVT from US (sensitivity 0.94, positive predictive value [PPV] 0.90, area under the receiver operating characteristic curve [AUC] 0.96) and PE from CT (sensitivity 0.91, PPV 0.89, AUC 0.96). Administrative diagnosis codes and the simpleNLP tool were less accurate for DVT (ICD-10-CA sensitivity 0.63, PPV 0.43, AUC 0.81; simpleNLP sensitivity 0.41, PPV 0.36, AUC 0.66) and PE (ICD-10-CA sensitivity 0.83, PPV 0.70, AUC 0.91; simpleNLP sensitivity 0.89, PPV 0.62, AUC 0.92). Administrative diagnosis codes are unreliable for identifying VTE in hospitalized patients.
We developed highly accurate NLP algorithms to identify VTE from radiology reports in a multicentre sample and have made the algorithms freely available to the academic community with a user-friendly tool (https://lks-chart.github.io/CHARTextract-docs/08-downloads/rulesets.html#venous-thromboembolism-vte-rulesets).

Highlights:
• ICD-10 codes do not reliably identify venous thromboembolism (VTE) in hospitalized adults.
• We developed algorithms to accurately identify VTE from radiology reports.
• This tool is freely available for researchers: https://lks-chart.github.io/CHARTextract-docs/
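The accuracy figures above are standard confusion-matrix measures taken against the manual-review gold standard. The following is a minimal Python sketch, not the authors' evaluation code, of how sensitivity, specificity, PPV, and a single-threshold AUC could be computed when each hospitalization receives a yes/no prediction (for example, from NLP rules or from ICD-10-CA codes). The `evaluate` function and `Metrics` class are hypothetical names. The AUC line assumes a purely binary classifier, for which the trapezoidal ROC area reduces to (sensitivity + specificity) / 2; the source does not state how the reported AUCs were derived.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Metrics:
    sensitivity: float  # TP / (TP + FN)
    specificity: float  # TN / (TN + FP)
    ppv: float          # positive predictive value: TP / (TP + FP)
    auc: float          # area under a single-threshold ROC curve


def evaluate(predicted: Sequence[bool], gold: Sequence[bool]) -> Metrics:
    """Compare binary predictions (hypothetical NLP or ICD output) to gold-standard labels."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")

    # Count the four confusion-matrix cells; booleans sum as 0/1.
    tp = sum(p and g for p, g in zip(predicted, gold))
    fp = sum(p and not g for p, g in zip(predicted, gold))
    fn = sum((not p) and g for p, g in zip(predicted, gold))
    tn = sum((not p) and (not g) for p, g in zip(predicted, gold))

    # Return NaN rather than dividing by zero when a denominator is empty.
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    ppv = tp / (tp + fp) if tp + fp else float("nan")

    # With one operating point, the ROC curve runs through
    # (0, 0), (1 - spec, sens), (1, 1); its trapezoidal area is (sens + spec) / 2.
    auc = (sens + spec) / 2
    return Metrics(sens, spec, ppv, auc)
```

For example, with toy labels (not study data) in which 2 of 3 true VTE cases are flagged and 1 of 7 non-cases is falsely flagged, `evaluate` gives sensitivity 2/3, specificity 6/7, PPV 2/3, and AUC 16/21 (about 0.76). Because PPV depends on the number of false positives relative to true positives, it can be low even when specificity is high if VTE is uncommon in the sample.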