A209 VALIDATION OF A NATURAL LANGUAGE PROCESSING ALGORITHM TO EXTRACT DATA FOR SYSTEM-LEVEL ADENOMA DETECTION RATE CALCULATION

Bibliographic Details
Published in Journal of the Canadian Association of Gastroenterology Vol. 2; no. Supplement_2; p. 409
Main Authors Morgan, D, Chorneyko, K, Swain, D, Bowes, B, Lee, V, Tinmouth, J
Format Journal Article
Language English
Published US Oxford University Press 15.03.2019
ISSN 2515-2084, 2515-2092
DOI 10.1093/jcag/gwz006.208

Summary: Abstract
Background: Patients of endoscopists with lower adenoma detection rates (ADR) are more likely to die from missed colorectal cancers. Measuring ADR is challenging at the system level because pathology results are generally reported in unstructured electronic medical records. Natural language processing (NLP) can be used to extract relevant information from text-based records.
Aims: At Cancer Care Ontario (CCO), we developed and validated an NLP algorithm to identify colorectal adenomas in unstructured electronic pathology reports available in CCO’s Electronic Mapping Reporting and Coding (eMaRC) data.
Methods: We identified pathology reports from colonoscopies in eMaRC as those with specimen type ‘biopsy’ and anatomic site ‘colon’. The sampling period was restricted to 2015–16 and to patients older than 50 years. From this sampling frame, two random samples of 450 and 1,000 reports were selected as the test and validation sets. Expert clinicians reviewed the reports and classified each as adenoma or other. The test set was used to develop an NLP algorithm to identify adenomas using Base SAS 9.4. We determined the sensitivity (recall), specificity, precision (positive predictive value) and F1 score of the NLP algorithm compared with clinician review.
Results: A significant proportion of Ontario colonoscopists, pathologists and laboratories were represented in the examined validation sets. The sensitivity of the NLP algorithm was approximately 100% (95% CI: 98.51–100) and 99.81% (95% CI: 98.97–100) in the test and validation sets, respectively. Similarly, the specificity was 99.08% (95% CI: 94.99–99.98) and 100% (95% CI: 99.21–100).
Conclusions: The CCO NLP algorithm was highly accurate in identifying colorectal adenomas in eMaRC data across many institutions in Ontario. This lays the groundwork to measure ADR at the system level in Ontario.
Funding Agencies: Cancer Care Ontario
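
The metrics named in the Methods (sensitivity, specificity, precision and F1 score) can all be derived from a two-by-two comparison of algorithm output against clinician review. The Python sketch below is illustrative only. The study itself used Base SAS 9.4, and the abstract does not state how its confidence intervals were computed, so the Wilson score interval used here is an assumption. The function names are hypothetical, and the example counts are invented and are not the study's data.

```python
import math


def ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for about 95%).

    Assumption: the abstract does not say which interval method was used.
    """
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def summarize(tp, fp, tn, fn):
    """Compute the four metrics from confusion-matrix counts.

    tp: algorithm and clinician both say adenoma
    fp: algorithm says adenoma, clinician says other
    tn: algorithm and clinician both say other
    fn: algorithm says other, clinician says adenoma
    """
    return {
        "sensitivity": (ratio(tp, tp + fn), wilson_interval(tp, tp + fn)),
        "specificity": (ratio(tn, tn + fp), wilson_interval(tn, tn + fp)),
        "precision": (ratio(tp, tp + fp), wilson_interval(tp, tp + fp)),
        # F1 is the harmonic mean of precision and sensitivity,
        # which simplifies to 2*TP / (2*TP + FP + FN).
        "f1": (ratio(2 * tp, 2 * tp + fp + fn), None),
    }


if __name__ == "__main__":
    # Invented counts for illustration; NOT taken from the abstract.
    for name, (estimate, interval) in summarize(tp=90, fp=5, tn=100, fn=10).items():
        if interval is None:
            print(f"{name}: {estimate:.4f}")
        else:
            low, high = interval
            print(f"{name}: {estimate:.4f} (95% CI: {low:.4f} to {high:.4f})")
```

A Wilson interval is chosen here because it stays within 0 and 1 even when the observed proportion is 100%, which is the situation reported for some of the estimates above. An exact (Clopper-Pearson) interval would be an equally reasonable choice and would give somewhat different bounds.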