A209 VALIDATION OF A NATURAL LANGUAGE PROCESSING ALGORITHM TO EXTRACT DATA FOR SYSTEM-LEVEL ADENOMA DETECTION RATE CALCULATION
| Published in | Journal of the Canadian Association of Gastroenterology Vol. 2; no. Supplement_2; p. 409 |
|---|---|
| Main Authors | , , , , , |
| Format | Journal Article |
| Language | English |
| Published | US: Oxford University Press, 15.03.2019 |
| Subjects | |
| ISSN | 2515-2084, 2515-2092 |
| DOI | 10.1093/jcag/gwz006.208 |

Abstract
Background
Patients of endoscopists with lower adenoma detection rates (ADR) are more likely to die from missed colorectal cancers. Measuring ADR is challenging at the system level as pathology results are generally reported in unstructured electronic medical records. Natural language processing (NLP) can be used to extract relevant information from text-based records.
Aims
At Cancer Care Ontario (CCO), we developed and validated an NLP algorithm to identify colorectal adenomas in unstructured electronic pathology reports available in CCO’s Electronic Mapping Reporting and Coding (eMaRC) data.
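The abstract does not describe the rules inside the CCO algorithm. As a rough illustration of how a rule-based approach can flag adenomas in free-text pathology reports, the Python sketch below matches adenoma terms and ignores any mention preceded by a negation cue in the same sentence. The term list, the negation cues, the sentence splitting and the function name `classify_report` are hypothetical assumptions; this is not CCO's Base SAS 9.4 implementation.

```python
import re

# Hypothetical rules for illustration only; these are NOT the CCO Base SAS 9.4 rules.
ADENOMA_TERM = re.compile(r"\badenoma(?:s|tous)?\b", re.IGNORECASE)
# A negation cue earlier in the same sentence suppresses the adenoma mention.
NEGATION_CUE = re.compile(r"\b(?:no|negative for|without|free of)\b", re.IGNORECASE)


def classify_report(text: str) -> str:
    """Label a free-text pathology report 'adenoma' or 'other' (hypothetical rules)."""
    # Split after sentence-like delimiters so a negation does not carry across clauses.
    for sentence in re.split(r"(?<=[.;\n])", text):
        for match in ADENOMA_TERM.finditer(sentence):
            preceding = sentence[: match.start()]
            if not NEGATION_CUE.search(preceding):
                return "adenoma"  # at least one affirmed adenoma mention
    return "other"


print(classify_report("Sigmoid colon, biopsy: tubular adenoma, no high-grade dysplasia."))
# → adenoma
print(classify_report("Rectum, biopsy: hyperplastic polyp. Negative for adenoma."))
# → other
```

Splitting on semicolons as well as periods keeps a clause such as "No evidence of malignancy; tubular adenoma." from being treated as negated.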
Methods
We identified pathology reports from colonoscopies in eMaRC as those with specimen type ‘biopsy’ and anatomic site ‘colon’. Sampling was restricted to reports from 2015–16 for patients older than 50 years. From this sampling frame, two random samples of 450 and 1,000 reports were selected as the test and validation sets, respectively. Expert clinicians reviewed each report and classified it as adenoma or other. The test set was used to develop the NLP algorithm for identifying adenomas in Base SAS 9.4. The sensitivity (recall), specificity, precision (positive predictive value) and F1 score of the NLP algorithm were then calculated against clinician review.
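The four accuracy measures named above can all be derived from a 2×2 table that cross-classifies the NLP output against clinician review. A minimal Python sketch follows; the names `Confusion`, `confusion_from_labels` and `accuracy_metrics` are illustrative, and the counts used below are made up rather than taken from the study.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class Confusion:
    tp: int  # NLP 'adenoma', clinician 'adenoma'
    fp: int  # NLP 'adenoma', clinician 'other'
    fn: int  # NLP 'other', clinician 'adenoma'
    tn: int  # NLP 'other', clinician 'other'


def confusion_from_labels(nlp: Sequence[str], clinician: Sequence[str],
                          positive: str = "adenoma") -> Confusion:
    """Cross-tabulate NLP labels against the clinician reference labels."""
    if len(nlp) != len(clinician):
        raise ValueError("label sequences must have equal length")
    tp = fp = fn = tn = 0
    for predicted, reference in zip(nlp, clinician):
        if predicted == positive and reference == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif reference == positive:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, fn, tn)


def _ratio(numerator: float, denominator: float) -> float:
    # Return NaN rather than raising when a row or column total is zero.
    return numerator / denominator if denominator else float("nan")


def accuracy_metrics(c: Confusion) -> Dict[str, float]:
    sensitivity = _ratio(c.tp, c.tp + c.fn)  # recall
    specificity = _ratio(c.tn, c.tn + c.fp)
    precision = _ratio(c.tp, c.tp + c.fp)    # positive predictive value
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f1": f1}
```

For example, hypothetical counts of 8 true positives, 1 false positive, 0 false negatives and 11 true negatives, `accuracy_metrics(Confusion(tp=8, fp=1, fn=0, tn=11))`, give a sensitivity of 1.0, a specificity of 11/12, a precision of 8/9 and an F1 score of 16/17.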
Results
A substantial proportion of Ontario colonoscopists, pathologists and laboratories were represented in the examined validation sets.
The sensitivity of the NLP algorithm was approximately 100% (95% CI: 98.51–100) in the test set and 99.81% (95% CI: 98.97–100) in the validation set. The specificity was 99.08% (95% CI: 94.99–99.98) and 100% (95% CI: 99.21–100) in the test and validation sets, respectively.
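The abstract does not state how the 95% confidence intervals were computed. One common choice for proportions close to 100% is the Clopper–Pearson exact interval, which inverts the binomial distribution and, unlike the normal approximation, never extends past 100%. The standard-library Python sketch below finds each bound by bisection; it is an assumption about method, not a reconstruction of the authors' analysis, and the function names are illustrative.

```python
import math
from typing import Callable, Tuple


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), with terms summed in log space."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k + 1):
        log_term = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)


def _bisect(f: Callable[[float], float], target: float, increasing: bool,
            iters: int = 100) -> float:
    """Solve f(p) = target on (0, 1) for a monotone f."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        # If f(mid) is on the same side of target as f(0), the root lies above mid.
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Two-sided exact (Clopper–Pearson) interval for x successes out of n."""
    tail = alpha / 2
    # Lower bound solves P(X >= x | p) = alpha/2; this tail grows with p.
    lower = 0.0 if x == 0 else _bisect(
        lambda p: 1.0 - binom_cdf(x - 1, n, p), tail, increasing=True)
    # Upper bound solves P(X <= x | p) = alpha/2; this tail shrinks with p.
    upper = 1.0 if x == n else _bisect(
        lambda p: binom_cdf(x, n, p), tail, increasing=False)
    return lower, upper
```

When every case is classified correctly (x = n), the lower bound has the closed form (alpha/2)^(1/n), which is a convenient check on the bisection.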
Conclusions
The CCO NLP algorithm was highly accurate in identifying colorectal adenomas in eMaRC data from many institutions across Ontario. This lays the groundwork for measuring ADR at the system level in Ontario.
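Once reports are classified, ADR is conventionally calculated per endoscopist as the proportion of colonoscopies (usually screening colonoscopies) in which at least one adenoma is detected. The Python sketch below assumes a hypothetical linkage of each classified pathology report to a colonoscopy procedure ID, and of each procedure to an endoscopist ID; the abstract does not describe how such linkage or procedure eligibility would be handled in eMaRC. The denominator must come from procedure records rather than pathology reports, because a colonoscopy without a biopsy produces no report.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def procedures_with_adenoma(report_labels: Iterable[Tuple[str, str]]) -> Set[str]:
    """Collapse report-level NLP labels to procedure IDs with >= 1 adenoma report."""
    return {procedure_id for procedure_id, label in report_labels if label == "adenoma"}


def adenoma_detection_rates(procedures: Iterable[Tuple[str, str]],
                            adenoma_procedures: Set[str]) -> Dict[str, float]:
    """ADR per endoscopist: procedures with >= 1 adenoma / all eligible procedures.

    `procedures` yields hypothetical (procedure_id, endoscopist_id) pairs for
    eligible colonoscopies; a repeated procedure ID is counted only once.
    """
    total: Dict[str, int] = defaultdict(int)
    detected: Dict[str, int] = defaultdict(int)
    seen: Set[str] = set()
    for procedure_id, endoscopist_id in procedures:
        if procedure_id in seen:
            continue
        seen.add(procedure_id)
        total[endoscopist_id] += 1
        if procedure_id in adenoma_procedures:
            detected[endoscopist_id] += 1
    return {endo: detected[endo] / total[endo] for endo in total}
```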
Funding Agencies
Cancer Care Ontario