Artificial intelligence-aided clinical annotation of a large multi-cancer genomic dataset

Bibliographic Details
Published in	Nature Communications Vol. 12; no. 1; pp. 7304 - 9
Main Authors	Kehl, Kenneth L., Xu, Wenxin, Gusev, Alexander, Bakouny, Ziad, Choueiri, Toni K., Riaz, Irbaz Bin, Elmarakeby, Haitham, Van Allen, Eliezer M., Schrag, Deborah
Format	Journal Article
Language	English
Published	London: Nature Publishing Group UK, 15.12.2021; Nature Publishing Group; Nature Portfolio
Subjects	45; 692/4017; 692/4028/67; Annotations; Artificial Intelligence; Biomarkers; Cancer; Cancer research; Correlation; Databases, Genetic; Datasets; Electronic health records; Electronic medical records; Genomics; Humanities and Social Sciences; Humans; Immune checkpoint; Language; Medical research; Metastases; Molecular Sequence Annotation; multidisciplinary; Mutation; Natural Language Processing; Neoplasms - genetics; Oncology; Patients; Precision medicine; Science; Science (multidisciplinary); Solid tumors; Survival; Tumors
ISSN	2041-1723
DOI	10.1038/s41467-021-27358-6

Summary:	To accelerate cancer research that correlates biomarkers with clinical endpoints, methods are needed to ascertain outcomes from electronic health records at scale. Here, we train deep natural language processing (NLP) models to extract outcomes for participants with any of 7 solid tumors in a precision oncology study. Outcomes are extracted from 305,151 imaging reports for 13,130 patients and 233,517 oncologist notes for 13,511 patients, including patients with 6 additional cancer types. NLP models recapitulate outcome annotation from these documents, including the presence of cancer, progression/worsening, response/improvement, and metastases, with excellent discrimination (AUROC > 0.90). Models generalize to cancers excluded from training and yield outcomes correlated with survival. Among patients receiving checkpoint inhibitors, we confirm that high tumor mutation burden is associated with superior progression-free survival ascertained using NLP. Here, we show that deep NLP can accelerate annotation of molecular cancer datasets with clinically meaningful endpoints to facilitate discovery.