Natural language processing with machine learning methods to analyze unstructured patient-reported outcomes derived from electronic health records: A systematic review

Bibliographic Details
Published in Artificial intelligence in medicine Vol. 146; p. 102701
Main Authors Sim, Jin-ah; Huang, Xiaolei; Horan, Madeline R.; Stewart, Christopher M.; Robison, Leslie L.; Hudson, Melissa M.; Baker, Justin N.; Huang, I-Chan
Format Journal Article
Language English
Published Netherlands Elsevier B.V. 01.12.2023
ISSN 0933-3657, 1873-2860
DOI 10.1016/j.artmed.2023.102701

Summary: Natural language processing (NLP) combined with machine learning (ML) techniques is increasingly used to process unstructured/free-text patient-reported outcome (PRO) data available in electronic health records (EHRs). This systematic review summarizes the literature reporting NLP/ML systems/toolkits for analyzing PROs in clinical narratives of EHRs and discusses the future directions for the application of this modality in clinical care. We searched PubMed, Scopus, and Web of Science for studies written in English between 1/1/2000 and 12/31/2020. Seventy-nine studies meeting the eligibility criteria were included. We abstracted and summarized information related to the study purpose, patient population, type/source/amount of unstructured PRO data, linguistic features, and NLP systems/toolkits for processing unstructured PROs in EHRs. Most of the studies used NLP/ML techniques to extract PROs from clinical narratives (n = 74) and mapped the extracted PROs into specific PRO domains for phenotyping or clustering purposes (n = 26). Some studies used NLP/ML to process PROs for predicting disease progression or onset of adverse events (n = 22) or developing/validating NLP/ML pipelines for analyzing unstructured PROs (n = 19). Studies used different linguistic features, including lexical, syntactic, semantic, and contextual features, to process unstructured PROs. Among the 25 NLP systems/toolkits we identified, 15 used rule-based NLP, 6 used hybrid NLP, and 4 used non-neural ML algorithms embedded in NLP. This study supports the potential utility of different NLP/ML techniques in processing unstructured PROs available in EHRs for clinical care. Though using annotation rules for NLP/ML to analyze unstructured PROs is dominant, deploying novel neural ML-based methods is warranted.
• This systematic review summarizes the NLP/ML applications for analyzing PROs in clinical narratives of EHRs.
• Most studies used NLP/ML techniques to extract unstructured PROs or predict disease progression.
• Studies used different linguistic (i.e., lexical, syntactic, semantic, contextual) features to process the unstructured PROs.
• Using annotation rules to analyze unstructured PROs is dominant, yet deploying novel neural ML-based methods is warranted.
• Applying NLP/ML techniques to analyze unstructured PROs in EHRs can facilitate the clinical decision-making process.
Author contributions: Conceptualization: Jin-ah Sim, I-Chan Huang; Data curation: Jin-ah Sim; Funding acquisition: Melissa M. Hudson, Justin N. Baker, I-Chan Huang; Methodology: Jin-ah Sim, Xiaolei Huang, Christopher M. Stewart, I-Chan Huang; Project administration: I-Chan Huang; Resources: I-Chan Huang; Supervision: I-Chan Huang; Visualization: Jin-ah Sim; Writing - original draft preparation: Jin-ah Sim, I-Chan Huang; Writing - review & editing: Xiaolei Huang, Madeline R. Horan, Christopher M. Stewart, Leslie L. Robison, Melissa M. Hudson, Justin N. Baker, I-Chan Huang. All authors have read and agreed to the submitted version of the manuscript.