AssistMED project: Transforming cardiology cohort characterisation from electronic health records through natural language processing – Algorithm design, preliminary results, and field prospects

Bibliographic Details
Published in	International journal of medical informatics (Shannon, Ireland) Vol. 185; p. 105380
Main Authors	Maciejewski, Cezary, Ozierański, Krzysztof, Barwiołek, Adam, Basza, Mikołaj, Bożym, Aleksandra, Ciurla, Michalina, Janusz Krajsman, Maciej, Maciejewska, Magdalena, Lodziński, Piotr, Opolski, Grzegorz, Grabowski, Marcin, Cacko, Andrzej, Balsam, Paweł
Format	Journal Article
Language	English
Published	Ireland: Elsevier B.V., 01.05.2024
Subjects	Cardiology; Epidemiology; Natural language processing; NLP; Text-mining
ISSN	1386-5056 1872-8243
DOI	10.1016/j.ijmedinf.2024.105380

Summary:	•A well-performing natural language processing (NLP) algorithm for automatic, broad and detailed characterisation of cardiology and internal medicine cohorts from Polish electronic health records (EHR) is presented.
•Through extensive algorithm validation, the limitations of NLP-based data acquisition from EHR for research purposes are discussed.
•Potential solutions to obstacles and future perspectives on the use of NLP to facilitate clinical research are outlined.

Electronic health records (EHR) are of great value for clinical research. However, EHR consist primarily of unstructured text, which must be analysed by a human and coded into a database before data analysis, a time-consuming and costly process that limits research efficiency. Natural language processing (NLP) can facilitate data retrieval from unstructured text. During the AssistMED project, we developed a practical NLP tool, tailored to clinical researchers' needs, that automatically provides comprehensive clinical characteristics of patients from EHR.

AssistMED retrieves patient characteristics regarding clinical conditions, medications with dosage, and echocardiographic parameters, using a clinically oriented data structure, and provides researcher-friendly database output. We validate the algorithm's performance against manual data retrieval and provide a critical quantitative and qualitative analysis.

AssistMED analysed the presence of 56 clinical conditions, medications from 16 drug groups with dosage, and 15 numeric echocardiographic parameters in a sample of 400 patients hospitalized in the cardiology unit. No statistically significant differences between algorithm and human retrieval were noted. Qualitative analysis revealed that disagreements with manual annotation were primarily attributable to random algorithm errors, erroneous human annotation, and the lack of advanced context awareness in our tool.

Current NLP approaches can feasibly acquire accurate and detailed patient characteristics, tailored to clinical researchers' needs, from EHR. We present an in-depth description of the algorithm development and validation process, discuss obstacles, and pinpoint potential solutions, including opportunities arising from recent advancements in the field of NLP, such as large language models.
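
The summary reports that algorithm and manual retrieval did not differ significantly, but this record does not say which statistical test was used. As a minimal sketch only, assuming paired presence/absence labels per patient for a single clinical condition, one common way to compare two annotators on the same patients is an exact McNemar test. The function name `mcnemar_exact` and the toy labels below are hypothetical and are not taken from the article.

```python
from math import comb


def mcnemar_exact(algo, manual):
    """Two-sided exact McNemar test on paired binary labels.

    algo, manual: equal-length sequences of 0/1 (or False/True), one pair
    per patient. Returns (b, c, p_value), where b counts patients flagged
    only by the algorithm and c counts patients flagged only manually.
    """
    if len(algo) != len(manual):
        raise ValueError("label sequences must have the same length")

    # Only discordant pairs carry information about a systematic difference.
    b = sum(1 for a, m in zip(algo, manual) if a and not m)
    c = sum(1 for a, m in zip(algo, manual) if m and not a)
    n = b + c
    if n == 0:
        return b, c, 1.0  # perfect agreement, nothing to test

    # Under the null hypothesis, b follows Binomial(n, 0.5).
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Hypothetical labels for one condition in eight patients (illustration only).
algo_labels = [1, 1, 0, 0, 1, 0, 1, 0]
manual_labels = [1, 0, 0, 0, 1, 1, 1, 0]
b, c, p = mcnemar_exact(algo_labels, manual_labels)
print(f"algorithm-only={b}, manual-only={c}, p={p:.3f}")
```

A large p-value here means only that the discordant cases show no systematic direction. It does not demonstrate that the two sources agree, which is why a qualitative review of the disagreements, as described in the summary, remains useful.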