Combining Machine Learning with a Rule-Based Algorithm to Detect and Identify Related Entities of Documented Adverse Drug Reactions on Hospital Discharge Summaries

Bibliographic Details
Published in: Drug Safety, Vol. 45, no. 8, pp. 853–862
Main Authors: Tan, Hui Xing; Teo, Chun Hwee Desmond; Ang, Pei San; Loke, Wei Ping Celine; Tham, Mun Yee; Tan, Siew Har; Soh, Bee Leng Sally; Foo, Pei Qin Belinda; Ling, Zheng Jye; Yip, Wei Luen James; Tang, Yixuan; Yang, Jisong; Tung, Kum Hoe Anthony; Dorajoo, Sreemanee Raaj
Format: Journal Article
Language: English
Publisher: Springer International Publishing (Cham); Springer Nature B.V.
Publication date: 1 August 2022
ISSN: 0114-5916; 1179-1942
DOI: 10.1007/s40264-022-01196-x

Summary:
Introduction: Discharge summaries contain valuable information about adverse drug reactions, but their unstructured nature makes them difficult to analyse and to use as a signal source for pharmacovigilance. Machine learning has shown promise in identifying discharge summaries that contain related drug-adverse event pairs but has performed relatively poorly at entity extraction.
Methods: A hybrid model combining rule-based and machine learning algorithms is developed on discharge summaries, with the aim of maximising capture of related drug-adverse event pairs. The rule first identifies text segments in which an adverse event entity lies within 100 characters of a drug term; machine learning then estimates whether the drug and adverse event entities in each segment are related. The approach is validated on four independent datasets that are temporally and geographically separated from the model development data. The impact of restricting drug-adverse event pair detection on recall is evaluated using two of the four validation datasets, whose annotations are not subject to the rule-based restriction.
Results: The hybrid model achieves a recall of 0.80 (fivefold cross-validation), 0.80 (temporal validation) and 0.76 (geographical validation) on datasets containing only pre-identified target text segments that satisfy the rule-based criteria. On datasets that additionally contain drug-adverse event pairs not restricted by the rule-based criteria, recall declines to 0.68 and 0.62 on the temporally and geographically separated datasets, respectively.
Conclusions: The proposed hybrid model demonstrates reasonable generalisability on external validation. Restricting the detection space with the rule reduces recall by approximately 12–14 percentage points but improves identification of the related drug and adverse event terms.
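The summary describes a two-stage pipeline: a rule keeps only adverse event mentions within 100 characters of a drug term, and a machine learning model then scores whether each candidate pair is related. The Python snippet below is a minimal sketch of that flow under assumptions that do not come from the paper. Entities are found by a case-insensitive whole-word lexicon lookup (hypothetical `find_mentions`). The 100-character distance is measured as the gap between the nearer edges of the two spans. The trained classifier is replaced by a hypothetical `toy_relatedness` cue check. All function names are illustrative only.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Window size stated in the summary; measuring it edge-to-edge is an assumption.
MAX_GAP = 100


@dataclass(frozen=True)
class Mention:
    text: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def find_mentions(note: str, lexicon: List[str]) -> List[Mention]:
    """Hypothetical stand-in for entity recognition: case-insensitive whole-word lookup."""
    found = []
    for term in lexicon:
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, note, flags=re.IGNORECASE):
            found.append(Mention(m.group(0), m.start(), m.end()))
    return sorted(found, key=lambda mention: mention.start)


def char_gap(a: Mention, b: Mention) -> int:
    """Characters between the nearer edges of two spans (0 if they touch or overlap)."""
    if a.end <= b.start:
        return b.start - a.end
    if b.end <= a.start:
        return a.start - b.end
    return 0


def text_between(note: str, a: Mention, b: Mention) -> str:
    """Text separating two spans (empty if they touch or overlap)."""
    if a.end <= b.start:
        return note[a.end:b.start]
    if b.end <= a.start:
        return note[b.end:a.start]
    return ""


def candidate_pairs(note, drug_lexicon, ae_lexicon, max_gap=MAX_GAP):
    """Stage 1 (rule): keep drug/adverse-event pairs whose spans lie within max_gap characters."""
    drugs = find_mentions(note, drug_lexicon)
    events = find_mentions(note, ae_lexicon)
    return [(d, e) for d in drugs for e in events if char_gap(d, e) <= max_gap]


def toy_relatedness(note: str, drug: Mention, event: Mention) -> float:
    """Hypothetical placeholder for the trained model: a causal cue and no full stop in between."""
    between = text_between(note, drug, event).lower()
    cues = ("secondary to", "due to", "induced")
    return 1.0 if "." not in between and any(c in between for c in cues) else 0.0


def hybrid_detect(note, drug_lexicon, ae_lexicon,
                  relatedness: Callable[[str, Mention, Mention], float],
                  threshold: float = 0.5) -> List[Tuple[str, str]]:
    """Stage 2 (model): score each rule-filtered pair and keep those at or above threshold."""
    return [(d.text, e.text)
            for d, e in candidate_pairs(note, drug_lexicon, ae_lexicon)
            if relatedness(note, d, e) >= threshold]


if __name__ == "__main__":
    note = ("Developed rash secondary to amoxicillin. "
            + "Patient was otherwise well. " * 5
            + "Paracetamol given for fever.")
    drugs, events = ["amoxicillin", "paracetamol"], ["rash", "fever"]
    print([(d.text, e.text) for d, e in candidate_pairs(note, drugs, events)])
    print(hybrid_detect(note, drugs, events, toy_relatedness))
```

Running the script prints two rule-filtered candidates, (amoxicillin, rash) and (Paracetamol, fever), and then only (amoxicillin, rash) after the toy scoring step. The amoxicillin/fever and Paracetamol/rash pairs are more than 100 characters apart, so the window discards them before any scoring happens. A pair excluded at that stage can never be recovered by the model, which is consistent with the lower recall reported on datasets whose annotations are not rule-restricted.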