Combining Machine Learning with a Rule-Based Algorithm to Detect and Identify Related Entities of Documented Adverse Drug Reactions on Hospital Discharge Summaries
| Published in | Drug Safety, Vol. 45, no. 8, pp. 853–862 |
|---|---|
| Format | Journal Article |
| Language | English |
| Published | Cham: Springer International Publishing, 01.08.2022; Springer Nature B.V. |
| ISSN | 0114-5916, 1179-1942 |
| DOI | 10.1007/s40264-022-01196-x |
Summary:
Introduction
Discharge summaries contain valuable information about adverse drug reactions, but their unstructured nature makes them challenging to analyse and to use as a signal source for pharmacovigilance. Machine learning has shown promise in identifying discharge summaries that contain related drug-adverse event pairs but has performed relatively poorly at entity extraction.
Methods
A hybrid model combining rule-based and machine learning algorithms is developed using discharge summaries, with the aim of maximising the capture of related drug-adverse event pairs. The rule first identifies text segments containing adverse event entities within 100 characters of a drug term; machine learning then estimates the relatedness of the drug and adverse event entities in each segment. The approach is validated on four independent datasets that are temporally and geographically separated from the model development data. The impact of restricted drug-adverse event pair detection on recall is evaluated using the two of the four validation datasets whose annotations are not subject to the rule-based restrictions.
Results
The hybrid model achieves a recall of 0.80 (fivefold cross-validation), 0.80 (temporal validation) and 0.76 (geographical validation) on datasets containing only pre-identified target text segments that fulfil the rule-based criteria. When tested on datasets that additionally contain drug-adverse event pairs not restricted by the rule-based criteria, recall declines to 0.68 and 0.62 on the temporally and geographically separated datasets, respectively.
Conclusions
The proposed hybrid model demonstrates reasonable generalisability on external validation. Rule-based restriction of the detection space reduces recall by approximately 12–14 percentage points but improves identification of the related drug and adverse event terms.
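The two-stage design in the Methods summary (a rule that keeps only adverse event mentions within 100 characters of a drug term, followed by a machine learning relatedness estimate) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the drug and adverse event lexicons, the definition of "distance" as the character gap between the two mentions, the 0.5 threshold, and the `toy_relatedness` scorer are all hypothetical. The abstract does not describe the entity recognition method or the classifier, so the ML step is left as a pluggable callable.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple

# HYPOTHETICAL lexicons for illustration; the abstract does not say how
# drug and adverse event (AE) entities are recognised.
DRUG_TERMS = ["amoxicillin", "warfarin"]
AE_TERMS = ["rash", "bleeding"]

MAX_GAP = 100  # maximum character distance between a drug and an AE mention


@dataclass
class Candidate:
    drug: str
    adverse_event: str
    segment: str  # text spanning both mentions


def find_terms(text: str, terms: List[str]) -> List[Tuple[str, int, int]]:
    """Return (term, start, end) for every case-insensitive whole-word match."""
    hits = []
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((m.group(0).lower(), m.start(), m.end()))
    return hits


def char_gap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Characters separating two [start, end) spans; 0 if they touch or overlap."""
    return max(0, max(a_start, b_start) - min(a_end, b_end))


def extract_candidates(text: str) -> List[Candidate]:
    """Rule-based step: keep every drug/AE pair within MAX_GAP characters."""
    drugs = find_terms(text, DRUG_TERMS)
    aes = find_terms(text, AE_TERMS)
    candidates = []
    for drug, d_start, d_end in drugs:
        for ae, a_start, a_end in aes:
            if char_gap(d_start, d_end, a_start, a_end) <= MAX_GAP:
                segment = text[min(d_start, a_start):max(d_end, a_end)]
                candidates.append(Candidate(drug, ae, segment))
    return candidates


def detect_pairs(text: str, relatedness: Callable[[Candidate], float],
                 threshold: float = 0.5) -> List[Candidate]:
    """ML step: keep candidates whose relatedness score meets the threshold."""
    return [c for c in extract_candidates(text) if relatedness(c) >= threshold]


CAUSAL_CUES = ("after", "due to", "secondary to", "induced")


def toy_relatedness(c: Candidate) -> float:
    """HYPOTHETICAL stand-in for a trained classifier: rejects segments that
    cross a full stop and accepts segments containing a causal cue."""
    seg = c.segment.lower()
    if "." in seg:
        return 0.0
    return 1.0 if any(cue in seg for cue in CAUSAL_CUES) else 0.0


note = ("Developed a widespread rash after starting amoxicillin. "
        "Warfarin was continued without incident.")
print(len(extract_candidates(note)))  # 2
for c in detect_pairs(note, toy_relatedness):
    print(c.drug, "->", c.adverse_event)  # amoxicillin -> rash
```

In this example the rule passes two candidates (rash with amoxicillin, and rash with warfarin, since both drugs fall inside the window), and only the relatedness step separates them. This mirrors the division of labour described in the abstract: the rule narrows the search space, and the model decides relatedness. It also shows why pairs lying farther apart than the window can never be recovered, which is the source of the recall loss reported in the Results.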
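The recall reduction stated in the Conclusions can be checked directly from the recall values in the Results, where recall is the share of annotated related pairs that the model detects (TP / (TP + FN)). The short Python calculation below uses only the reported figures. It shows that the 12–14 figure is an absolute difference in recall (percentage points) and that the corresponding relative reduction is somewhat larger. The function name `recall_drop` is illustrative.

```python
# Recall values reported in the Results section.
restricted = {"temporal": 0.80, "geographical": 0.76}    # rule-restricted annotations
unrestricted = {"temporal": 0.68, "geographical": 0.62}  # unrestricted annotations


def recall_drop(r_restricted: float, r_unrestricted: float) -> tuple:
    """Return (absolute drop, relative drop) between two recall values."""
    absolute = r_restricted - r_unrestricted  # difference in recall
    relative = absolute / r_restricted        # share of the restricted recall lost
    return absolute, relative


for split in restricted:
    absolute, relative = recall_drop(restricted[split], unrestricted[split])
    print(f"{split}: {absolute * 100:.0f} percentage points, {relative:.1%} relative")
# temporal: 12 percentage points, 15.0% relative
# geographical: 14 percentage points, 18.4% relative
```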