The validity of electronic health data for measuring smoking status: a systematic review and meta-analysis

Bibliographic Details
Published in	BMC Medical Informatics and Decision Making, Vol. 24, no. 1, article 33 (15 pp.)
Main Authors	Haque, Md Ashiqul; Gedara, Muditha Lakmali Bodawatte; Nickel, Nathan; Turgeon, Maxime; Lix, Lisa M.
Format	Journal Article
Language	English
Published	London: BioMed Central, 2 February 2024 (BioMed Central Ltd; Springer Nature B.V.; BMC)
Subjects	Algorithms; Analysis; Chronic illnesses; Data sources; Electronic health records; Electronic Health Records - standards; Electronic medical records; Electronic records; Estimates; Health aspects; Health Informatics; Heterogeneity; Hospitals; Humans; Information management; Information Systems and Communication Service; Keywords; Learning algorithms; Machine learning; Management of Computing and Information Systems; Medical records; Medical research; Medicare; Medicine; Medicine & Public Health; Medicine, Experimental; Meta-analysis; Population; Quality assessment; Quality control; Regression models; Reproducibility of Results; Review; Reviews; Risk factors; Routinely collected health data; Sensitivity; Smoking; Smoking - epidemiology; Systematic review; Tobacco; Validation study; Validity; Canada
ISSN	1472-6947
DOI	10.1186/s12911-024-02416-3

More Information
Summary:	Background: Smoking is a risk factor for many chronic diseases. Multiple smoking-status ascertainment algorithms have been developed for population-based electronic health databases, such as administrative databases and electronic medical records (EMRs). Evidence syntheses of algorithm validation studies have often focused on chronic diseases rather than on risk factors. We conducted a systematic review and meta-analysis of smoking-status ascertainment algorithms to describe their characteristics and validity.

Methods: The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed. We searched EMBASE, MEDLINE, Scopus, and Web of Science for articles published from 1990 to 2022, using key terms such as validity, administrative data, electronic health records, smoking, and tobacco use. The extracted information, including article characteristics, algorithm characteristics, and validity measures, was analyzed descriptively. Sources of heterogeneity in validity measures were examined using a meta-regression model. Risk of bias (ROB) in the reviewed articles was assessed using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool.

Results: The initial search yielded 2086 articles; 57 were selected for review, and 116 algorithms were identified in them. Almost three-quarters (71.6%) of the algorithms were based on EMR data. The algorithms were constructed primarily from diagnosis codes for smoking-related conditions, although prescription medication codes for smoking treatments were also used. About half of the algorithms were developed using machine-learning models. The pooled estimates of positive predictive value (PPV), sensitivity, and specificity were 0.843, 0.672, and 0.918, respectively. Algorithm sensitivity and specificity were highly variable, ranging from 3% to 100% and from 36% to 100%, respectively. Model-based algorithms had significantly greater sensitivity than rule-based algorithms (p = 0.006), and algorithms for EMR data had higher sensitivity than algorithms for administrative data (p = 0.001). ROB was low in most (76.3%) of the articles that underwent the assessment.

Conclusions: Multiple algorithms using different data sources and methods have been proposed to ascertain smoking status in electronic health data. Many algorithms had low sensitivity and PPV, but the data source influenced their validity. Algorithms based on machine-learning models applied to multiple linked data sources have improved validity.
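The validity measures reported in the summary (sensitivity, specificity, PPV) are ratios from a 2 × 2 cross-tabulation of an algorithm's smoking classification against a reference standard. The following is a minimal sketch of those definitions and of one common way to pool per-study proportions: a DerSimonian-Laird random-effects model on the logit scale. The record does not state which pooling model the review used, so the pooling function is an assumed, swapped-in technique, and the class name `TwoByTwo`, the function `pool_proportions_dl`, and all counts are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Hypothetical counts: algorithm classification vs. reference standard."""
    tp: int  # algorithm: smoker,     reference: smoker
    fp: int  # algorithm: smoker,     reference: non-smoker
    fn: int  # algorithm: non-smoker, reference: smoker
    tn: int  # algorithm: non-smoker, reference: non-smoker

    def sensitivity(self) -> float:
        # Share of reference-standard smokers that the algorithm flags.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of reference-standard non-smokers that the algorithm clears.
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        # Share of algorithm-flagged smokers who are smokers per the reference.
        return self.tp / (self.tp + self.fp)


def pool_proportions_dl(events: list[int], totals: list[int]) -> float:
    """Hypothetical helper: DerSimonian-Laird random-effects pooled proportion.

    Assumed technique for illustration, not necessarily the review's method.
    A 0.5 continuity correction is applied only when a study has 0 or n events.
    """
    ys, vs = [], []
    for x, n in zip(events, totals):
        if x == 0 or x == n:
            x, n = x + 0.5, n + 1.0
        ys.append(math.log(x / (n - x)))    # logit of the study proportion
        vs.append(1.0 / x + 1.0 / (n - x))  # approximate variance of the logit
    w = [1.0 / v for v in vs]               # inverse-variance (fixed-effect) weights
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))  # Cochran's Q
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(ys) - 1)) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1.0 / (v + tau2) for v in vs]   # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    return 1.0 / (1.0 + math.exp(-y_re))    # back-transform logit to a proportion


table = TwoByTwo(tp=80, fp=20, fn=40, tn=160)
print(round(table.sensitivity(), 3), round(table.specificity(), 3), round(table.ppv(), 3))
# 0.667 0.889 0.8
```

Pooling on the logit scale keeps the back-transformed estimate inside (0, 1); when studies differ more than sampling error allows, the estimated between-study variance (`tau2`) flattens the weights, which is one reason heterogeneity such as the 3% to 100% sensitivity range above is typically explored with meta-regression.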