The validity of electronic health data for measuring smoking status: a systematic review and meta-analysis

Bibliographic Details
Published in: BMC Medical Informatics and Decision Making, Vol. 24, no. 1, pp. 33 - 15
Main Authors: Haque, Md Ashiqul; Gedara, Muditha Lakmali Bodawatte; Nickel, Nathan; Turgeon, Maxime; Lix, Lisa M.
Format: Journal Article
Language: English
Published: London: BioMed Central, 02.02.2024
Publisher: BioMed Central Ltd; Springer Nature B.V; BMC
ISSN: 1472-6947
DOI: 10.1186/s12911-024-02416-3

Summary:
Background: Smoking is a risk factor for many chronic diseases. Multiple smoking status ascertainment algorithms have been developed for population-based electronic health databases such as administrative databases and electronic medical records (EMRs). Evidence syntheses of algorithm validation studies have often focused on chronic diseases rather than risk factors. We conducted a systematic review and meta-analysis of smoking status ascertainment algorithms to describe the characteristics and validity of these algorithms.

Methods: The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed. We searched EMBASE, MEDLINE, Scopus, and Web of Science for articles published from 1990 to 2022, using key terms such as validity, administrative data, electronic health records, smoking, and tobacco use. The extracted information, including article characteristics, algorithm characteristics, and validity measures, was analyzed descriptively. Sources of heterogeneity in validity measures were examined using a meta-regression model. Risk of bias (ROB) in the reviewed articles was assessed using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool.

Results: The initial search yielded 2086 articles; 57 were selected for review, and 116 algorithms were identified. Almost three-quarters (71.6%) of algorithms were based on EMR data. The algorithms were primarily constructed using diagnosis codes for smoking-related conditions, although prescription medication codes for smoking treatments were also used. About half of the algorithms were developed using machine-learning models. The pooled estimates of positive predictive value, sensitivity, and specificity were 0.843, 0.672, and 0.918, respectively. Algorithm sensitivity and specificity were highly variable, ranging from 3% to 100% and from 36% to 100%, respectively. Model-based algorithms had significantly greater sensitivity than rule-based algorithms (p = 0.006), and algorithms for EMR data had higher sensitivity than algorithms for administrative data (p = 0.001). ROB was low in most (76.3%) of the articles that underwent the assessment.

Conclusions: Multiple algorithms using different data sources and methods have been proposed to ascertain smoking status in electronic health data. Many algorithms had low sensitivity and positive predictive value, but the data source influenced their validity. Algorithms based on machine-learning models for multiple linked data sources have improved validity.
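The record does not state which pooling model the review used. As a minimal sketch of how study-level validity measures can be computed and combined, the Python below derives sensitivity, specificity, and positive predictive value from a 2x2 table (algorithm result versus reference standard) and pools one measure across studies with a DerSimonian-Laird random-effects model on the logit scale. The choice of method, the 0.5 continuity correction, the names `TwoByTwo`, `logit_with_variance`, and `pool_dersimonian_laird`, and all counts are hypothetical and chosen for illustration only. Bivariate models that pool sensitivity and specificity jointly are another common choice.

```python
import math
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Hypothetical counts from one validation study: algorithm vs. reference standard."""
    tp: int  # algorithm says smoker, reference says smoker
    fp: int  # algorithm says smoker, reference says non-smoker
    fn: int  # algorithm says non-smoker, reference says smoker
    tn: int  # algorithm says non-smoker, reference says non-smoker

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)


def logit_with_variance(events: int, total: int) -> tuple[float, float]:
    """Logit of a proportion and its approximate (delta-method) variance.

    Assumption: 0.5 is added to both cells so that proportions of 0 or 1
    (e.g. a reported sensitivity of 100%) still give finite values.
    """
    a = events + 0.5
    b = (total - events) + 0.5
    return math.log(a / b), 1.0 / a + 1.0 / b


def pool_dersimonian_laird(logits: list[float], variances: list[float]) -> float:
    """Random-effects pooled proportion (DerSimonian-Laird) from logit estimates."""
    w = [1.0 / v for v in variances]                  # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, logits)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, logits))  # Cochran's Q
    c = sw - sum(wi * wi for wi in w) / sw
    df = len(logits) - 1
    # Between-study variance; zero when there is only one study.
    tau2 = max(0.0, (q - df) / c) if df > 0 and c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]      # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, logits)) / sum(w_re)
    return 1.0 / (1.0 + math.exp(-pooled))            # back-transform to a proportion


if __name__ == "__main__":
    # Hypothetical studies; these are not data from the review.
    studies = [TwoByTwo(80, 10, 40, 370), TwoByTwo(45, 5, 15, 135), TwoByTwo(150, 30, 120, 700)]
    pairs = [logit_with_variance(s.tp, s.tp + s.fn) for s in studies]  # sensitivity per study
    logits = [p[0] for p in pairs]
    variances = [p[1] for p in pairs]
    for s in studies:
        print(f"sens={s.sensitivity():.3f} spec={s.specificity():.3f} ppv={s.ppv():.3f}")
    print(f"pooled sensitivity={pool_dersimonian_laird(logits, variances):.3f}")
```

Pooling on the logit scale keeps the back-transformed estimate inside (0, 1), and the random-effects weights widen when between-study heterogeneity is large, which is the situation the abstract describes for sensitivity and specificity.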