Misleading Results in Posttraumatic Stress Disorder Predictive Models Using Electronic Health Record Data: Algorithm Validation Study

Bibliographic Details
Published in	Journal of Medical Internet Research Vol. 27; p. e63352
Main Authors	Crow, Thomas M; Lin, Eric; Harper, Kelly L; Crowe, Michael L; Keane, Terence M; Marx, Brian P
Format	Journal Article
Language	English
Published	Canada JMIR Publications 27.08.2025
Subjects	Adult; Algorithms; Artificial Intelligence; Clinical Informatics; Clinical Information and Decision Making; Electronic Health Records; Electronic/Mobile Data Capture, Internet-based Survey & Research Methodology; Female; Humans; Machine Learning; Male; Middle Aged; Original Paper; Posttraumatic Stress Disorder (PTSD); Public (e)Health, Digital Epidemiology and Public Health Informatics; Stress Disorders, Post-Traumatic - diagnosis; Tools, Programs and Algorithms; United States; Veterans; United States veterans; clinical informatics; clinics; mental health; electronic health records; machine learning; misleading information; posttraumatic stress disorder; PTSD; clinical prediction models; semistructured interview; stress disorder; sensitivity analyses; clinic; misleading result; posttraumatic
ISSN	1438-8871, 1439-4456
DOI	10.2196/63352

Summary:	Electronic health record (EHR) data are increasingly used in predictive models of posttraumatic stress disorder (PTSD), but it is unknown how multivariable prediction of an EHR-based diagnosis might differ from prediction of a more rigorous diagnostic criterion. This distinction is important because EHR data are subject to multiple biases, including diagnostic misclassification and differential health care use resulting from factors such as illness severity. This study aims to compare predictive models using the same predictors to predict an EHR-based versus semistructured interview-based PTSD diagnostic criterion, quantify model performance discrepancies, and examine potential mechanisms that account for performance differences. We compared the performance of several machine learning models predicting EHR-based PTSD diagnosis to models predicting semistructured interview-based diagnosis in a nationwide sample of 1343 US veterans who completed Structured Clinical Interview for DSM-5 (Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition) (SCID-5) interviews and had clinic visit data extracted from the Veterans Affairs (VA) EHR. We developed 2 sets of predictive models using 3 algorithms (elastic net regression, random forest, and XGBoost), with a nested cross-validation scheme consisting of an initial train-test split and 10-fold cross-validation within the training set for each type of model. All models used a nearly identical set of predictors including 29 EHR-based visit count variables and 8 demographic variables. Diagnostic concordance between EHR-based PTSD diagnosis and SCID-5-based PTSD diagnosis was 73.3%, with 17.8% false negatives and 8.9% false positives for EHR-based diagnosis. 
Models predicting EHR-based PTSD performed very well (area under the receiver operating characteristic curve [AUC] .85-.9; Matthews correlation coefficient [MCC] .58-.69), whereas those predicting interview-based PTSD performed only moderately well overall (AUC .71-.76; MCC .24-.28). Sensitivity analyses showed that participants' frequency of VA visits played a role in these differences, such that the density of EHR data (proportion of nonzero visit counts across EHR variables) was more associated with EHR-based PTSD diagnosis (b=-0.18, SE 0.02, P<.001) than with SCID-5 interview-based PTSD diagnosis (b=-0.06, SE 0.01, P<.001). Predictive models of PTSD built using only EHR data demonstrated inflated performance metrics relative to models predicting diagnosis from a rigorous structured clinical interview. This performance discrepancy appears driven by circular relationships between health care use patterns and EHR-based diagnosis that do not affect external diagnostic criteria. Researchers building clinical prediction models should not assume that diagnosis in the EHR is a sufficient proxy for the true criterion of interest. Clinicians and researchers should be cautious in interpreting clinical prediction models using only EHR data, as their real-world utility may be less than performance metrics suggest.
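The agreement figures and the Matthews correlation coefficient reported in the summary can be illustrated with a minimal sketch. It assumes that concordance, false negatives, and false positives are each expressed as a proportion of all participants (consistent with the reported 73.3%, 17.8%, and 8.9% summing to 100%), and that EHR data density is the share of nonzero visit-count variables for one participant. The function names and the toy label vectors are hypothetical and are not taken from the study.

```python
from math import sqrt


def confusion_counts(reference, candidate):
    """Count TP, FP, FN, TN, treating `reference` as the criterion diagnosis."""
    pairs = list(zip(reference, candidate))
    tp = sum(1 for r, c in pairs if r and c)
    fp = sum(1 for r, c in pairs if not r and c)
    fn = sum(1 for r, c in pairs if r and not c)
    tn = sum(1 for r, c in pairs if not r and not c)
    return tp, fp, fn, tn


def concordance_summary(reference, candidate):
    """Agreement and error rates as proportions of all participants."""
    tp, fp, fn, tn = confusion_counts(reference, candidate)
    n = tp + fp + fn + tn
    return {
        "concordance": (tp + tn) / n,
        "false_negative": fn / n,
        "false_positive": fp / n,
    }


def mcc(reference, candidate):
    """Matthews correlation coefficient; defined as 0.0 if a margin is empty."""
    tp, fp, fn, tn = confusion_counts(reference, candidate)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def ehr_density(visit_counts):
    """Proportion of visit-count variables that are nonzero for one participant."""
    return sum(1 for v in visit_counts if v != 0) / len(visit_counts)


# Toy data only (hypothetical): 1 = PTSD diagnosis present, 0 = absent.
interview_dx = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]  # stands in for the interview-based criterion
ehr_dx = [1, 0, 1, 0, 1, 0, 1, 0, 0, 0]        # stands in for the EHR-based diagnosis

print(concordance_summary(interview_dx, ehr_dx))
print(round(mcc(interview_dx, ehr_dx), 3))
print(ehr_density([0, 3, 0, 1]))
```

Here the interview-based labels play the role of the reference criterion, so a false negative is a participant with an interview-based diagnosis but no EHR-based diagnosis.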