Misleading Results in Posttraumatic Stress Disorder Predictive Models Using Electronic Health Record Data: Algorithm Validation Study
| Published in | Journal of Medical Internet Research, Vol. 27, p. e63352 |
|---|---|
| Main Authors | , , , , , |
| Format | Journal Article |
| Language | English |
| Published | Canada: JMIR Publications, 27.08.2025 |
| Subjects | |
| Online Access | Get full text |
| ISSN | 1439-4456, 1438-8871 |
| DOI | 10.2196/63352 |
Summary:

Electronic health record (EHR) data are increasingly used in predictive models of posttraumatic stress disorder (PTSD), but it is unknown how multivariable prediction of an EHR-based diagnosis might differ from prediction of a more rigorous diagnostic criterion. This distinction is important because EHR data are subject to multiple biases, including diagnostic misclassification and differential health care use resulting from factors such as illness severity.
This study aims to compare predictive models using the same predictors to predict an EHR-based versus semistructured interview-based PTSD diagnostic criterion, quantify model performance discrepancies, and examine potential mechanisms that account for performance differences.
We compared the performance of several machine learning models predicting EHR-based PTSD diagnosis to models predicting semistructured interview-based diagnosis in a nationwide sample of 1343 US veterans who completed Structured Clinical Interview for DSM-5 (Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition) (SCID-5) interviews and had clinic visit data extracted from the Veterans Affairs (VA) EHR. We developed 2 sets of predictive models using 3 algorithms (elastic net regression, random forest, and XGBoost), with a nested cross-validation scheme consisting of an initial train-test split and 10-fold cross-validation within the training set for each type of model. All models used a nearly identical set of predictors including 29 EHR-based visit count variables and 8 demographic variables.
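As a rough illustration of the modeling setup described above, the sketch below wires the three named algorithms into an initial train-test split with 10-fold cross-validation inside the training set. It assumes scikit-learn and xgboost; the simulated data, feature layout (29 visit counts plus 8 demographics), and hyperparameter grids are placeholders, not values or code from the study.

```python
# Minimal sketch (not the study's code): train-test split plus 10-fold CV
# within the training set, for elastic net, random forest, and XGBoost.
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV, StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.poisson(2, size=(1343, 37)).astype(float)  # simulated: 29 visit counts + 8 demographics
y = rng.integers(0, 2, size=1343)                  # simulated diagnostic criterion (0/1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
inner_cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

candidates = {
    # Elastic net implemented as penalized logistic regression
    "elastic_net": GridSearchCV(
        make_pipeline(
            StandardScaler(),
            LogisticRegression(penalty="elasticnet", solver="saga", max_iter=5000),
        ),
        {"logisticregression__C": [0.1, 1.0],
         "logisticregression__l1_ratio": [0.2, 0.5, 0.8]},
        scoring="roc_auc", cv=inner_cv,
    ),
    "random_forest": GridSearchCV(
        RandomForestClassifier(random_state=42),
        {"n_estimators": [200, 500], "max_depth": [None, 5]},
        scoring="roc_auc", cv=inner_cv,
    ),
    "xgboost": GridSearchCV(
        XGBClassifier(eval_metric="logloss", random_state=42),
        {"n_estimators": [200, 500], "max_depth": [3, 5]},
        scoring="roc_auc", cv=inner_cv,
    ),
}

for name, search in candidates.items():
    search.fit(X_train, y_train)             # 10-fold CV tunes hyperparameters on the training set
    test_auc = search.score(X_test, y_test)  # held-out test-set AUC
    print(f"{name}: inner-CV AUC={search.best_score_:.3f}, test AUC={test_auc:.3f}")
```

The same pipeline would be fit twice in practice, once against the EHR-based label and once against the interview-based label, so that any performance gap reflects the criterion rather than the predictors or algorithms.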
Diagnostic concordance between EHR-based PTSD diagnosis and SCID-5-based PTSD diagnosis was 73.3%, with 17.8% false negatives and 8.9% false positives for EHR-based diagnosis. Models predicting EHR-based PTSD performed very well (area under the receiver operating characteristic curve [AUC] .85-.9; Matthews correlation coefficient [MCC] .58-.69), whereas those predicting interview-based PTSD performed only moderately well overall (AUC .71-.76; MCC .24-.28). Sensitivity analyses showed that participants' frequency of VA visits played a role in these differences, such that the density of EHR data (proportion of nonzero visit counts across EHR variables) was more associated with EHR-based PTSD diagnosis (b=-0.18, SE 0.02, P<.001) than with SCID-5 interview-based PTSD diagnosis (b=-0.06, SE 0.01, P<.001).
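For readers unfamiliar with the reported quantities, the sketch below computes them on simulated data: AUC, the Matthews correlation coefficient, and the EHR data density defined above as the proportion of nonzero visit counts across the EHR variables. The data, classification threshold, and variable names are illustrative assumptions, not study values.

```python
# Minimal sketch (simulated data): AUC, MCC, and per-participant EHR data density.
import numpy as np
from sklearn.metrics import matthews_corrcoef, roc_auc_score

rng = np.random.default_rng(1)
visit_counts = rng.poisson(1.5, size=(1343, 29))  # simulated counts for 29 EHR visit variables
ehr_density = (visit_counts > 0).mean(axis=1)     # proportion of nonzero visit counts per participant

y_true = rng.integers(0, 2, size=1343)            # simulated diagnostic criterion
y_score = rng.random(1343)                        # simulated predicted probabilities
y_pred = (y_score >= 0.5).astype(int)             # illustrative 0.5 threshold

print(f"AUC: {roc_auc_score(y_true, y_score):.3f}")
print(f"MCC: {matthews_corrcoef(y_true, y_pred):.3f}")
print(f"Mean EHR data density: {ehr_density.mean():.2f}")
```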
Predictive models of PTSD built using only EHR data demonstrated inflated performance metrics relative to models predicting diagnosis from a rigorous structured clinical interview. This performance discrepancy appears driven by circular relationships between health care use patterns and EHR-based diagnosis that do not affect external diagnostic criteria. Researchers building clinical prediction models should not assume that diagnosis in the EHR is a sufficient proxy for the true criterion of interest. Clinicians and researchers should be cautious in interpreting clinical prediction models using only EHR data, as their real-world utility may be less than performance metrics suggest.
| Bibliography: | These authors contributed equally. |
|---|---|
| ISSN: | 1439-4456, 1438-8871 |
| DOI: | 10.2196/63352 |