Categorization of continuous covariates and complex regression models—friends or foes in intersectionality research

Bibliographic Details
Published in Journal of clinical epidemiology Vol. 171; p. 111368
Main Authors Richter, Adrian; Ulbricht, Sabina; Brockhaus, Sarah
Format Journal Article
Language English
Published United States Elsevier Inc 01.07.2024
Elsevier Limited
ISSN 0895-4356
1878-5921
DOI 10.1016/j.jclinepi.2024.111368

Summary: To reduce health inequities, it is important to identify intersections in characteristics of individuals subject to privilege or disadvantage. Different proposals for doing so have recently been published. One approach (1) considers models specified with first-order and all second-order effects, and another (2) stratifies on multiple covariates; both categorize continuous covariates. A simulation study was conducted to review both methods with regard to identification of intersections showing true differences, rate of false-positive results, and generalizability to independent data, compared to an established approach (3) of backward variable elimination according to the Bayesian information criterion (BE-BIC) combined with splines. R software was used to simulate the covariates age, sex, body mass index, education, and diabetes and to examine their association with a continuous frailty score for osteoporosis using multiple linear regression. In setting 1, none of the covariates was associated with the frailty score, that is, only noise was present in the data. In setting 2, the covariates age, sex, and their interaction were associated with the frailty score, such that only females above 55 years formed an intersection associated with an increased frailty score. All approaches were compared under varying sample sizes (N = 200–3000) and signal-to-noise ratios (SNRs, 0.5–4) in 1000 replications. For model evaluation, bootstrap resampling was used: the models were fitted in internal learning data and then used to predict outcomes in the internal validation data. The mean squared error (MSE) was used for comparison, and the frequency of false-positive findings was calculated. In setting 1, approaches 1 and 2 generated spurious effects in more than 90% of simulations across all sample sizes. Approach 3 (BE-BIC) selected the correct model in 36.5% of simulations at a smaller sample size and in 89.8% at a larger sample size, and it always had a lower number of spurious effects. MSE in independent data was generally higher for approaches 1 and 2 than for approach 3. In setting 2, approach 1 most frequently selected the correct interaction but frequently showed spurious effects (>75%). Across all sample sizes and SNRs, approach 3 least often generated spurious results and had the lowest MSE in independent data. Categorization of continuous covariates is detrimental to studies on intersectionality. Due to high and unrestricted model complexity, such approaches are prone to spurious effects and often lack interpretability. Approach 3 (BE-BIC) is considerably more robust against spurious findings, showed better generalizability to independent data, and can be used with most statistical software. For intersectionality research, we consider it most important to describe relevant differences between intersections and to avoid nonreproducible and spurious findings.
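For readers who want a feel for approach 3, the following is a minimal sketch of backward elimination by BIC on simulated data. It is not the authors' code: the study used R, whereas this sketch uses Python with NumPy. The effect size (0.05 per year), the age range, the single linear spline knot placed at 55, the SNR definition (variance of the signal divided by variance of the noise), and the function names `simulate`, `bic`, and `backward_eliminate_bic` are all assumptions made for illustration.

```python
import numpy as np


def simulate(n, snr, rng, signal=True):
    """Simulate age, sex, and a continuous score (all values are illustrative assumptions)."""
    age = rng.uniform(30, 80, n)
    female = rng.integers(0, 2, n).astype(float)
    hinge55 = np.maximum(age - 55.0, 0.0)  # linear spline term, knot assumed at 55
    # signal=True mimics setting 2: the score rises only for females above 55.
    # signal=False mimics setting 1: the outcome is pure noise.
    mu = 0.05 * female * hinge55 if signal else np.zeros(n)
    # Assumed SNR definition: var(signal) / var(noise).
    noise_sd = np.sqrt(mu.var() / snr) if signal else 1.0
    y = mu + rng.normal(0.0, noise_sd, n)
    age_c = age - 55.0  # centered age
    X = np.column_stack([np.ones(n), age_c, female, hinge55,
                         female * age_c, female * hinge55])
    names = ["intercept", "age", "female", "hinge55",
             "female:age", "female:hinge55"]
    return y, X, names


def bic(y, X):
    """BIC of a Gaussian linear model fitted by least squares (up to an additive constant)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + k * np.log(n)


def backward_eliminate_bic(y, X, names):
    """Drop one column at a time while that lowers BIC; the intercept (column 0) is always kept."""
    keep = list(range(X.shape[1]))
    best = bic(y, X[:, keep])
    while len(keep) > 1:
        # BIC obtained after removing each remaining non-intercept column.
        candidates = [(bic(y, X[:, [j for j in keep if j != d]]), d)
                      for d in keep[1:]]
        cand_bic, drop = min(candidates)
        if cand_bic >= best:
            break  # no single removal improves BIC any further
        best = cand_bic
        keep.remove(drop)
    return [names[j] for j in keep]


rng = np.random.default_rng(1)
y, X, names = simulate(n=2000, snr=1.0, rng=rng)
print(backward_eliminate_bic(y, X, names))
```

Two simplifications are worth keeping in mind. The knot sits exactly at the simulated change point, which flatters the spline model, and the elimination step ignores model hierarchy, so a main effect can be dropped while its interaction stays. Calling `simulate(..., signal=False)` gives a noise-only data set along the lines of setting 1.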