Are your random effects normal? A simulation study of methods for estimating whether subjects or items come from more than one population by examining the distribution of random effects in mixed-effects logistic regression

With mixed-effects regression models becoming a mainstream tool for every psycholinguist, there has become an increasing need to understand them more fully. In the last decade, most work on mixed-effects models in psycholinguistics has focused on properly specifying the random-effects structure to m...

Full description

Saved in:
Bibliographic Details
Published inBehavior research methods Vol. 56; no. 6; pp. 5557 - 5587
Main Authors Houghton, Zachary N., Kapatsinski, Vsevolod
Format Journal Article
LanguageEnglish
Published New York Springer US 01.09.2024
Springer Nature B.V
Subjects
Online AccessGet full text
ISSN1554-3528
1554-3528
DOI10.3758/s13428-023-02287-y

Cover

More Information
Summary:With mixed-effects regression models becoming a mainstream tool for every psycholinguist, there has become an increasing need to understand them more fully. In the last decade, most work on mixed-effects models in psycholinguistics has focused on properly specifying the random-effects structure to minimize error in evaluating the statistical significance of fixed-effects predictors. The present study examines a potential misspecification of random effects that has not been discussed in psycholinguistics: violation of the single-subject-population assumption, in the context of logistic regression. Estimated random-effects distributions in real studies often appear to be bi- or multimodal. However, there is no established way to estimate whether a random-effects distribution corresponds to more than one underlying population, especially in the more common case of a multivariate distribution of random effects. We show that violations of the single-subject-population assumption can usually be detected by assessing the (multivariate) normality of the inferred random-effects structure, unless the data show quasi-separability, i.e., many subjects or items show near-categorical behavior. In the absence of quasi-separability, several clustering methods are successful in determining which group each participant belongs to. The BIC difference between a two-cluster and a one-cluster solution can be used to determine that subjects (or items) do not come from a single population. This then allows the researcher to define and justify a new post hoc variable specifying the groups to which participants or items belong, which can be incorporated into regression analysis.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
content type line 23
ISSN:1554-3528
1554-3528
DOI:10.3758/s13428-023-02287-y