Are your random effects normal? A simulation study of methods for estimating whether subjects or items come from more than one population by examining the distribution of random effects in mixed-effects logistic regression

Bibliographic Details
Published in	Behavior research methods Vol. 56; no. 6; pp. 5557-5587
Main Authors	Houghton, Zachary N.; Kapatsinski, Vsevolod
Format	Journal Article
Language	English
Published	New York: Springer US, 01.09.2024; Springer Nature B.V.
Subjects	Behavioral Science and Psychology; Cognitive Psychology; Computer Simulation; Data Interpretation, Statistical; Humans; Logistic Models; Models, Statistical; Original Manuscript; Population studies; Psycholinguistics - methods; Psychology; Regression analysis; Statistical models; Logistic regression; Random effects; Conditional inference trees; Misspecification; Mixed-effects models; Individual differences
ISSN	1554-3528
DOI	10.3758/s13428-023-02287-y

Summary:	As mixed-effects regression models have become a mainstream tool in psycholinguistics, there is a growing need to understand them more fully. In the last decade, most work on mixed-effects models in psycholinguistics has focused on properly specifying the random-effects structure to minimize error in evaluating the statistical significance of fixed-effects predictors. The present study examines a potential misspecification of random effects that has not been discussed in psycholinguistics: violation of the single-subject-population assumption, in the context of logistic regression. Estimated random-effects distributions in real studies often appear to be bi- or multimodal. However, there is no established way to estimate whether a random-effects distribution corresponds to more than one underlying population, especially in the more common case of a multivariate distribution of random effects. We show that violations of the single-subject-population assumption can usually be detected by assessing the (multivariate) normality of the inferred random-effects structure, unless the data show quasi-separability, i.e., many subjects or items show near-categorical behavior. In the absence of quasi-separability, several clustering methods succeed in determining which group each participant belongs to. The BIC difference between a two-cluster and a one-cluster solution can be used to determine that subjects (or items) do not come from a single population. This then allows the researcher to define and justify a new post hoc variable specifying the groups to which participants or items belong, which can be incorporated into regression analysis.
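The normality check described in the summary can be sketched with Mardia's multivariate skewness and kurtosis tests, one standard test of multivariate normality. The summary does not say which test the authors used, so this is an illustration under assumptions, not their procedure. The function name `mardia_test` and the data are hypothetical: in practice the input would be the per-subject estimated random effects (e.g., random intercept and slope) extracted from a fitted mixed-effects logistic regression. Those estimates are shrunken toward zero and, under quasi-separability, may not reveal a second population at all. Here they are simply simulated as one bivariate normal population versus a roughly equal mixture of two well-separated populations.

```python
import numpy as np
from scipy import stats


def mardia_test(x):
    """Mardia's multivariate skewness and kurtosis tests (asymptotic versions).

    x: (n, p) array with one row per subject (or item), e.g. the estimated
    random intercept and random slope for that subject.
    Returns (skew_stat, skew_p, kurt_stat, kurt_p).
    """
    x = np.asarray(x, dtype=float)
    n, p = x.shape
    xc = x - x.mean(axis=0)
    s = np.atleast_2d(np.cov(xc, rowvar=False, bias=True))  # ML covariance (divisor n)
    d = xc @ np.linalg.inv(s) @ xc.T   # n x n matrix of Mahalanobis cross-products
    b1 = (d ** 3).sum() / n ** 2       # multivariate skewness
    b2 = (np.diag(d) ** 2).mean()      # multivariate kurtosis (p * (p + 2) if normal)
    skew_stat = n * b1 / 6             # approx. chi-square under normality
    skew_df = p * (p + 1) * (p + 2) / 6
    skew_p = stats.chi2.sf(skew_stat, skew_df)
    kurt_stat = (b2 - p * (p + 2)) / np.sqrt(8 * p * (p + 2) / n)  # approx. N(0, 1)
    kurt_p = 2 * stats.norm.sf(abs(kurt_stat))
    return skew_stat, skew_p, kurt_stat, kurt_p


# Hypothetical stand-ins for estimated (intercept, slope) random effects.
rng = np.random.default_rng(42)
n = 1000
one_pop = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.3], [0.3, 1.0]], size=n)
group = rng.integers(0, 2, size=n)                 # hidden population membership
centers = np.array([[-4.0, -4.0], [4.0, 4.0]])
two_pop = centers[group] + rng.standard_normal((n, 2))

res_one = mardia_test(one_pop)
res_two = mardia_test(two_pop)
print("one population:  skew p = %.3g, kurtosis p = %.3g" % (res_one[1], res_one[3]))
print("two populations: skew p = %.3g, kurtosis p = %.3g" % (res_two[1], res_two[3]))
```

A symmetric two-population mixture like this one tends to show up as negative excess kurtosis (a negative kurtosis statistic with a small p-value) while leaving the skewness test unremarkable. Both tests rely on large-sample approximations, which can be unreliable for the small numbers of subjects typical of experiments.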
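The BIC comparison between a two-cluster and a one-cluster solution can be sketched with Gaussian mixture models from scikit-learn. This is an assumed stand-in: the summary refers to several clustering methods without naming them, so `GaussianMixture` with full covariances is only one reasonable choice, and the helper `bic_two_vs_one` and the simulated data are hypothetical. Note the sign convention: scikit-learn's `bic` is lower-is-better, so a positive difference BIC(1) - BIC(2) favors two populations.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def bic_two_vs_one(x, seed=0):
    """Compare 1- and 2-component Gaussian mixtures fitted to random-effect estimates.

    Returns (delta_bic, labels), where delta_bic = BIC(1 cluster) - BIC(2 clusters).
    scikit-learn's BIC is lower-is-better, so delta_bic > 0 favors two clusters.
    labels are the 2-cluster assignments (0/1), one per row of x.
    """
    x = np.asarray(x, dtype=float)
    fits = {
        k: GaussianMixture(n_components=k, covariance_type="full",
                           n_init=5, random_state=seed).fit(x)
        for k in (1, 2)
    }
    delta_bic = fits[1].bic(x) - fits[2].bic(x)
    labels = fits[2].predict(x)
    return delta_bic, labels


# Hypothetical random-effect estimates (intercept, slope), one row per subject.
rng = np.random.default_rng(7)
n = 300
one_pop = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.3], [0.3, 1.0]], size=n)
true_group = rng.integers(0, 2, size=n)
two_pop = np.array([[-2.5, -1.0], [2.5, 1.0]])[true_group] + rng.standard_normal((n, 2))

delta_one, _ = bic_two_vs_one(one_pop)
delta_two, labels = bic_two_vs_one(two_pop)

# Cluster labels are arbitrary (0 and 1 may be swapped), so score both matchings.
accuracy = max(np.mean(labels == true_group), np.mean(labels != true_group))
print(f"delta BIC, one population:  {delta_one:.1f}")
print(f"delta BIC, two populations: {delta_two:.1f}; label accuracy {accuracy:.2f}")
```

The returned `labels` play the role of the post hoc grouping variable the summary describes: they could be merged back into the trial-level data and entered as a predictor in the regression. On data from a single population the difference is typically negative, because BIC penalizes the extra mixture parameters by a log(n) term.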