How accurate are Bayes factor-based null hypothesis tests? A simulation study

Bibliographic Details
Main Authors	Schad, Daniel J; Modrák, Martin; Vasishth, Shravan
Format	Journal Article
Language	English
Published	12.06.2024
Subjects	Statistics - Methodology
DOI	10.48550/arxiv.2406.08022

Summary:	Bayes factor null hypothesis tests provide a viable alternative to frequentist measures of evidence quantification. Bayes factors for realistic data sets in areas like psychology cannot be calculated exactly and require numerical approximations to complex integrals. Crucially, the accuracy of these approximations, i.e., whether an approximate Bayes factor corresponds to the exact Bayes factor, is unknown and may depend on the data, prior, and likelihood. We have recently developed a novel statistical procedure, marginal simulation-based calibration (SBC) for Bayes factors, to test whether the computed Bayes factors for a given analysis are accurate. Here, we use marginal SBC for Bayes factors and calibration plots to test whether Bayes factors are calculated accurately for some common cognitive designs. We use the bridgesampling/brms packages in R. We run analyses for three commonly used designs in psychology and psycholinguistics: (a) a design with random effects for subjects only, (b) a Latin square design with crossed random effects for subjects and items but a single fixed factor, and (c) a Latin square 2x2 design with crossed random effects for subjects and items. We find that Bayes factor estimates are accurate when the bridgesampling algorithm does not issue a warning message, but can be biased and liberal when a warning message is shown. These results support the use of brms/bridgesampling for null hypothesis Bayes factor tests in commonly used factorial designs. They also suggest that when a warning message is issued, Bayes factor results should not be trusted. The results show that it is practical to check whether Bayes factors are computed correctly.
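
The calibration idea in the summary can be illustrated with a toy example. The sketch below is not the authors' brms/bridgesampling pipeline. It assumes a conjugate normal model in which the Bayes factor has a closed form, and it checks the property that, over data simulated from the prior, the average posterior probability of a model should match its prior probability. All names (`log_bf10`, `marginal_sbc`, `log_bias`) are hypothetical, and `log_bias` is an artificial stand-in for numerical approximation error.

```python
import math
import random


def normal_logpdf(x, sd):
    """Log density of a zero-mean normal with standard deviation sd."""
    return -0.5 * math.log(2 * math.pi * sd ** 2) - 0.5 * (x / sd) ** 2


def log_bf10(ybar, n, tau):
    """Exact log Bayes factor for H1 (mu ~ N(0, tau^2)) against H0 (mu = 0).

    Assumes n observations with known unit variance, so the sample mean
    ybar is a sufficient statistic: its marginal sd is sqrt(1/n) under H0
    and sqrt(tau^2 + 1/n) under H1.
    """
    sd0 = math.sqrt(1.0 / n)
    sd1 = math.sqrt(tau ** 2 + 1.0 / n)
    return normal_logpdf(ybar, sd1) - normal_logpdf(ybar, sd0)


def marginal_sbc(n_sims=20000, n=20, tau=0.5, prior_h1=0.5,
                 log_bias=0.0, seed=1):
    """Average posterior probability of H1 over prior-simulated data sets.

    log_bias (hypothetical) is added to every log Bayes factor to mimic a
    systematically inaccurate approximation.
    """
    rng = random.Random(seed)
    prior_log_odds = math.log(prior_h1 / (1.0 - prior_h1))
    total = 0.0
    for _ in range(n_sims):
        # 1. Draw the true model from the prior over models.
        is_h1 = rng.random() < prior_h1
        # 2. Draw the parameter from that model's prior.
        mu = rng.gauss(0.0, tau) if is_h1 else 0.0
        # 3. Simulate the sample mean of n unit-variance observations.
        ybar = rng.gauss(mu, math.sqrt(1.0 / n))
        # 4. Turn the (possibly biased) Bayes factor into P(H1 | data).
        log_post_odds = log_bf10(ybar, n, tau) + log_bias + prior_log_odds
        total += 1.0 / (1.0 + math.exp(-log_post_odds))
    return total / n_sims


if __name__ == "__main__":
    print("exact Bayes factors: ", marginal_sbc())
    print("biased Bayes factors:", marginal_sbc(log_bias=1.0))
```

With exact Bayes factors the first value should sit close to the prior probability of 0.5, up to simulation noise. The deliberately biased run should land clearly above it, which is the kind of liberal miscalibration the check is meant to expose.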