A Divide and Conquer Algorithm of Bayesian Density Estimation

Bibliographic Details
Published in	Australian & New Zealand journal of statistics Vol. 67; no. 2; pp. 250 - 264
Main Author	Su, Ya
Format	Journal Article
Language	English
Published	Hoboken: Wiley Subscription Services, Inc, 01.06.2025
Subjects	Bayesian analysis; Bayesian density estimation; Bayesian mixture model; Datasets; Density; divide and conquer; posterior contraction rate; Statistical analysis; Subgroups
ISSN	1369-1473, 1467-842X
DOI	10.1111/anzs.70008

Summary:	Datasets for statistical analysis become extremely large even when stored on one single machine with some difficulty. Even when the data can be stored in one machine, the computational cost would still be intimidating. We propose a divide and conquer solution to density estimation using Bayesian mixture modelling, including the infinite mixture case. The methodology can be generalised to other application problems where a Bayesian mixture model is adopted. The proposed prior on each machine or subgroup modifies the original prior on both mixing probabilities and the rest of parameters in the distributions being mixed. The ultimate estimator is obtained by taking the average of the posterior samples corresponding to the proposed prior on each subset. Despite the tremendous reduction in time thanks to data splitting, the posterior contraction rate of the proposed estimator stays the same (up to a $\log$ factor) as that using the original prior when the data is analysed as a whole. Simulation studies also justify the competency of the proposed method compared to the established WASP estimator in the finite‐dimension case. In addition, one of our simulations is performed in a shape‐constrained deconvolution context and reveals promising results. The application to a GWAS dataset reveals the advantage over a naive divide and conquer method that uses the original prior.
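
The split-and-average structure described in the summary can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the article: the per-subset model is a stand-in (a finite mixture of uniform components on fixed bins with a Dirichlet prior on the mixing weights), and the function names, bin edges, and `alpha` value are hypothetical. The sketch applies the same unmodified prior to every subset, so it is closer to the naive divide and conquer variant mentioned in the summary than to the proposed modified prior, whose form the summary does not give.

```python
import numpy as np


def subset_density_draws(x, edges, alpha, n_draws, rng):
    """Posterior draws of a histogram density for one data subset.

    Stand-in model (an assumption, not the article's): mixture of uniform
    components on fixed bins, Dirichlet(alpha) prior on the mixing weights.
    """
    counts, _ = np.histogram(x, bins=edges)
    # Conjugate update: the posterior over weights is Dirichlet(alpha + counts).
    weights = rng.dirichlet(alpha + counts, size=n_draws)
    # Convert bin probabilities into density heights; shape (n_draws, n_bins).
    return weights / np.diff(edges)


def divide_and_conquer_density(x, n_subsets, edges, alpha=1.0, n_draws=500, seed=0):
    """Average the per-subset posterior samples into one density estimate."""
    rng = np.random.default_rng(seed)
    # Shuffle first so each subset is a random split of the data.
    subsets = np.array_split(rng.permutation(x), n_subsets)
    per_subset = [
        subset_density_draws(s, edges, alpha, n_draws, rng).mean(axis=0)
        for s in subsets
    ]
    # Combine step: plain average across subsets.
    return np.mean(per_subset, axis=0)


if __name__ == "__main__":
    data = np.random.default_rng(1).normal(size=20_000)
    edges = np.linspace(-5.0, 5.0, 41)
    estimate = divide_and_conquer_density(data, n_subsets=10, edges=edges)
    print(estimate.shape)
```

Each call to `subset_density_draws` touches only its own subset, so in principle the subsets could be processed on separate machines and only the per-bin averages sent back for the final combine step.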