A Divide and Conquer Algorithm of Bayesian Density Estimation


Bibliographic Details
Published in: Australian & New Zealand Journal of Statistics, Vol. 67, no. 2, pp. 250-264
Main Author: Su, Ya
Format: Journal Article
Language: English
Published: Hoboken, Wiley Subscription Services, Inc, 01.06.2025
ISSN: 1369-1473, 1467-842X
DOI: 10.1111/anzs.70008

Summary: Datasets for statistical analysis can become so large that storing them on a single machine is difficult. Even when the data can be stored on one machine, the computational cost can still be prohibitive. We propose a divide and conquer solution to density estimation using Bayesian mixture modelling, including the infinite mixture case. The methodology can be generalised to other applications where a Bayesian mixture model is adopted. The proposed prior on each machine or subgroup modifies the original prior on both the mixing probabilities and the remaining parameters of the distributions being mixed. The final estimator is obtained by averaging the posterior samples corresponding to the proposed prior on each subset. Despite the substantial reduction in computing time due to data splitting, the posterior contraction rate of the proposed estimator stays the same (up to a $\log$ factor) as that obtained with the original prior when the data is analysed as a whole. Simulation studies also support the competitiveness of the proposed method compared with the established WASP estimator in the finite-dimensional case. In addition, one of our simulations is performed in a shape-constrained deconvolution context and shows promising results. An application to a GWAS dataset shows an advantage over a naive divide and conquer method that uses the original prior.
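
The split-then-average step described in the summary can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the article: a one-component normal model with known variance stands in for the Bayesian mixture model so that the posterior is available in closed form, and every subset uses the same prior. The article's subset-specific prior modification is not described in the summary and is not reproduced here, so this sketch corresponds more closely to the naive divide and conquer baseline. The function names, the prior values (`mu0`, `tau0`) and the simulated data are all hypothetical.

```python
import numpy as np


def subset_posterior_density_draws(x, grid, n_draws, rng,
                                   sigma=1.0, mu0=0.0, tau0=10.0):
    """Posterior draws of a N(mu, sigma^2) density evaluated on `grid`.

    Assumed stand-in model (not the article's mixture model):
    mu ~ N(mu0, tau0^2) with sigma known, so the posterior of mu is normal.
    """
    n = x.size
    post_var = 1.0 / (1.0 / tau0**2 + n / sigma**2)
    post_mean = post_var * (mu0 / tau0**2 + x.sum() / sigma**2)
    mu = rng.normal(post_mean, np.sqrt(post_var), size=n_draws)
    # One density curve per posterior draw: shape (n_draws, len(grid)).
    z = (grid[None, :] - mu[:, None]) / sigma
    return np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))


def divide_and_conquer_density(x, n_subsets, grid, n_draws=500, seed=0):
    """Split the data, sample each subset posterior, average the draws."""
    rng = np.random.default_rng(seed)
    subsets = np.array_split(rng.permutation(x), n_subsets)
    # Average the posterior density draws within each subset ...
    per_subset = [
        subset_posterior_density_draws(s, grid, n_draws, rng).mean(axis=0)
        for s in subsets
    ]
    # ... then average across subsets to form the combined estimate.
    return np.mean(per_subset, axis=0)


# Hypothetical usage on simulated data.
data = np.random.default_rng(1).normal(2.0, 1.0, size=20000)
grid = np.linspace(-3.0, 7.0, 201)
estimate = divide_and_conquer_density(data, n_subsets=10, grid=grid)
```

Each subset posterior depends only on its own data, so in practice the per-subset step could run on separate machines, with only the density draws on the shared grid being combined.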