A Divide and Conquer Algorithm of Bayesian Density Estimation
| Published in | Australian & New Zealand journal of statistics Vol. 67; no. 2; pp. 250 - 264 |
|---|---|
| Main Author | |
| Format | Journal Article |
| Language | English |
| Published | Hoboken: Wiley Subscription Services, Inc; 01.06.2025 |
| Subjects | |
| ISSN | 1369-1473, 1467-842X |
| DOI | 10.1111/anzs.70008 |
| Summary: | Datasets for statistical analysis become extremely large even when stored on one single machine with some difficulty. Even when the data can be stored in one machine, the computational cost would still be intimidating. We propose a divide and conquer solution to density estimation using Bayesian mixture modelling, including the infinite mixture case. The methodology can be generalised to other application problems where a Bayesian mixture model is adopted. The proposed prior on each machine or subgroup modifies the original prior on both mixing probabilities and the rest of parameters in the distributions being mixed. The ultimate estimator is obtained by taking the average of the posterior samples corresponding to the proposed prior on each subset. Despite the tremendous reduction in time thanks to data splitting, the posterior contraction rate of the proposed estimator stays the same (up to a $\log$ factor) as that using the original prior when the data is analysed as a whole. Simulation studies also justify the competency of the proposed method compared to the established WASP estimator in the finite‐dimension case. In addition, one of our simulations is performed in a shape‐constrained deconvolution context and reveals promising results. The application to a GWAS dataset reveals the advantage over a naive divide and conquer method that uses the original prior. |
|---|---|
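
The split-and-average idea described in the summary can be illustrated with a minimal sketch. The summary does not specify the modified per-subset prior, so the code below is the naive variant it mentions, with the same prior reused on every subset; it is not the article's proposed estimator. The function names (`gibbs_mixture_density`, `divide_and_conquer_density`), the two unit-variance Gaussian components, the Dirichlet(1, 1) prior on the weights, and the N(0, 10²) prior on the means are all assumptions chosen for illustration.

```python
import numpy as np

def gibbs_mixture_density(x, grid, n_comp=2, n_iter=300, burn=100, seed=0):
    """Posterior-mean density on `grid` for a Gaussian mixture with unit
    component variances (hypothetical toy model, not the article's)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    mu = rng.choice(x, size=n_comp, replace=False)  # initial means from data
    w = np.full(n_comp, 1.0 / n_comp)               # initial mixing weights
    dens = np.zeros_like(grid, dtype=float)
    kept = 0
    for it in range(n_iter):
        # 1. Sample allocations given the current weights and means.
        logp = np.log(w) - 0.5 * (x[:, None] - mu[None, :]) ** 2
        p = np.exp(logp - logp.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        u = rng.random(n)
        z = (p.cumsum(axis=1) < u[:, None]).sum(axis=1)
        z = np.minimum(z, n_comp - 1)  # guard against rounding in cumsum
        counts = np.bincount(z, minlength=n_comp)
        sums = np.bincount(z, weights=x, minlength=n_comp)
        # 2. Weights given allocations: Dirichlet(1 + counts).
        w = rng.dirichlet(1.0 + counts)
        # 3. Means given allocations: conjugate normal update with an
        #    assumed N(0, 10^2) prior and unit likelihood variance.
        prec = counts + 1.0 / 100.0
        mu = rng.normal(sums / prec, 1.0 / np.sqrt(prec))
        if it >= burn:
            comp = np.exp(-0.5 * (grid[:, None] - mu[None, :]) ** 2)
            comp /= np.sqrt(2.0 * np.pi)
            dens += comp @ w  # density implied by this posterior draw
            kept += 1
    return dens / kept

def divide_and_conquer_density(x, grid, n_subsets=4, seed=0):
    """Average the per-subset posterior-mean densities (naive variant)."""
    rng = np.random.default_rng(seed)
    subsets = np.array_split(rng.permutation(x), n_subsets)
    estimates = [gibbs_mixture_density(s, grid, seed=seed + k + 1)
                 for k, s in enumerate(subsets)]
    return np.mean(estimates, axis=0)

# Usage on simulated data from an equal mixture of N(-2, 1) and N(2, 1).
rng = np.random.default_rng(42)
x = np.concatenate([rng.normal(-2.0, 1.0, 1000), rng.normal(2.0, 1.0, 1000)])
grid = np.linspace(-8.0, 8.0, 161)
est = divide_and_conquer_density(x, grid)
```

Each subset can be processed on a separate machine, since the only shared step is the final average over the per-subset estimates.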