Bayesian Generalized Linear Models for Analyzing Compositional and Sub‐Compositional Microbiome Data via EM Algorithm
| Published in | Statistics in Medicine, Vol. 44, no. 7, e70084 |
|---|---|
| Main Authors | , , , , |
| Format | Journal Article |
| Language | English |
| Published | Hoboken, USA: John Wiley & Sons, Inc; Wiley Subscription Services, Inc; 30 March 2025 |
| ISSN | 0277-6715; 1097-0258 |
| DOI | 10.1002/sim.70084 |
| Summary | The study of compositional microbiome data is critical for exploring the functional roles of microbial communities in human health and disease. Recent advances have shifted from traditional log‐ratio transformations of compositional covariates to a zero constraint on the sum of the corresponding coefficients. Various approaches, including penalized regression and Markov chain Monte Carlo (MCMC) algorithms, have been extended to enforce this sum‐to‐zero constraint. However, these methods have limitations: penalized regression yields only point estimates, which limits uncertainty assessment, while MCMC methods, although reliable, are computationally intensive, particularly for high‐dimensional data. To address these challenges, we propose Bayesian generalized linear models (GLMs) for analyzing compositional and sub‐compositional microbiome data. Our model places a spike‐and‐slab double‐exponential prior on the microbiome coefficients, which induces weak shrinkage on large coefficients and strong shrinkage on irrelevant ones, making it well suited to high‐dimensional microbiome data. The sum‐to‐zero constraint is handled through soft centering, that is, by placing a prior distribution on the sum of the compositional or sub‐compositional coefficients. To reduce the computational burden, we developed a fast and stable algorithm that incorporates expectation–maximization (EM) steps into the iteratively weighted least squares (IWLS) algorithm routinely used to fit GLMs. The performance of the proposed method was assessed in extensive simulation studies, in which our approach outperformed existing methods, with more accurate coefficient estimates and lower prediction error. We also applied the proposed method to a microbiome study to identify microorganisms linked to inflammatory bowel disease (IBD). The methods are implemented in the freely available R package BhGLM (https://github.com/nyiuab/BhGLM). |
| Funding | The authors received no specific funding for this work. |
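
Why a zero-sum constraint can stand in for a log-ratio transformation: on log-transformed abundances, a linear predictor whose coefficients sum to zero does not change when every count in a sample is multiplied by the same factor, so it does not depend on sequencing depth or on the unobserved total. The short check below illustrates this identity, sum_j beta_j log(k x_j) = sum_j beta_j log(x_j) + log(k) sum_j beta_j, with hypothetical counts and coefficients; it is an illustration, not code from the paper or from BhGLM.

```python
import math


def log_contrast(counts, beta):
    """Linear predictor sum_j beta_j * log(counts_j) on unnormalized counts."""
    return sum(b * math.log(c) for b, c in zip(beta, counts))


# Hypothetical taxon counts for one sample, and the same composition sequenced 5x deeper.
counts = [120, 30, 600, 250]
deeper = [5 * c for c in counts]
proportions = [c / sum(counts) for c in counts]  # relative abundances (sum to 1)

beta_zero_sum = [1.0, -0.5, 0.25, -0.75]  # hypothetical coefficients that sum to 0
beta_free = [1.0, -0.5, 0.25, 0.25]       # hypothetical coefficients that sum to 1

# Change in the predictor when all counts are scaled by 5:
# zero for the zero-sum coefficients, log(5) * sum(beta) otherwise.
shift_zero_sum = log_contrast(deeper, beta_zero_sum) - log_contrast(counts, beta_zero_sum)
shift_free = log_contrast(deeper, beta_free) - log_contrast(counts, beta_free)
print(shift_zero_sum, shift_free)
```

Because the zero-sum predictor is the same whether it is computed from raw counts or from proportions, the coefficients can be interpreted without choosing a reference taxon, which is the usual motivation for replacing explicit log-ratios with a constraint on the coefficients.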
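
The selective shrinkage attributed to the spike‐and‐slab double‐exponential prior shows up in the E-step of an EM algorithm. The sketch below assumes the mixture beta | gamma ~ DE(0, s0) when gamma = 0 (spike) and DE(0, s1) when gamma = 1 (slab), with gamma ~ Bernoulli(theta); the values s0 = 0.04, s1 = 1 and theta = 0.5 are illustrative and are not claimed to be the BhGLM defaults, and the function name `spike_slab_e_step` is hypothetical. A double-exponential prior with scale S penalizes |beta| with weight 1/S, so a larger effective scale means weaker shrinkage.

```python
import math


def spike_slab_e_step(beta, s0=0.04, s1=1.0, theta=0.5):
    """E-step for one coefficient under an assumed spike-and-slab DE prior (s0 < s1).

    Returns (p, scale): p = Pr(gamma = 1 | beta), the posterior slab probability,
    and the effective DE scale 1 / E[1/S | beta], where
    E[1/S | beta] = (1 - p) / s0 + p / s1.
    """
    # Log posterior odds of slab vs spike; DE(0, s) has log-density -|beta|/s - log(2s).
    log_odds = (math.log(theta / (1.0 - theta))
                + abs(beta) * (1.0 / s0 - 1.0 / s1)
                + math.log(s0 / s1))
    # log_odds is bounded below when s0 < s1, so exp(-log_odds) cannot overflow.
    p = 1.0 / (1.0 + math.exp(-log_odds))
    inv_scale = (1.0 - p) / s0 + p / s1
    return p, 1.0 / inv_scale


for b in (0.0, 0.1, 0.5, 1.0):
    p, scale = spike_slab_e_step(b)
    print(f"beta={b:4.1f}  p_slab={p:.3f}  effective scale={scale:.3f}")
```

For a coefficient near zero the spike dominates and the effective scale stays close to s0 (strong shrinkage); once |beta| is moderately large, p approaches 1 and the scale approaches s1 (weak shrinkage).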
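
The EM-within-IWLS idea can be sketched for a logistic model on log-proportions. This is a minimal illustration under stated assumptions, not the BhGLM implementation: the double-exponential prior is written as a normal scale mixture, so the E-step turns it into a per-coefficient ridge precision E[1/S_j] / |beta_j|; the soft center is assumed to be a normal prior N(0, sum_sd^2) on the sum of the compositional coefficients, which enters IWLS as one extra pseudo-observation (a row of ones with response 0). The helper names (`fit_em_iwls`, `expected_inv_scale`, `solve`, `simulate`), the hyperparameters and the ridge-like first pass are illustrative choices; a small Gaussian-elimination solver keeps the sketch dependency-free.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def expected_inv_scale(beta, s0, s1, theta):
    """E[1/S | beta] under the assumed spike-and-slab DE prior (s0 < s1)."""
    log_odds = (math.log(theta / (1.0 - theta))
                + abs(beta) * (1.0 / s0 - 1.0 / s1) + math.log(s0 / s1))
    p = 1.0 / (1.0 + math.exp(-log_odds))
    return (1.0 - p) / s0 + p / s1


def solve(a, rhs):
    """Solve a x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_em_iwls(z, y, s0=0.04, s1=1.0, theta=0.5, sum_sd=0.01,
                max_iter=200, tol=1e-8, eps=1e-4):
    """Hypothetical helper: posterior mode of a logistic model on log-proportions z.

    Returns [intercept, beta_1, ..., beta_p].
    """
    n, p = len(z), len(z[0])
    x = [[1.0] + list(row) for row in z]  # intercept column + log-proportions
    b = [0.0] * (p + 1)
    c = [0.0] + [1.0] * p                 # selects sum_j beta_j (intercept excluded)
    w_sum = 1.0 / sum_sd ** 2             # precision of the assumed N(0, sum_sd^2) prior on that sum
    for it in range(max_iter):
        eta = [sum(xi[k] * b[k] for k in range(p + 1)) for xi in x]
        mu = [sigmoid(e) for e in eta]
        w = [m * (1.0 - m) for m in mu]   # IWLS weights for the logit link
        # E-step: the normal-mixture form of the DE prior gives precision E[1/S]/|beta_j|.
        # A ridge-like first pass keeps EM from sticking at the all-zero start.
        if it == 0:
            prec = [0.0] + [1.0] * p
        else:
            prec = [0.0] + [expected_inv_scale(bj, s0, s1, theta) / max(abs(bj), eps)
                            for bj in b[1:]]
        # M-step, one weighted least-squares solve:
        # (X'WX + diag(prec) + w_sum * c c') b = X'(W eta + y - mu)
        a = [[sum(w[i] * x[i][j] * x[i][k] for i in range(n)) + w_sum * c[j] * c[k]
              for k in range(p + 1)] for j in range(p + 1)]
        for j in range(p + 1):
            a[j][j] += prec[j]
        rhs = [sum(x[i][j] * (w[i] * eta[i] + y[i] - mu[i]) for i in range(n))
               for j in range(p + 1)]
        b_new = solve(a, rhs)
        converged = max(abs(u - v) for u, v in zip(b_new, b)) < tol
        b = b_new
        if converged:
            break
    return b


def simulate(n=300, p=6, seed=1):
    """Hypothetical data: two taxa with effects +2 and -2 (summing to zero), the rest null."""
    rng = random.Random(seed)
    true_beta = [2.0, -2.0] + [0.0] * (p - 2)
    z, y = [], []
    for _ in range(n):
        g = [rng.gauss(0.0, 1.0) for _ in range(p)]
        lse = math.log(sum(math.exp(v) for v in g))
        zi = [v - lse for v in g]         # log relative abundances
        eta = sum(bj * zj for bj, zj in zip(true_beta, zi))
        z.append(zi)
        y.append(1.0 if rng.random() < sigmoid(eta) else 0.0)
    return z, y
```

Calling `fit_em_iwls(*simulate())` returns the intercept followed by six coefficients: the two true signals should be recovered with opposite signs, the null taxa are expected to be shrunk toward zero, and the soft-center term holds the coefficient sum close to zero. For sub-compositions, the same device would presumably add one such pseudo-observation per subgroup of taxa.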