Bayesian Generalized Linear Models for Analyzing Compositional and Sub‐Compositional Microbiome Data via EM Algorithm

Bibliographic Details
Published in	Statistics in Medicine, Vol. 44, no. 7, article e70084
Main Authors	Zhang, Li; Ding, Zhenying; Cui, Jinhong; Zhou, Xiaoxiao; Yi, Nengjun
Format	Journal Article
Language	English
Published	Hoboken, USA: John Wiley & Sons, Inc; Wiley Subscription Services, Inc; 30 March 2025
Subjects	Algorithms; Bayes Theorem; Bayesian GLMs; compositional data; Computer Simulation; EM algorithm; Generalized linear models; Humans; Linear Models; Markov Chains; microbiome; Microbiota; Monte Carlo Method; spike‐and‐slab priors; sum‐to‐zero constraint
ISSN	0277-6715; 1097-0258
DOI	10.1002/sim.70084

Summary:	The study of compositional microbiome data is critical for exploring the functional roles of microbial communities in human health and disease. Recent advances have shifted from traditional log‐ratio transformations of compositional covariates to imposing a zero constraint on the sum of the corresponding coefficients. Various approaches, including penalized regression and Markov chain Monte Carlo (MCMC) algorithms, have been extended to enforce this sum‐to‐zero constraint. However, these methods have limitations: penalized regression yields only point estimates, which limits uncertainty assessment, while MCMC methods, although reliable, are computationally intensive, particularly in high‐dimensional settings. To address these challenges, we propose Bayesian generalized linear models (GLMs) for analyzing compositional and sub‐compositional microbiome data. Our model places a spike‐and‐slab double‐exponential prior on the microbiome coefficients, which applies weak shrinkage to large coefficients and strong shrinkage to irrelevant ones, making it well suited to high‐dimensional microbiome data. The sum‐to‐zero constraint is handled through soft centers, by placing a prior distribution on the sum of the compositional or sub‐compositional coefficients. To reduce the computational burden, we developed a fast and stable algorithm that incorporates expectation–maximization (EM) steps into the standard iteratively weighted least squares (IWLS) algorithm for fitting GLMs. The performance of the proposed method was assessed in extensive simulation studies, which show that our approach outperforms existing methods, with more accurate coefficient estimates and lower prediction error. We also applied the proposed method to a microbiome study to identify microorganisms linked to inflammatory bowel disease (IBD). The methods are implemented in the freely available R package BhGLM (https://github.com/nyiuab/BhGLM).
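The summary describes the fitting strategy only in outline. As a rough illustration, the Python sketch below shows one way its three ingredients could fit together in a logistic model: a spike‐and‐slab double‐exponential prior handled through its normal scale‐mixture representation, a soft sum‐to‐zero constraint expressed as a normal prior on the sum of the compositional coefficients, and EM updates of the prior precisions interleaved with IWLS steps. It is not the BhGLM implementation. The function names, the fixed mixing weight `theta`, the scales `s0`, `s1` and `s_sum`, and the ridge‐based starting values are all hypothetical choices made for illustration.

```python
import numpy as np


def _sigmoid(eta):
    # Numerically safe logistic function
    return 1.0 / (1.0 + np.exp(-np.clip(eta, -35, 35)))


def _penalized_iwls_step(Xa, y, beta, P):
    """One IWLS (Newton) step for logistic regression with penalty 0.5 * beta' P beta:
    solve (X'WX + P) beta_new = X'(W eta + y - mu)."""
    eta = Xa @ beta
    mu = _sigmoid(eta)
    w = mu * (1.0 - mu)
    lhs = Xa.T @ (w[:, None] * Xa) + P
    rhs = Xa.T @ (w * eta + (y - mu))
    return np.linalg.solve(lhs, rhs)


def fit_ssde_sumzero_logistic(X, y, s0=0.04, s1=1.0, theta=0.5,
                              s_sum=0.01, n_iter=200, tol=1e-6):
    """Hypothetical sketch, not the BhGLM code.

    X: (n, p) log relative abundances, no intercept column.
    Prior per coefficient: theta * DE(0, s1) + (1 - theta) * DE(0, s0).
    Soft sum-to-zero constraint: sum(beta) ~ N(0, s_sum^2).
    theta is held fixed here for simplicity (an assumption).
    Returns the coefficients (intercept first) and the slab probabilities
    from the last E-step.
    """
    n, p = X.shape
    Xa = np.column_stack([np.ones(n), X])       # unpenalized intercept first
    c = np.r_[0.0, np.ones(p)]                  # selects the compositional block
    P_sum = np.outer(c, c) / s_sum**2           # Hessian of (c'beta)^2 / (2 s_sum^2)

    # Start from a ridge fit so no coefficient begins exactly at zero
    beta = np.zeros(p + 1)
    P_ridge = np.diag(np.r_[0.0, np.ones(p)]) + P_sum
    for _ in range(25):
        beta = _penalized_iwls_step(Xa, y, beta, P_ridge)

    for _ in range(n_iter):
        b = beta[1:]
        # E-step 1: posterior probability that each coefficient is in the slab
        log_slab = np.log(theta) - np.abs(b) / s1 - np.log(2 * s1)
        log_spike = np.log(1 - theta) - np.abs(b) / s0 - np.log(2 * s0)
        p_slab = 1.0 / (1.0 + np.exp(-np.clip(log_slab - log_spike, -500, 500)))
        # E-step 2: E[1/tau^2 | b] = (p/s1 + (1-p)/s0) / |b| under the
        # normal scale-mixture form of the double-exponential prior
        inv_scale = p_slab / s1 + (1.0 - p_slab) / s0
        d = inv_scale / np.maximum(np.abs(b), 1e-6)
        P = np.diag(np.r_[0.0, d]) + P_sum
        # M-step: one penalized IWLS update with the current prior precisions
        beta_new = _penalized_iwls_step(Xa, y, beta, P)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, p_slab


# Usage on simulated compositions whose true coefficients sum to zero
rng = np.random.default_rng(1)
n, p = 300, 10
Z = rng.normal(size=(n, p))
X = Z - np.log(np.exp(Z).sum(axis=1, keepdims=True))   # log of softmax proportions
beta_true = np.r_[1.5, -1.5, np.zeros(p - 2)]           # sums to zero
y = rng.binomial(1, _sigmoid(-0.2 + X @ beta_true))
beta_hat, p_slab = fit_ssde_sumzero_logistic(X, y)
print(np.round(beta_hat, 2), round(float(beta_hat[1:].sum()), 4))
```

With a small `s_sum` the fitted microbiome coefficients should sum to nearly zero, and enlarging it loosens the constraint. Under the same assumptions, a sub‐compositional version would use one indicator vector per subset of taxa and add one rank‐one term of this kind per subset.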
Funding:	The authors received no specific funding for this work.