Bayesian Generalized Linear Models for Analyzing Compositional and Sub‐Compositional Microbiome Data via EM Algorithm

Bibliographic Details
Published in: Statistics in Medicine, Vol. 44, no. 7, pp. e70084
Main Authors: Zhang, Li; Ding, Zhenying; Cui, Jinhong; Zhou, Xiaoxiao; Yi, Nengjun
Format: Journal Article
Language: English
Published: Hoboken, USA: John Wiley & Sons, Inc., 30.03.2025
Wiley Subscription Services, Inc.
ISSN: 0277-6715; 1097-0258
DOI: 10.1002/sim.70084

Summary: The study of compositional microbiome data is critical for exploring the functional roles of microbial communities in human health and disease. Recent advances have shifted from traditional log‐ratio transformations of compositional covariates to imposing a zero constraint on the sum of the corresponding coefficients. Various approaches, including penalized regression and Markov chain Monte Carlo (MCMC) algorithms, have been extended to enforce this sum‐to‐zero constraint. However, these methods have limitations: penalized regression yields only point estimates, limiting uncertainty assessment, while MCMC methods, although reliable, are computationally intensive, particularly in high‐dimensional settings. To address these challenges, we propose Bayesian generalized linear models (GLMs) for analyzing compositional and sub‐compositional microbiome data. Our model places a spike‐and‐slab double‐exponential prior on the microbiome coefficients, which induces weak shrinkage on large coefficients and strong shrinkage on irrelevant ones, making it well suited to high‐dimensional microbiome data. The sum‐to‐zero constraint is handled through soft centers, by placing a prior distribution on the sum of the compositional or sub‐compositional coefficients. To reduce the computational burden, we developed a fast and stable algorithm that incorporates expectation–maximization (EM) steps into the routine iteratively weighted least squares (IWLS) algorithm for fitting GLMs. The performance of the proposed method was assessed in extensive simulation studies, which show that our approach outperforms existing methods, with more accurate coefficient estimates and lower prediction error. We also applied the proposed method to a microbiome study to identify microorganisms linked to inflammatory bowel disease (IBD). The methods have been implemented in the freely available R package BhGLM (https://github.com/nyiuab/BhGLM).
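The EM-IWLS scheme summarized above can be illustrated with a minimal Python/NumPy sketch for a logistic model. This is not the BhGLM implementation (which is written in R). The function name `ssl_soft_center_logistic`, the defaults s0 = 0.04, s1 = 1 and tau = 0.01, the normal form assumed for the soft-center prior (each group sum of coefficients ~ N(0, tau^2)), and the use of the normal scale-mixture representation of the double-exponential prior in the M-step are all illustrative assumptions. Each iteration takes one IWLS step using the current prior precisions plus the soft constraint, then an E-step updates each coefficient's slab inclusion probability and the resulting prior precision. Passing several index groups gives the sub-compositional case, with one soft constraint per group.

```python
import numpy as np


def ssl_soft_center_logistic(X, y, groups=None, s0=0.04, s1=1.0, tau=0.01,
                             max_iter=500, tol=1e-6):
    """Hypothetical EM-IWLS sketch (not BhGLM's code) for logistic regression with
    a spike-and-slab double-exponential prior on the coefficients and a soft
    sum-to-zero constraint sum(beta[g]) ~ N(0, tau^2) per group g (assumed form).
    X: n x p log-compositions; y: 0/1 outcomes; groups: list of index arrays
    (None means one constraint over all p coefficients)."""
    n, p = X.shape
    X1 = np.column_stack([np.ones(n), X])            # intercept is not penalized
    groups = [np.arange(p)] if groups is None else groups
    C = np.zeros((len(groups), p + 1))
    for k, idx in enumerate(groups):
        C[k, 1 + np.asarray(idx)] = 1.0              # row k sums the coefficients in group k
    b = np.zeros(p + 1)
    prec = np.ones(p)                                # initial prior precisions (weak ridge)
    theta = 0.5                                      # prior inclusion probability
    for _ in range(max_iter):
        # IWLS working weights and working response at the current fit
        eta = X1 @ b
        mu = 1.0 / (1.0 + np.exp(-eta))
        w = np.clip(mu * (1.0 - mu), 1e-10, None)
        z = eta + (y - mu) / w
        # M-step (one IWLS update): weighted least squares plus the prior
        # precisions and the soft-center term C^T C / tau^2
        A = X1.T @ (w[:, None] * X1) + C.T @ C / tau**2
        A[1:, 1:] += np.diag(prec)
        b_new = np.linalg.solve(A, X1.T @ (w * z))
        # E-step: posterior probability that each coefficient comes from the slab
        abs_beta = np.abs(b_new[1:])
        log_slab = np.log(theta) - abs_beta / s1 - np.log(2 * s1)
        log_spike = np.log(1 - theta) - abs_beta / s0 - np.log(2 * s0)
        p_incl = 1.0 / (1.0 + np.exp(np.clip(log_spike - log_slab, -700, 700)))
        # Expected inverse prior scale; the normal scale-mixture form of the
        # double exponential then gives E[1/v_j | beta_j] = inv_scale_j / |beta_j|
        inv_scale = (1.0 - p_incl) / s0 + p_incl / s1
        prec = inv_scale / np.maximum(abs_beta, 1e-6)
        theta = float(np.clip(p_incl.mean(), 1e-4, 1 - 1e-4))  # flat prior on theta
        converged = np.max(np.abs(b_new - b)) < tol
        b = b_new
        if converged:
            break
    return b[0], b[1:], p_incl


# Illustrative usage on simulated data (not data from the paper)
rng = np.random.default_rng(2025)
n, p = 300, 10
raw = np.exp(rng.normal(size=(n, p)))                # positive "abundances"
X = np.log(raw / raw.sum(axis=1, keepdims=True))     # log of the compositions
beta_true = np.r_[1.5, -1.5, np.zeros(p - 2)]        # true coefficients sum to zero
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta_true)))).astype(float)
b0, beta, p_incl = ssl_soft_center_logistic(X, y)
print(np.round(beta, 3), "sum =", round(float(beta.sum()), 5))
```

In this simulated example the two nonzero coefficients should be recovered with opposite signs and a near-zero sum, while the others are shrunk toward zero. Because the double-exponential penalty is handled through a quadratic approximation, shrunken coefficients become very small rather than exactly zero.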
Funding: The authors received no specific funding for this work.