Pseudobulk with proper offsets has the same statistical properties as generalized linear mixed models in single-cell case-control studies
Published in | Bioinformatics (Oxford, England) Vol. 40; no. 8 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | English |
Published | England 02.08.2024 |
ISSN | 1367-4811, 1367-4803 |
DOI | 10.1093/bioinformatics/btae498 |
Summary: | Generalized linear mixed models (GLMMs), such as the negative-binomial or Poisson linear mixed model, are widely applied to single-cell RNA sequencing data to compare transcript expression between conditions determined at the subject level. However, these models are computationally intensive, and their statistical performance relative to pseudobulk approaches is poorly understood.
We propose offset-pseudobulk as a lightweight alternative to GLMMs. We prove that a count-based pseudobulk equipped with a proper offset variable has the same statistical properties as GLMMs in terms of both point estimates and standard errors. We confirm our findings using simulations based on real data. Offset-pseudobulk is substantially faster (more than 10-fold) and numerically more stable than GLMMs.
Offset-pseudobulk can be implemented in any generalized linear model software by adjusting a few options. The code is available at https://github.com/hanbin973/pseudobulk_is_mm.
Supplementary data are available at Bioinformatics online. |
---|---|
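The summary states that offset-pseudobulk only needs a count-based pseudobulk plus an offset variable. The following is a minimal sketch of that idea and is not taken from the authors' repository. It assumes the offset is the log of the per-cell size factors summed within each subject, which is what results from adding up Poisson counts that share a subject-level effect. The function names, the toy data, and the closed-form two-group Poisson estimate are all hypothetical illustrations; the sketch does not cover dispersion or standard errors.

```python
import math
from collections import defaultdict


def offset_pseudobulk(cells):
    """Aggregate cells into one pseudobulk row per subject.

    cells: iterable of (subject_id, count, size_factor) for a single gene.
    Returns {subject_id: (summed_count, log_offset)}, where log_offset is
    log(sum of size factors) -- an assumed form of the "proper offset".
    """
    total_count = defaultdict(int)
    total_size = defaultdict(float)
    for subject, count, size_factor in cells:
        total_count[subject] += count
        total_size[subject] += size_factor
    return {s: (total_count[s], math.log(total_size[s])) for s in total_count}


def poisson_log_fold_change(pseudobulk, condition):
    """Poisson GLM estimate (log link, intercept + binary condition, offset).

    For this design the maximum-likelihood slope has a closed form: the log
    ratio of the two groups' total count divided by total exposure.
    """
    counts = {0: 0.0, 1: 0.0}
    exposure = {0: 0.0, 1: 0.0}
    for subject, (count, log_offset) in pseudobulk.items():
        group = condition[subject]
        counts[group] += count
        exposure[group] += math.exp(log_offset)
    rate_case = counts[1] / exposure[1]
    rate_control = counts[0] / exposure[0]
    return math.log(rate_case) - math.log(rate_control)


# Hypothetical toy data: (subject, count for one gene, size factor).
cells = [
    ("s1", 3, 1.0), ("s1", 5, 1.0),
    ("s2", 2, 0.5), ("s2", 2, 1.5),
    ("s3", 12, 1.0), ("s3", 12, 2.0),
    ("s4", 8, 1.0),
]
condition = {"s1": 0, "s2": 0, "s3": 1, "s4": 1}  # 0 = control, 1 = case

pseudobulk = offset_pseudobulk(cells)
log_fc = poisson_log_fold_change(pseudobulk, condition)
```

In practice the aggregated counts and `log_offset` values would be handed to a generalized linear model routine that accepts an offset, with a negative-binomial or similar family chosen to capture between-subject variation; that choice is left open here.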