CHOIR improves significance-based detection of cell types and states from single-cell data
| Published in | Nature genetics Vol. 57; no. 5; pp. 1309 - 1319 |
|---|---|
| Main Authors | , , |
| Format | Journal Article |
| Language | English |
| Published | New York, Nature Publishing Group US, 01.05.2025, Nature Publishing Group |
| ISSN | 1061-4036, 1546-1718 |
| DOI | 10.1038/s41588-025-02148-8 |
| Summary | Clustering is a critical step in the analysis of single-cell data, enabling the discovery and characterization of cell types and states. However, most popular clustering tools do not subject results to statistical inference testing, leading to risks of overclustering or underclustering data and often resulting in ineffective identification of cell types with widely differing prevalence. To address these challenges, we present CHOIR (cluster hierarchy optimization by iterative random forests), which applies a framework of random forest classifiers and permutation tests across a hierarchical clustering tree to statistically determine clusters representing distinct populations. We demonstrate the performance of CHOIR through extensive benchmarking against 15 existing clustering methods across 230 simulated and five real single-cell RNA sequencing, assay for transposase-accessible chromatin sequencing, spatial transcriptomic and multi-omic datasets. CHOIR can be applied to any single-cell data type and provides a flexible, scalable and robust solution to the challenge of identifying biologically relevant cell groupings within heterogeneous single-cell data. Cluster hierarchy optimization by iterative random forests (CHOIR) offers a robust and accurate method to identify cell clusters across a variety of single-cell resolution data with statistical support. |
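
The summary describes deciding whether two candidate clusters are distinct by combining a classifier with a permutation test. The sketch below illustrates that general idea only and is not CHOIR's implementation: a simple nearest-centroid classifier stands in for the random forest so the example needs nothing beyond the Python standard library, and every function name and parameter (`holdout_accuracy`, `permutation_p_value`, `make_demo`, `n_perm`, `separation`) is a hypothetical choice made for illustration.

```python
import random


def centroid(points):
    """Mean of a list of equal-length feature vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def holdout_accuracy(points, labels, rng):
    """Fit a nearest-centroid classifier (stand-in for a random forest)
    on a random half of the cells and score it on the other half."""
    idx = list(range(len(points)))
    rng.shuffle(idx)
    half = len(idx) // 2
    train, test = idx[:half], idx[half:]
    groups = {0: [], 1: []}
    for i in train:
        groups[labels[i]].append(points[i])
    if not groups[0] or not groups[1]:
        return 0.5  # a class is missing from the training half: treat as chance
    c0, c1 = centroid(groups[0]), centroid(groups[1])
    correct = 0
    for i in test:
        pred = 0 if sq_dist(points[i], c0) <= sq_dist(points[i], c1) else 1
        correct += pred == labels[i]
    return correct / len(test)


def permutation_p_value(points, labels, n_perm=200, seed=0):
    """Compare accuracy on the real labels with accuracy on shuffled labels.
    A small p-value suggests the two clusters are genuinely separable."""
    rng = random.Random(seed)
    observed = holdout_accuracy(points, labels, rng)
    shuffled = list(labels)
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if holdout_accuracy(points, shuffled, rng) >= observed:
            at_least += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least + 1) / (n_perm + 1)


def make_demo(separation, n=40, seed=1):
    """Two synthetic 2-D 'clusters' whose centers differ by `separation`."""
    rng = random.Random(seed)
    points = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
    points += [[rng.gauss(separation, 1), rng.gauss(separation, 1)] for _ in range(n)]
    labels = [0] * n + [1] * n
    return points, labels


if __name__ == "__main__":
    for sep in (6.0, 0.0):
        pts, labs = make_demo(sep)
        print(f"separation={sep}: p={permutation_p_value(pts, labs):.3f}")
```

In a hierarchical setting, a test of this kind would be applied to pairs of sibling clusters in the tree, merging pairs whose labels the classifier cannot predict better than chance. How CHOIR actually traverses the tree, sets thresholds and corrects for multiple comparisons is described in the article itself and is not reproduced here.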