CHOIR improves significance-based detection of cell types and states from single-cell data


Bibliographic Details
Published in Nature Genetics Vol. 57; no. 5; pp. 1309-1319
Main Authors Sant, Cathrine; Mucke, Lennart; Corces, M. Ryan
Format Journal Article
Language English
Published New York: Nature Publishing Group US, 01.05.2025
Nature Publishing Group
ISSN 1061-4036
1546-1718
DOI 10.1038/s41588-025-02148-8

Summary: Clustering is a critical step in the analysis of single-cell data, enabling the discovery and characterization of cell types and states. However, most popular clustering tools do not subject results to statistical inference testing, leading to risks of overclustering or underclustering data and often resulting in ineffective identification of cell types with widely differing prevalence. To address these challenges, we present CHOIR (cluster hierarchy optimization by iterative random forests), which applies a framework of random forest classifiers and permutation tests across a hierarchical clustering tree to statistically determine clusters representing distinct populations. We demonstrate the performance of CHOIR through extensive benchmarking against 15 existing clustering methods across 230 simulated and five real single-cell RNA sequencing, assay for transposase-accessible chromatin sequencing, spatial transcriptomic and multi-omic datasets. CHOIR can be applied to any single-cell data type and provides a flexible, scalable and robust solution to the challenge of identifying biologically relevant cell groupings within heterogeneous single-cell data. Cluster hierarchy optimization by iterative random forests (CHOIR) offers a robust and accurate method to identify cell clusters across a variety of single-cell resolution data with statistical support.
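
The core idea in the summary, testing whether two candidate clusters are statistically distinguishable by comparing classifier accuracy against label-permuted data, can be illustrated with a minimal sketch. This is not CHOIR's implementation: a simple nearest-centroid classifier is swapped in for the random forest so the example needs only the Python standard library, and every function name, the 200 permutations, and the 0.05 threshold are illustrative assumptions rather than values taken from the paper.

```python
import random


def centroid(points):
    """Mean of a list of equal-length numeric vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def holdout_accuracy(points, labels, rng):
    """Fit a nearest-centroid classifier (stand-in for a random forest)
    on a random half of the cells and score it on the other half."""
    idx = list(range(len(points)))
    rng.shuffle(idx)
    half = len(idx) // 2
    train, test = idx[:half], idx[half:]
    cents = {}
    for lab in (0, 1):
        members = [points[i] for i in train if labels[i] == lab]
        if not members:
            return 0.5  # degenerate split: treat as chance-level accuracy
        cents[lab] = centroid(members)
    correct = sum(
        1
        for i in test
        if min(cents, key=lambda lab: sq_dist(points[i], cents[lab])) == labels[i]
    )
    return correct / len(test)


def permutation_p_value(points, labels, n_permutations=200, seed=0):
    """Share of label permutations whose accuracy matches or beats the
    accuracy obtained with the true cluster labels (add-one corrected)."""
    rng = random.Random(seed)
    observed = holdout_accuracy(points, labels, rng)
    exceed = 0
    for _ in range(n_permutations):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        if holdout_accuracy(points, shuffled, rng) >= observed:
            exceed += 1
    return (exceed + 1) / (n_permutations + 1)


def should_merge(points, labels, alpha=0.05, **kwargs):
    """Hypothetical decision rule: merge the two clusters when the
    classifier cannot separate them better than chance at level alpha."""
    return permutation_p_value(points, labels, **kwargs) >= alpha
```

In a hierarchical setting, a test of this kind would be applied to pairs of sibling clusters in the tree, merging pairs that fail the test and keeping those that pass; how CHOIR traverses the tree and corrects for multiple comparisons is described in the article itself and is not reproduced here.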