CHOIR improves significance-based detection of cell types and states from single-cell data

Bibliographic Details
Published in	Nature Genetics, Vol. 57, no. 5, pp. 1309-1319
Main Authors	Sant, Cathrine; Mucke, Lennart; Corces, M. Ryan
Format	Journal Article
Language	English
Published	New York: Nature Publishing Group US, 01.05.2025
Subjects	45/91; 631/114/1314; 631/114/794; Agriculture; Algorithms; Animal Genetics and Genomics; Biomedical and Life Sciences; Biomedicine; Cancer Research; Cells; Choirs; Chromatin; Cluster Analysis; Clustering; Computational Biology - methods; Datasets; Gene expression; Gene Expression Profiling - methods; Gene Function; Gene sequencing; Human Genetics; Humans; Permutations; Sequence Analysis, RNA - methods; Single-Cell Analysis - methods; Software; Statistical inference; technical-report; Transcriptome - genetics; Transcriptomics; Transposase
ISSN	1061-4036, 1546-1718
DOI	10.1038/s41588-025-02148-8

More Information
Summary:	Clustering is a critical step in the analysis of single-cell data, enabling the discovery and characterization of cell types and states. However, most popular clustering tools do not subject results to statistical inference testing, leading to risks of overclustering or underclustering data and often resulting in ineffective identification of cell types with widely differing prevalence. To address these challenges, we present CHOIR (cluster hierarchy optimization by iterative random forests), which applies a framework of random forest classifiers and permutation tests across a hierarchical clustering tree to statistically determine clusters representing distinct populations. We demonstrate the performance of CHOIR through extensive benchmarking against 15 existing clustering methods across 230 simulated and five real single-cell RNA sequencing, assay for transposase-accessible chromatin sequencing, spatial transcriptomic and multi-omic datasets. CHOIR can be applied to any single-cell data type and provides a flexible, scalable and robust solution to the challenge of identifying biologically relevant cell groupings within heterogeneous single-cell data. Cluster hierarchy optimization by iterative random forests (CHOIR) offers a robust and accurate method to identify cell clusters across a variety of single-cell resolution data with statistical support.
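The summary's core idea, testing whether two candidate clusters are statistically distinguishable by comparing a classifier's accuracy on the real labels against its accuracy on permuted labels, can be illustrated with a minimal sketch. This is not CHOIR's code or API. To stay within the Python standard library, a nearest-centroid classifier is swapped in for the random forest, and every name here (`holdout_accuracy`, `permutation_p_value`, `make_blob`), the single hold-out split, and the toy data are assumptions made for illustration only.

```python
import random


def centroid(rows):
    """Mean of each feature column."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def holdout_accuracy(points, labels, rng):
    """Fit a nearest-centroid classifier (a stand-in for a random forest)
    on a random half of the data and score it on the other half."""
    idx = list(range(len(points)))
    rng.shuffle(idx)
    half = len(idx) // 2
    train, test = idx[:half], idx[half:]
    cents = {}
    for lab in set(labels):
        rows = [points[i] for i in train if labels[i] == lab]
        if not rows:  # a label is missing from the training half
            return 0.5
        cents[lab] = centroid(rows)
    correct = 0
    for i in test:
        pred = min(
            cents,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(points[i], cents[lab])),
        )
        correct += pred == labels[i]
    return correct / len(test)


def permutation_p_value(points, labels, n_perm=200, seed=0):
    """Fraction of label permutations whose hold-out accuracy matches or
    beats the accuracy on the true labels (with the usual +1 correction)."""
    rng = random.Random(seed)
    observed = holdout_accuracy(points, labels, rng)
    exceed = 0
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        if holdout_accuracy(points, shuffled, rng) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


def make_blob(center, n, rng):
    """Toy 2-D 'cells' drawn around a common center (hypothetical data)."""
    return [[rng.gauss(center, 1.0), rng.gauss(center, 1.0)] for _ in range(n)]


rng = random.Random(42)
points = make_blob(0.0, 40, rng) + make_blob(10.0, 40, rng)
labels = [0] * 40 + [1] * 40
p_separated = permutation_p_value(points, labels)
print(f"p-value for two well-separated groups: {p_separated:.4f}")
```

A small p-value suggests the two groups are distinct and should stay split. A large one suggests they are not distinguishable and could be merged. Per the summary, CHOIR applies tests of this kind across a hierarchical clustering tree rather than to a single pair.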