Benchmarking principal component analysis for large-scale single-cell RNA-sequencing

Bibliographic Details
Published in	Genome Biology Vol. 21; no. 1; p. 9
Main Authors	Tsuyuzaki, Koki; Sato, Hiroyuki; Sato, Kenta; Nikaido, Itoshi
Format	Journal Article
Language	English
Published	London: BioMed Central, 20.01.2020; Springer Nature B.V; BMC
Subjects	Accuracy; Algorithms; Animal Genetics and Genomics; Benchmarking; Benchmarking Studies; Bioinformatics; Biomedical and Life Sciences; Cell cycle; Cellular heterogeneity; Clustering; Computer applications; Data analysis; data collection; Datasets; Dimension reduction; Evolutionary Biology; Gene expression; genome; Genomics; guidelines; Human Genetics; Julia; Life Sciences; memory; Microbial Genetics and Genomics; Online/incremental algorithm; Out-of-core; Pancreas; Plant Genetics and Genomics; Principal Component Analysis; Python; R; Randomized algorithm; Ribonucleic acid; RNA; RNA-Seq - methods; sequence analysis; Single-Cell Analysis - methods; Single-cell RNA-seq; Software; Sparse data format
ISSN	1474-760X; 1474-7596
DOI	10.1186/s13059-019-1900-3

More Information
Summary:	Background: Principal component analysis (PCA) is an essential method for analyzing single-cell RNA-seq (scRNA-seq) datasets, but for large-scale scRNA-seq datasets, PCA takes a long time to compute and consumes large amounts of memory. Results: In this work, we review the existing fast and memory-efficient PCA algorithms and implementations and evaluate their practical application to large-scale scRNA-seq datasets. Our benchmark shows that some PCA algorithms based on Krylov subspace and randomized singular value decomposition are fast, memory-efficient, and more accurate than the other algorithms. Conclusion: We develop a guideline for selecting an appropriate PCA implementation based on the differences in the computational environments of users and developers.
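
The summary names randomized singular value decomposition as one of the algorithm families that performed well. As a rough illustration of the general idea only, and not the article's own implementation or any benchmarked package, a minimal NumPy sketch might look like the following. The function name `randomized_pca` and its parameters (`n_oversamples`, `n_power_iter`, `seed`) are hypothetical, and the dense centering step is an assumption made for brevity; a memory-efficient version for real scRNA-seq data would avoid densifying a sparse matrix.

```python
import numpy as np


def randomized_pca(X, n_components, n_oversamples=10, n_power_iter=2, seed=0):
    """Approximate the top principal components of X (cells x genes).

    Hypothetical helper for illustration; assumes X is a dense 2-D array.
    """
    rng = np.random.default_rng(seed)
    # Center each gene (column) so that the SVD of Xc corresponds to PCA.
    Xc = X - X.mean(axis=0)
    # Sketch size: requested components plus a small oversampling margin.
    k = min(n_components + n_oversamples, min(Xc.shape))
    # Project onto a random subspace and orthonormalize the result.
    Q = np.linalg.qr(Xc @ rng.standard_normal((Xc.shape[1], k)))[0]
    # Power iterations sharpen the subspace when singular values decay slowly.
    for _ in range(n_power_iter):
        Q = np.linalg.qr(Xc.T @ Q)[0]
        Q = np.linalg.qr(Xc @ Q)[0]
    # Exact SVD of the small (k x genes) projected matrix.
    B = Q.T @ Xc
    U_small, S, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    # PC scores, loadings (one row per component), and singular values.
    scores = U[:, :n_components] * S[:n_components]
    return scores, Vt[:n_components], S[:n_components]


# Usage on a small synthetic matrix (200 "cells" x 50 "genes").
demo_rng = np.random.default_rng(1)
demo_X = demo_rng.standard_normal((200, 5)) @ demo_rng.standard_normal((5, 50))
demo_X += 0.01 * demo_rng.standard_normal((200, 50))
demo_scores, demo_loadings, demo_S = randomized_pca(demo_X, n_components=3)
```

The expensive full decomposition is replaced by a few matrix products with a thin random matrix followed by an SVD of a small matrix, which is why this family of methods scales to large cell counts.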