Fast analysis of scATAC-seq data using a predefined set of genomic regions [version 2; peer review: 2 approved]

Bibliographic Details
Published in	F1000 research Vol. 9; p. 199
Main Authors	Giansanti, Valentina, Tang, Ming, Cittaro, Davide
Format	Journal Article
Language	English
Published	England: Faculty of 1000 Ltd, 2020; F1000 Research Limited; F1000 Research Ltd
Subjects	Computational Biology; Computer applications; Deoxyribonuclease; Genome; Genomes; Genomics; Genomics - methods; Humans; K562 Cells; Labeling; Leukocytes, Mononuclear; Method; Peripheral blood mononuclear cells; Sequence Alignment; Sequence Analysis, DNA; single cell; scATAC-seq; pseudoalignment
ISSN	2046-1402
DOI	10.12688/f1000research.22731.2

Summary:	Background: Analysis of scATAC-seq data has recently been scaled to thousands of cells. While processing of other types of single-cell data has been accelerated by alignment-free techniques, the pipelines available for scATAC-seq data still require large computational resources. We propose an approach based on pseudoalignment that reduces execution times and hardware requirements at little cost in precision. Methods: Public data for 10k PBMC were downloaded from the 10x Genomics website. Reads were aligned to various references derived from DNase I Hypersensitive Sites (DHS) using kallisto and quantified with bustools. We compared our results with the publicly available ones produced by cellranger-atac. We subsequently tested our approach on scATAC-seq data for the K562 cell line. Results: We found that kallisto does not introduce biases in the quantification of known peaks; the cell groups identified are consistent with those identified by the standard method. We also found that cell identification is robust when the analysis uses a DHS-derived reference in place of de novo identification of ATAC peaks. Lastly, we found that our approach is suitable for reliable quantification of gene activity based on scATAC-seq signal, and thus allows efficient labelling of cell groups based on marker genes. Conclusions: Analysis of scATAC-seq data with kallisto produces results in line with standard pipelines while being considerably faster; using a set of known DHS sites as the reference does not affect the ability to characterize the cell populations.
Bibliography:	No competing interests were disclosed.
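
The quantification step described in the summary (reads assigned to predefined DHS regions, then counted per cell) can be illustrated with a toy sketch. This is not kallisto or bustools and does not reproduce the authors' pipeline; the barcodes, region names, and the `count_matrix` helper are hypothetical and only show what a cell-by-region count matrix looks like once each read has been assigned to a region.

```python
from collections import Counter


def count_matrix(records):
    """Build a dense cell-by-region count matrix.

    `records` is a list of (cell_barcode, region_id) pairs, one per read
    that was assigned to a predefined region (hypothetical input format).
    """
    counts = Counter(records)
    barcodes = sorted({barcode for barcode, _ in records})
    regions = sorted({region for _, region in records})
    # Counter returns 0 for (barcode, region) pairs that were never observed.
    matrix = [[counts[(b, r)] for r in regions] for b in barcodes]
    return barcodes, regions, matrix


# Hypothetical assignments of reads to DHS-derived regions.
records = [
    ("AAAC", "DHS_1"), ("AAAC", "DHS_1"), ("AAAC", "DHS_3"),
    ("GGTT", "DHS_2"), ("GGTT", "DHS_3"),
]
barcodes, regions, matrix = count_matrix(records)
for barcode, row in zip(barcodes, matrix):
    print(barcode, row)
```

Because the regions are fixed in advance, the columns of the matrix are known before any read is processed, which is what removes the need for de novo peak calling in this kind of approach.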