Fast analysis of scATAC-seq data using a predefined set of genomic regions [version 2; peer review: 2 approved]

Bibliographic Details
Published in F1000 research Vol. 9; p. 199
Main Authors Giansanti, Valentina; Tang, Ming; Cittaro, Davide
Format Journal Article
Language English
Published England Faculty of 1000 Ltd 2020
F1000 Research Limited
F1000 Research Ltd
ISSN 2046-1402
DOI 10.12688/f1000research.22731.2

Summary: Background: Analysis of scATAC-seq data has recently been scaled to thousands of cells. While processing of other types of single-cell data has been accelerated by alignment-free techniques, the pipelines available to process scATAC-seq data still require large computational resources. We propose here an approach based on pseudoalignment, which reduces execution times and hardware needs at little cost in precision. Methods: Public data for 10k PBMC were downloaded from the 10x Genomics website. Reads were aligned to various references derived from DNase I Hypersensitive Sites (DHS) using kallisto and quantified with bustools. We compared our results with the publicly available ones derived by cellranger-atac. We subsequently tested our approach on scATAC-seq data for the K562 cell line. Results: We found that kallisto does not introduce biases in the quantification of known peaks; the cell groups identified are consistent with those identified by the standard method. We also found that cell identification is robust when the analysis is performed using a DHS-derived reference in place of de novo identification of ATAC peaks. Lastly, we found that our approach is suitable for reliable quantification of gene activity based on scATAC-seq signal, and thus allows efficient labelling of cell groups based on marker genes. Conclusions: Analysis of scATAC-seq data by means of kallisto produces results in line with standard pipelines while being considerably faster; using a set of known DHS sites as reference does not affect the ability to characterize the cell populations.
No competing interests were disclosed.