Accurate assembly of multiple RNA-seq samples with Aletsch

Bibliographic Details
Published in	Bioinformatics (Oxford, England), Vol. 40, no. Supplement_1, pp. i307–i317
Main Authors	Shi, Qian; Zhang, Qimin; Shao, Mingfu
Format	Journal Article
Language	English
Published	Oxford University Press (England); Oxford Publishing Limited (England), 28.06.2024
Subjects	Adaptability; Algorithms; Apexes; Assembling; Availability; Chromosomes; Datasets; Decoding; Decomposition; Gene sequencing; Graph theory; High-Throughput Nucleotide Sequencing - methods; Humans; Ribonucleic acid; RNA; RNA-Seq - methods; Sequence Analysis, RNA - methods; Software
ISSN	1367-4803, 1367-4811
DOI	10.1093/bioinformatics/btae215

Summary:	Motivation: High-throughput RNA sequencing has become indispensable for decoding gene activities, yet the challenge of reconstructing full-length transcripts persists. Traditional single-sample assemblers frequently produce fragmented transcripts, especially in single-cell RNA-seq data. While algorithms designed for assembling multiple samples exist, they encounter various limitations. Results: We present Aletsch, a new assembler for multiple bulk or single-cell RNA-seq samples. Aletsch incorporates several algorithmic innovations, including a “bridging” system that can effectively integrate multiple samples to restore missed junctions in individual samples, and a new graph-decomposition algorithm that leverages “supporting” information across multiple samples to guide the decomposition of complex vertices. A standout feature of Aletsch is its application of a random forest model with 50 well-designed features for scoring transcripts. We demonstrate its robust adaptability across different chromosomes, datasets, and species. Our experiments, conducted on RNA-seq data from several protocols, show that Aletsch significantly outperforms existing meta-assemblers. As an example, when measured with the partial area under the precision-recall curve (pAUC, constrained by precision), Aletsch surpasses the leading assemblers TransMeta by 22.9%–62.1% and PsiCLASS by 23.0%–175.5% on human datasets. Availability and implementation: Aletsch is freely available at https://github.com/Shao-Group/aletsch. Scripts that reproduce the experimental results of this manuscript are available at https://github.com/Shao-Group/aletsch-test.
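The summary reports results as a partial area under the precision-recall curve (pAUC) constrained by precision, but does not give the exact definition. The following is a minimal sketch of one plausible reading, not the authors' evaluation code: it integrates the curve with the trapezoidal rule over the recall range where precision stays at or above a threshold. The function names `partial_pr_auc` and `relative_gain`, the `min_precision` parameter, and the sample points are all hypothetical and chosen only for illustration.

```python
def partial_pr_auc(points, min_precision):
    """Area under a precision-recall curve, restricted to precision >= min_precision.

    points: iterable of (recall, precision) pairs.
    Assumption: the area is measured down to precision 0, and the curve is
    linearly interpolated between neighbouring points.
    """
    pts = sorted(points)  # order the curve by increasing recall
    area = 0.0
    for (r0, p0), (r1, p1) in zip(pts, pts[1:]):
        if r1 == r0:
            continue  # vertical step contributes no area
        above0 = p0 >= min_precision
        above1 = p1 >= min_precision
        if not above0 and not above1:
            continue  # whole segment is below the precision constraint
        if above0 != above1:
            # Segment crosses the threshold: clip it at the crossing point.
            t = (min_precision - p0) / (p1 - p0)
            rc = r0 + t * (r1 - r0)
            if above0:
                r1, p1 = rc, min_precision
            else:
                r0, p0 = rc, min_precision
        area += (r1 - r0) * (p0 + p1) / 2.0  # trapezoid for the kept part
    return area


def relative_gain(new, baseline):
    """Percentage by which `new` exceeds `baseline`."""
    return (new - baseline) / baseline * 100.0


# Made-up curve for illustration only; these are not values from the paper.
toy_curve = [(0.0, 1.0), (0.5, 1.0), (1.0, 0.0)]
pauc = partial_pr_auc(toy_curve, min_precision=0.5)
full = partial_pr_auc(toy_curve, min_precision=0.0)
```

Percentages such as those quoted in the summary would then correspond to `relative_gain(pauc_a, pauc_b)` for two assemblers evaluated under the same precision constraint, assuming the paper reports relative rather than absolute differences.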