GBC: a parallel toolkit based on highly addressable byte-encoding blocks for extremely large-scale genotypes of species

Bibliographic Details
Published in	Genome Biology Vol. 24; no. 1; p. 76
Main Authors	Zhang, Liubin; Yuan, Yangyang; Peng, Wenjie; Tang, Bin; Li, Mulin Jun; Gui, Hongsheng; Wang, Qiang; Li, Miaoxin
Format	Journal Article
Language	English
Published	London: BioMed Central, 17.04.2023; Springer Nature B.V; BMC
Subjects	Algorithms; Animal Genetics and Genomics; Bioinformatics; Biomedical and Life Sciences; Byte-encoding genotypes; Chromosomes; Cloud computation; Compression; Data Compression - methods; Datasets; Design; Evolutionary Biology; genome; Genomes; Genomics; Genomics - methods; Genotype; Genotype & phenotype; Genotype compression; Genotype management; Genotypes; Highly addressable genotype blocks; Human Genetics; Humans; Large-scale genotypes; Life Sciences; Localization; memory; Method; Microbial Genetics and Genomics; Parallelization algorithm; Plant Genetics and Genomics; Software; species; Whole genome sequencing
ISSN	1474-760X, 1474-7596
DOI	10.1186/s13059-023-02906-z

Summary:	Whole-genome sequencing projects of millions of subjects contain enormous genotypes, entailing a huge memory burden and time for computation. Here, we present GBC, a toolkit for rapidly compressing large-scale genotypes into highly addressable byte-encoding blocks under an optimized parallel framework. We demonstrate that GBC is up to 1000 times faster than state-of-the-art methods to access and manage compressed large-scale genotypes while maintaining a competitive compression ratio. We also show that conventional analysis would be substantially sped up if built on GBC to access genotypes of a large population. GBC’s data structure and algorithms are valuable for accelerating large-scale genomic research.