GBC: a parallel toolkit based on highly addressable byte-encoding blocks for extremely large-scale genotypes of species


Bibliographic Details
Published in Genome Biology Vol. 24; no. 1; p. 76
Main Authors Zhang, Liubin, Yuan, Yangyang, Peng, Wenjie, Tang, Bin, Li, Mulin Jun, Gui, Hongsheng, Wang, Qiang, Li, Miaoxin
Format Journal Article
Language English
Published London BioMed Central 17.04.2023
Springer Nature B.V
BMC
ISSN 1474-760X
1474-7596
DOI 10.1186/s13059-023-02906-z

Summary: Whole-genome sequencing projects of millions of subjects contain enormous genotypes, entailing a huge memory burden and time for computation. Here, we present GBC, a toolkit for rapidly compressing large-scale genotypes into highly addressable byte-encoding blocks under an optimized parallel framework. We demonstrate that GBC is up to 1000 times faster than state-of-the-art methods to access and manage compressed large-scale genotypes while maintaining a competitive compression ratio. We also showed that conventional analysis would be substantially sped up if built on GBC to access genotypes of a large population. GBC’s data structure and algorithms are valuable for accelerating large-scale genomic research.