Long-read sequence assembly: a technical evaluation in barley

Bibliographic Details
Published in The Plant Cell Vol. 33; no. 6; pp. 1888-1906
Main Authors Mascher, Martin; Wicker, Thomas; Jenkins, Jerry; Plott, Christopher; Lux, Thomas; Koh, Chu Shin; Ens, Jennifer; Gundlach, Heidrun; Boston, Lori B; Tulpová, Zuzana; Holden, Samuel; Hernández-Pinzón, Inmaculada; Scholz, Uwe; Mayer, Klaus F X; Spannagl, Manuel; Pozniak, Curtis J; Sharpe, Andrew G; Šimková, Hana; Moscou, Matthew J; Grimwood, Jane; Schmutz, Jeremy; Stein, Nils
Format Journal Article
Language English
Published England Oxford University Press 19.07.2021
ISSN 1040-4651
1532-298X
DOI 10.1093/plcell/koab077

Summary: Sequence assembly of large and repeat-rich plant genomes has been challenging, requiring substantial computational resources and often several complementary sequence assembly and genome mapping approaches. The recent development of fast and accurate long-read sequencing by circular consensus sequencing (CCS) on the PacBio platform may greatly increase the scope of plant pan-genome projects. Here, we compare current long-read sequencing platforms regarding their ability to rapidly generate contiguous sequence assemblies in pan-genome studies of barley (Hordeum vulgare). Most long-read assemblies are clearly superior to the current barley reference sequence based on short reads. Assemblies derived from accurate long reads excel in most metrics, but the CCS approach was the most cost-effective strategy for assembling tens of barley genomes. A downsampling analysis indicated that 20-fold CCS coverage can yield very good sequence assemblies, while even five-fold CCS data may capture the complete sequence of most genes. We present an updated reference genome assembly for barley with near-complete representation of the repeat-rich intergenic space. Long-read assembly can underpin the construction of accurate and complete sequences of multiple genomes of a species to build pan-genome infrastructures in Triticeae crops and their wild relatives.