Mako: A Graph-based Pattern Growth Approach to Detect Complex Structural Variants

Bibliographic Details
Published in Genomics, Proteomics & Bioinformatics Vol. 20; no. 1; pp. 205-218
Main Authors Lin, Jiadong, Yang, Xiaofei, Kosters, Walter, Xu, Tun, Jia, Yanyan, Wang, Songbo, Zhu, Qihui, Ryan, Mallory, Guo, Li, Zhang, Chengsheng, Gerstein, Mark B., Sanders, Ashley D., Zody, Micheal C., Talkowski, Michael E., Mills, Ryan E., Korbel, Jan O., Marschall, Tobias, Ebert, Peter, Audano, Peter A., Rodriguez-Martin, Bernardo, Porubsky, David, Jan Bonder, Marc, Sulovari, Arvis, Ebler, Jana, Zhou, Weichen, Serra Mari, Rebecca, Yilmaz, Feyza, Zhao, Xuefang, Hsieh, PingHsun, Lee, Joyce, Kumar, Sushant, Rausch, Tobias, Chen, Yu, Chong, Zechen, Munson, Katherine M., Chaisson, Mark J.P., Chen, Junjie, Shi, Xinghua, Wenger, Aaron M., Harvey, William T., Hansenfeld, Patrick, Regier, Allison, Hall, Ira M., Flicek, Paul, Hastie, Alex R., Fairely, Susan, Lee, Charles, Devine, Scott E., Eichler, Evan E., Ye, Kai
Format Journal Article
Language English
Published China Elsevier B.V. 01.02.2022
School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
Leiden Institute of Advanced Computer Science, Faculty of Science, Leiden University, Leiden 2311EZ, Netherlands
School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
The School of Life Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China
Genome Institute, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, China
Precision Medicine Center, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, China
The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98119, USA
Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
Elsevier
Oxford University Press
ISSN 1672-0229
2210-3244
DOI 10.1016/j.gpb.2021.03.007

Summary: Complex structural variants (CSVs) are genomic alterations that have more than two breakpoints and are considered as the simultaneous occurrence of simple structural variants. However, detecting the compounded mutational signals of CSVs is challenging through a commonly used model-match strategy. As a result, there has been limited progress for CSV discovery compared with simple structural variants. Here, we systematically analyzed the multi-breakpoint connection feature of CSVs, and proposed Mako, utilizing a bottom-up guided model-free strategy, to detect CSVs from paired-end short-read sequencing. Specifically, we implemented a graph-based pattern growth approach, where the graph depicts potential breakpoint connections, and pattern growth enables CSV detection without pre-defined models. Comprehensive evaluations on both simulated and real datasets revealed that Mako outperformed other algorithms. Notably, validation rates of CSVs on real data based on experimental and computational validations as well as manual inspections are around 70%, where the medians of experimental and computational breakpoint shift are 13 bp and 26 bp, respectively. Moreover, the Mako CSV subgraph effectively characterized the breakpoint connections of a CSV event and uncovered a total of 15 CSV types, including two novel types of adjacent segment swap and tandem dispersed duplication. Further analysis of these CSVs also revealed the impact of sequence homology on the formation of CSVs. Mako is publicly available at https://github.com/xjtu-omics/Mako.
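Illustration: the summary describes a graph whose edges depict potential breakpoint connections, with candidate events grown from that graph rather than matched against pre-defined models. The sketch below is only a simplified illustration of that idea, not Mako's actual algorithm or API. It assumes hypothetical inputs (a dict of breakpoint positions and a list of connections), grows each subgraph outward from a seed node, and keeps subgraphs with more than two breakpoints, following the CSV definition in the summary. The function name `csv_candidate_subgraphs` and the parameter `min_breakpoints` are invented for this example.

```python
from collections import defaultdict


def csv_candidate_subgraphs(nodes, edges, min_breakpoints=3):
    """Group connected breakpoints and keep groups large enough to be CSV candidates.

    nodes: dict mapping a breakpoint id to its genomic position (hypothetical input).
    edges: iterable of (u, v) pairs, each a potential breakpoint connection.
    min_breakpoints: 3 by default, since a CSV has more than two breakpoints.
    """
    # Undirected adjacency list over breakpoint ids.
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen = set()
    candidates = []
    # Visit seeds in genomic order so the output order is deterministic.
    for seed in sorted(nodes, key=nodes.get):
        if seed in seen:
            continue
        # Grow the subgraph from the seed by following connections.
        seen.add(seed)
        stack = [seed]
        subgraph = []
        while stack:
            current = stack.pop()
            subgraph.append(current)
            for neighbor in adjacency[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        # No variant model is consulted; only the breakpoint count is checked.
        if len(subgraph) >= min_breakpoints:
            candidates.append(sorted(subgraph, key=nodes.get))
    return candidates


# Toy data (made up): three linked breakpoints, plus one simple two-breakpoint event.
toy_nodes = {"a": 100, "b": 500, "c": 900, "d": 5000, "e": 5400}
toy_edges = [("a", "c"), ("b", "c"), ("d", "e")]
print(csv_candidate_subgraphs(toy_nodes, toy_edges))
```

In this toy input, the pair `d`/`e` forms a two-breakpoint subgraph and is therefore not reported as a CSV candidate, while `a`, `b`, and `c` are grouped into one candidate.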
Equal contribution.
Consortium authors are enumerated at the end of this article.