Deep Residual Neural Networks Resolve Quartet Molecular Phylogenies

Bibliographic Details
Published in Molecular Biology and Evolution Vol. 37; no. 5; pp. 1495-1507
Main Authors Zou, Zhengting, Zhang, Hongjiu, Guan, Yuanfang, Zhang, Jianzhi
Format Journal Article
Language English
Published United States Oxford University Press 01.05.2020
ISSN 0737-4038, 1537-1719
DOI 10.1093/molbev/msz307

Summary: Phylogenetic inference is of fundamental importance to evolutionary as well as other fields of biology, and molecular sequences have emerged as the primary data for this task. Although many phylogenetic methods have been developed to explicitly take into account substitution models of sequence evolution, such methods could fail due to model misspecification or insufficiency, especially in the face of heterogeneities in substitution processes across sites and among lineages. In this study, we propose to infer topologies of four-taxon trees using deep residual neural networks, a machine learning approach needing no explicit modeling of the subject system and having a record of success in solving complex nonlinear inference problems. We train residual networks on simulated protein sequence data with extensive amino acid substitution heterogeneities. We show that the well-trained residual network predictors can outperform existing state-of-the-art inference methods such as the maximum likelihood method on diverse simulated test data, especially under extensive substitution heterogeneities. Reassuringly, residual network predictors generally agree with existing methods in the trees inferred from real phylogenetic data with known or widely believed topologies. Furthermore, when combined with the quartet puzzling algorithm, residual network predictors can be used to reconstruct trees with more than four taxa. We conclude that deep learning represents a powerful new approach to phylogenetic reconstruction, especially when sequences evolve via heterogeneous substitution processes. We present our best trained predictor in a freely available program named Phylogenetics by Deep Learning (PhyDL, https://gitlab.com/ztzou/phydl; last accessed January 3, 2020).
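The quartet task named in the summary is a three-way classification, because four taxa admit exactly three unrooted binary topologies. The minimal Python sketch below only illustrates that output space, and the number of quartets that a quartet-based method would have to resolve for a larger taxon set. It is not taken from PhyDL, and the function names `quartet_topologies` and `all_quartets` are hypothetical names chosen for illustration.

```python
from itertools import combinations


def quartet_topologies(a, b, c, d):
    """Return the three unrooted binary topologies for four taxa.

    Each topology is written as a pair of sister pairs, e.g.
    (("A", "B"), ("C", "D")) stands for the split AB|CD.
    """
    return [
        ((a, b), (c, d)),
        ((a, c), (b, d)),
        ((a, d), (b, c)),
    ]


def all_quartets(taxa):
    """Return every four-taxon subset of a larger taxon set."""
    return list(combinations(taxa, 4))


if __name__ == "__main__":
    # Hypothetical taxon labels, used only for illustration.
    for topology in quartet_topologies("A", "B", "C", "D"):
        print(topology)
    print(len(all_quartets(["A", "B", "C", "D", "E", "F"])))
```

A quartet predictor chooses one of the three splits for each four-taxon subset. How those choices are assembled into a larger tree by quartet puzzling is not shown here.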
Present address: Microsoft, Inc., Bellevue, WA
Zhengting Zou and Hongjiu Zhang contributed equally to this work.