Fast tumor phylogeny regression via tree-structured dual dynamic programming

Bibliographic Details
Published in: Bioinformatics (Oxford, England), Vol. 41, no. Supplement_1, pp. i170-i179
Main Authors: Schmidt, Henri; Qi, Yuanyuan; Raphael, Benjamin J; El-Kebir, Mohammed
Format: Journal Article
Language: English
Published: England, Oxford Publishing Limited (England), 01.07.2025; Oxford University Press
ISSN: 1367-4803, 1367-4811
DOI: 10.1093/bioinformatics/btaf235

Summary: Reconstructing the evolutionary history of tumors from bulk DNA sequencing of multiple tissue samples remains a challenging computational problem, requiring simultaneous deconvolution of the tumor tissue and inference of its evolutionary history. Recently, phylogenetic reconstruction methods have made significant progress by breaking the reconstruction problem into two parts: a regression problem over a fixed topology and a search over tree space. While effective techniques have been developed for the latter search problem, the regression problem remains a bottleneck in both method design and implementation due to the lack of fast, specialized algorithms. Here, we introduce fastppm, a fast tool to solve the perfect phylogeny regression problem via tree-structured dual dynamic programming. fastppm supports arbitrary, separable convex loss functions including the ℓ2, piecewise linear, binomial and beta-binomial loss and provides asymptotic improvements for the ℓ2 and piecewise linear loss over existing algorithms. We find that fastppm empirically outperforms both specialized and general purpose regression algorithms, obtaining 50-450× speedups while providing as accurate solutions as existing approaches. Incorporating fastppm into several phylogeny inference algorithms immediately yields up to 400× speedups, requiring only a small change to the program code of existing software. Finally, fastppm enables analysis of low-coverage bulk DNA sequencing data on both simulated data and in a patient-derived mouse model of colorectal cancer, outperforming state-of-the-art phylogeny inference algorithms in terms of both accuracy and runtime. fastppm is implemented in C++ and available as both a command-line interface and Python library at github.com/elkebir-group/fastppm.git under an MIT license.
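
The "regression problem over a fixed topology" named in the summary can be illustrated with a minimal sketch. It assumes the standard perfect phylogeny mixture model for a single sample, in which each clone's frequency must be at least the sum of its children's frequencies and the root frequency is at most 1. The tree encoding, function names, and toy numbers are hypothetical; this is not fastppm's API or its dual dynamic programming algorithm, only a check of the constraints and an evaluation of the ℓ2 loss for one candidate solution.

```python
from typing import Dict, List

# Hypothetical encoding of a fixed clone tree: parent -> list of children.
Tree = Dict[int, List[int]]


def clone_proportions(tree: Tree, freqs: Dict[int, float]) -> Dict[int, float]:
    """Proportion of each clone: its frequency minus its children's frequencies."""
    return {v: f - sum(freqs[c] for c in tree.get(v, [])) for v, f in freqs.items()}


def is_feasible(tree: Tree, root: int, freqs: Dict[int, float], tol: float = 1e-9) -> bool:
    """Assumed constraints: non-negative clone proportions and root frequency <= 1."""
    props = clone_proportions(tree, freqs)
    return freqs[root] <= 1.0 + tol and all(p >= -tol for p in props.values())


def l2_loss(observed: Dict[int, float], fitted: Dict[int, float]) -> float:
    """Squared-error (l2) loss between observed and fitted frequencies."""
    return sum((observed[v] - fitted[v]) ** 2 for v in observed)


# Toy example: root clone 0 with two children, 1 and 2.
tree = {0: [1, 2]}
observed = {0: 0.9, 1: 0.6, 2: 0.5}     # noisy estimates from sequencing
candidate = {0: 1.0, 1: 0.55, 2: 0.45}  # one feasible fit, not claimed optimal

print(is_feasible(tree, 0, observed))   # False: 0.6 + 0.5 exceeds 0.9
print(is_feasible(tree, 0, candidate))  # True
print(l2_loss(observed, candidate))     # approximately 0.015
```

A regression solver would search over all feasible frequency assignments for the one with the smallest loss; the sketch only scores a single hand-picked candidate.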
Henri Schmidt and Yuanyuan Qi contributed equally.