Fast tumor phylogeny regression via tree-structured dual dynamic programming

Bibliographic Details
Published in	Bioinformatics (Oxford, England), Vol. 41, Suppl. 1, pp. i170-i179
Main Authors	Schmidt, Henri; Qi, Yuanyuan; Raphael, Benjamin J; El-Kebir, Mohammed
Format	Journal Article
Language	English
Published	Oxford University Press (Oxford, England), 01.07.2025
Subjects	Algorithms; Availability; Colorectal carcinoma; Computational Biology - methods; Deoxyribonucleic acid; DNA; DNA sequencing; Dynamic Programming; Evolutionary, Comparative and Population Genomics; Gene sequencing; Humans; Inference; Line interfaces; Neoplasms - genetics; Phylogeny; Reconstruction; Regression; Sequence Analysis, DNA - methods; Software; Topology; Tumors
ISSN	1367-4803, 1367-4811
DOI	10.1093/bioinformatics/btaf235

More Information
Summary:	Reconstructing the evolutionary history of tumors from bulk DNA sequencing of multiple tissue samples remains a challenging computational problem, requiring simultaneous deconvolution of the tumor tissue and inference of its evolutionary history. Recently, phylogenetic reconstruction methods have made significant progress by breaking the reconstruction problem into two parts: a regression problem over a fixed topology and a search over tree space. While effective techniques have been developed for the latter search problem, the regression problem remains a bottleneck in both method design and implementation due to the lack of fast, specialized algorithms. Here, we introduce fastppm, a fast tool to solve the perfect phylogeny regression problem via tree-structured dual dynamic programming. fastppm supports arbitrary, separable convex loss functions including the ℓ2, piecewise linear, binomial and beta-binomial loss and provides asymptotic improvements for the ℓ2 and piecewise linear loss over existing algorithms. We find that fastppm empirically outperforms both specialized and general purpose regression algorithms, obtaining 50-450× speedups while providing solutions as accurate as those of existing approaches. Incorporating fastppm into several phylogeny inference algorithms immediately yields up to 400× speedups, requiring only a small change to the program code of existing software. Finally, fastppm enables analysis of low-coverage bulk DNA sequencing data, both on simulated data and on data from a patient-derived mouse model of colorectal cancer, outperforming state-of-the-art phylogeny inference algorithms in terms of both accuracy and runtime. fastppm is implemented in C++ and available as both a command-line interface and Python library at github.com/elkebir-group/fastppm.git under an MIT license.
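To make the "regression problem over a fixed topology" concrete, the sketch below fits one sample under the ℓ2 loss. It is an illustration only, not fastppm's tree-structured dual dynamic programming: it swaps in plain projected gradient descent, which is far slower but easy to verify. It assumes a common perfect phylogeny mixture formulation: each tree node v introduces one mutation, the fitted frequency of mutation j is the total usage of the clones in j's subtree, and clone usages are nonnegative and sum to at most 1 (the remainder treated as normal cells). The function names (subtree_matrix, project_capped_simplex, l2_pp_regression) and the parent-array tree encoding are hypothetical.

```python
import numpy as np

# Illustrative sketch only (NOT fastppm's algorithm or API): l2 perfect phylogeny
# regression for a single sample, solved by projected gradient descent.
# Assumed model: f_hat = A @ u, where A[j, v] = 1 if node v lies in the subtree
# of node j, u >= 0 are clone usages, and sum(u) <= 1.

def subtree_matrix(parent):
    """A[j, v] = 1 if node v is in the subtree rooted at j (j included); root has parent -1."""
    n = len(parent)
    A = np.zeros((n, n))
    for v in range(n):
        j = v
        while j != -1:  # walk from v up to the root, marking every ancestor
            A[j, v] = 1.0
            j = parent[j]
    return A

def project_capped_simplex(y):
    """Euclidean projection onto {u >= 0, sum(u) <= 1}."""
    z = np.maximum(y, 0.0)
    if z.sum() <= 1.0:
        return z  # sum constraint inactive: clipping at zero is the projection
    # Sum constraint active: sort-based projection onto the probability simplex.
    s = np.sort(y)[::-1]
    css = np.cumsum(s)
    k = np.arange(1, len(y) + 1)
    rho = np.nonzero(s - (css - 1.0) / k > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(y - theta, 0.0)

def l2_pp_regression(f, parent, iters=5000):
    """Fit usages u for observed frequencies f; returns (f_hat, u, l2 loss)."""
    f = np.asarray(f, dtype=float)
    A = subtree_matrix(parent)
    L = 2.0 * np.sum(A * A)  # 2 * ||A||_F^2 upper-bounds the gradient's Lipschitz constant
    u = np.zeros(len(f))     # u = 0 is feasible
    for _ in range(iters):
        grad = 2.0 * A.T @ (A @ u - f)
        u = project_capped_simplex(u - grad / L)
    f_hat = A @ u
    return f_hat, u, float(np.sum((f - f_hat) ** 2))
```

For example, with a root (node 0) and two children, observed frequencies [0.5, 0.4, 0.4] violate the sum condition (0.4 + 0.4 > 0.5), and the fit moves them to the nearest consistent values, [0.6, 0.3, 0.3], with loss 0.03:

```python
f_hat, u, loss = l2_pp_regression([0.5, 0.4, 0.4], parent=[-1, 0, 0])
```

Because the ℓ2 loss and the usage constraints are separable across samples, a multi-sample fit under this model is just this routine applied row by row; the speedups reported above come from replacing such generic solvers with a specialized one.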
Note:	Henri Schmidt and Yuanyuan Qi contributed equally.