Towards the genome-scale discovery of bivariate monotonic classifiers

Bibliographic Details
Published in	BMC Bioinformatics Vol. 26; no. 1; pp. 228 - 32
Main Authors	Fourquet, Océane; Krejca, Martin S.; Doerr, Carola; Schwikowski, Benno
Format	Journal Article
Language	English
Published	London: BioMed Central; Springer Nature B.V; BMC, 02.09.2025
Subjects	Algorithms; Bioinformatics; Biomarkers; Biomedical and Life Sciences; Bivariate analysis; Bivariate functions; Classification; Computational Biology/Bioinformatics; Computer Appl. in Life Sciences; Datasets; Datasets as Topic; Dengue - diagnosis; Dengue - genetics; Dengue fever; Gene Expression Profiling - classification; Gene Expression Profiling - statistics & numerical data; Genes; Genomes; Glioblastoma; Glioblastoma - genetics; Humans; Hypotheses; Interpretability; Leukemia - genetics; Life Sciences; Machine Learning; Mathematical models; Microarrays; Monotonic functions; Phenotypes; Python; Systems biology
ISSN	1471-2105
DOI	10.1186/s12859-025-06253-7

Summary:	Background: Bivariate monotonic classifiers (BMCs) are based on pairs of input features. Like many other machine-learning models, they can capture nonlinear patterns in high-dimensional data, yet they remain simple and easy to interpret. Until now, the use of BMCs at genome scale has been hampered by the high computational cost of searching for feature pairs with a high leave-one-out performance estimate.

Results: We introduce the fastBMC algorithm, which drastically speeds up the identification of BMCs. The algorithm relies on a mathematical bound on the BMC performance estimate and maintains optimality. We show empirically that, compared to the traditional approach, fastBMC speeds up the computation by a factor of at least 15, even for a small number of features. For two of the three smaller biomedical datasets considered here, the resulting ability to consider much larger feature sets translates into significantly improved classification performance. As an example of the high interpretability of BMCs, we discuss a straightforward interpretation of a BMC glioblastoma survival predictor, an immediate novel biomedical hypothesis, options for biomedical validation, and treatment implications. We also study the performance of fastBMC on a larger, well-known breast cancer dataset, validating the benefits of BMCs for biomarker identification and biomedical hypothesis generation.

Conclusion: fastBMC enables the rapid construction of robust and interpretable ensemble models from BMCs, facilitating the discovery of gene pairs predictive of relevant phenotypes and of their interaction in that context.

Availability: We provide the first open-source implementation for learning BMCs, in particular a Python implementation of fastBMC, together with Python code to reproduce the fastBMC results on real and simulated data in this paper, at https://github.com/oceanefrqt/fastBMC.
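To make the "traditional approach" described in the summary concrete, here is a minimal sketch, under stated assumptions, of an exhaustive search that scores every feature pair by leave-one-out (LOO) error. It is not the authors' BMC fitting procedure and does not implement fastBMC's bound. The classifier is a deliberately crude, hypothetical monotone rule: a point is predicted positive if it is componentwise greater than or equal to some positive training point, which makes predictions nondecreasing in both features. The function names (`predict_dominance`, `loo_error`, `best_pairs`) and the toy data are illustrative assumptions.

```python
from itertools import combinations


def predict_dominance(train_pts, train_labels, x):
    """Hypothetical monotone rule (not the paper's BMC fit).

    Return 1 if x is componentwise >= some positive training point, else 0.
    The predicted-positive region is an up-set, so the rule is
    nondecreasing in both features.
    """
    return int(any(lab == 1 and p[0] <= x[0] and p[1] <= x[1]
                   for p, lab in zip(train_pts, train_labels)))


def loo_error(pts, labels):
    """Leave-one-out error rate of the dominance rule on one feature pair."""
    errors = 0
    for i in range(len(pts)):
        # Hold out sample i and predict it from the remaining samples.
        train_pts = pts[:i] + pts[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        if predict_dominance(train_pts, train_labels, pts[i]) != labels[i]:
            errors += 1
    return errors / len(pts)


def best_pairs(X, y, k=3):
    """Exhaustively score every feature pair (j, l) by LOO error.

    X is a list of samples (each a list of feature values), y holds 0/1
    labels. The cost grows quadratically with the number of features,
    which is the bottleneck fastBMC addresses; no pruning is done here.
    """
    n_features = len(X[0])
    scored = []
    for j, l in combinations(range(n_features), 2):
        pts = [(row[j], row[l]) for row in X]
        scored.append((loo_error(pts, list(y)), (j, l)))
    scored.sort()  # lowest LOO error first; ties broken by pair index
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical toy data: 6 samples, 3 features, binary phenotype.
    X = [[1, 5, 2], [2, 4, 8], [3, 3, 1], [4, 2, 9], [5, 1, 3], [6, 0, 7]]
    y = [0, 0, 0, 1, 1, 1]
    for err, pair in best_pairs(X, y):
        print(pair, round(err, 3))
```

On this toy data, pair (0, 2) ranks first with a LOO error of 1/3, and the other two pairs have an error of 1/2. Note that this crude rule always misclassifies a held-out positive that dominates no other positive, which is one reason a real BMC fit (which the paper describes) chooses the monotone decision boundary more carefully.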