HABiC: an algorithm based on the exact computation of the Kantorovich-Rubinstein optimizer for binary classification in transcriptomics
| Published in | Bioinformatics (Oxford, England) Vol. 41; no. 6 |
|---|---|
| Format | Journal Article |
| Language | English |
| Published | England; Oxford Publishing Limited (England); 01.06.2025; Oxford University Press |
| ISSN | 1367-4803, 1367-4811 |
| DOI | 10.1093/bioinformatics/btaf310 |
| Summary: | Machine learning analyses of molecular omics datasets largely drive the development of precision medicine in oncology, but mathematical challenges still hamper their application in the clinic. In particular, omics-based learning relies on high-dimensional data with high degrees of freedom and multicollinearity issues, requiring more tailored algorithms. Here, we have developed a prediction algorithm that relies on the 1-Wasserstein distance to better capture complex relationships between variables, and that is built on a decision rule based on the exact computation of the Kantorovich-Rubinstein optimizer to increase the algorithm's precision. We explored dimension reduction and aggregation methods to improve its robustness. The exact method was compared with a neural network-based approximate method, as well as with standard Euclidean distance-based classifiers. Experimental results on synthetic datasets with multiple scenarios of redundant/informative variables revealed that exact and approximate methods based on the Wasserstein distance outperformed state-of-the-art algorithms when class information was spread across a large number of variables. When predicting clinical or biological outcomes from transcriptomics datasets, HABiC achieved consistently higher accuracy in most situations. Python code for the HABiC classifier is available at https://github.com/chiaraco/HABiC. |
|---|---|
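
To make the idea in the summary concrete, the following is a minimal sketch of one way a Kantorovich-Rubinstein potential can be computed exactly for two small samples and turned into a decision rule. It is not taken from the HABiC repository and may differ from the authors' implementation. The function names `kr_potential`, `kr_extend` and `kr_predict`, the Euclidean ground metric, the McShane-Whitney extension to unseen points, and the midpoint threshold are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.spatial.distance import cdist
from scipy.stats import wasserstein_distance


def kr_potential(X0, X1):
    """Solve the discrete Kantorovich-Rubinstein dual as a linear program.

    Returns the pooled points, the potential f on those points, and the
    1-Wasserstein distance between the two empirical distributions.
    """
    Z = np.vstack([X0, X1])
    n0, n1 = len(X0), len(X1)
    n = n0 + n1
    D = cdist(Z, Z)  # Euclidean ground metric (an assumption)

    # Maximise mean(f on class 1) - mean(f on class 0).
    # linprog minimises, so the objective is written with the opposite sign.
    c = np.concatenate([np.full(n0, 1.0 / n0), np.full(n1, -1.0 / n1)])

    # 1-Lipschitz constraints: f[a] - f[b] <= D[a, b] for all pairs a != b.
    a_idx, b_idx = np.nonzero(~np.eye(n, dtype=bool))
    rows = np.arange(len(a_idx))
    A = np.zeros((len(a_idx), n))
    A[rows, a_idx] = 1.0
    A[rows, b_idx] = -1.0
    b = D[a_idx, b_idx]

    # Pin f[0] = 0 to remove the additive-constant ambiguity.
    bounds = [(0.0, 0.0)] + [(None, None)] * (n - 1)
    res = linprog(c, A_ub=A, b_ub=b, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return Z, res.x, -res.fun


def kr_extend(Z, f, X_new):
    """McShane-Whitney extension: a 1-Lipschitz function equal to f on Z."""
    return np.min(f[None, :] + cdist(X_new, Z), axis=1)


def kr_predict(Z, f, n0, X_new):
    """Hypothetical rule: class 1 if the potential exceeds the midpoint
    between the mean potentials of the two training classes."""
    threshold = 0.5 * (f[:n0].mean() + f[n0:].mean())
    return (kr_extend(Z, f, X_new) > threshold).astype(int)


# Small one-dimensional example, where SciPy's closed-form 1-Wasserstein
# distance is available as an independent sanity check.
rng = np.random.default_rng(0)
X0 = rng.normal(0.0, 1.0, size=(8, 1))
X1 = rng.normal(3.0, 1.0, size=(6, 1))
Z, f, w1 = kr_potential(X0, X1)
w1_check = wasserstein_distance(X0.ravel(), X1.ravel())
labels = kr_predict(Z, f, len(X0), np.array([[-5.0], [8.0]]))
```

The linear program has one constraint per ordered pair of training points, so this sketch is only practical for small samples. For the classifier described in the summary, including its dimension reduction and aggregation steps, refer to the code at the repository linked above.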