Algorithm-agnostic significance testing in supervised learning with multimodal data

Abstract Motivation Valid statistical inference is crucial for decision-making but difficult to obtain in supervised learning with multimodal data, e.g. combinations of clinical features, genomic data, and medical images. Multimodal data often warrants the use of black-box algorithms, for instance,...

Full description

Saved in:
Bibliographic Details
Published inBriefings in bioinformatics Vol. 25; no. 6
Main Authors Kook, Lucas, Lundborg, Anton Rask
Format Journal Article
LanguageEnglish
Published England Oxford University Press 23.09.2024
Oxford Publishing Limited (England)
Subjects
Online AccessGet full text
ISSN1467-5463
1477-4054
1477-4054
DOI10.1093/bib/bbae475

Cover

More Information
Summary:Abstract Motivation Valid statistical inference is crucial for decision-making but difficult to obtain in supervised learning with multimodal data, e.g. combinations of clinical features, genomic data, and medical images. Multimodal data often warrants the use of black-box algorithms, for instance, random forests or neural networks, which impede the use of traditional variable significance tests. Results We address this problem by proposing the use of COvariance MEasure Tests (COMETs), which are calibrated and powerful tests that can be combined with any sufficiently predictive supervised learning algorithm. We apply COMETs to several high-dimensional, multimodal data sets to illustrate (i) variable significance testing for finding relevant mutations modulating drug-activity, (ii) modality selection for predicting survival in liver cancer patients with multiomics data, and (iii) modality selection with clinical features and medical imaging data. In all applications, COMETs yield results consistent with domain knowledge without requiring data-driven pre-processing, which may invalidate type I error control. These novel applications with high-dimensional multimodal data corroborate prior results on the power and robustness of COMETs for significance testing. Availability and implementation COMETs are implemented in the cometsR package available on CRAN and pycometsPython library available on GitHub. Source code for reproducing all results is available at https://github.com/LucasKook/comets. All data sets used in this work are openly available.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
content type line 23
ISSN:1467-5463
1477-4054
1477-4054
DOI:10.1093/bib/bbae475