Patterns of differential expression by association in omic data using a new measure based on ensemble learning



Bibliographic Details
Published in: Statistical Applications in Genetics and Molecular Biology, Vol. 22, no. 1
Main Authors: Arevalillo, Jorge M.; Martin-Arevalillo, Raquel
Format: Journal Article
Language: English
Publisher: De Gruyter (Walter de Gruyter GmbH), Germany
Published: 01.01.2023
ISSN: 2194-6302, 1544-6115
DOI: 10.1515/sagmb-2023-0009

Summary: The ongoing development of high-throughput technologies allows the expression levels of hundreds or thousands of biological inputs to be monitored simultaneously, leading to the proliferation of what have been coined omic data sources. One relevant issue when analyzing such data is the detection of differential expression across two experimental conditions, two clinical statuses or two classes of a biological outcome. While a great many univariate approaches have been developed to address this issue, strategies for assessing interaction patterns of differential expression are scarce in the literature and have been limited to ad hoc solutions. This paper contributes to the problem by exploiting an ensemble learning algorithm, random forests, to propose a measure of the differential expression explained by the interaction of the omic variables, so that subtle biological patterns may be uncovered. The out-of-bag error rate, an estimate of the generalization error of a random forest classifier, is used as a by-product to construct this new measure of interaction patterns of differential expression. Its performance is studied in synthetic scenarios, and it is applied to real studies on SARS-CoV-2 and colon cancer data, where it uncovers associations that remain undetected by other methods. The proposal aims to provide a novel approach that may help experts in the biomedical and life sciences unravel insightful interaction patterns and decipher the molecular mechanisms underlying biological and clinical outcomes.
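The summary does not give the exact formula of the proposed measure. The following is therefore only an illustrative sketch of the general idea: random forest out-of-bag (OOB) error can expose differential expression that appears only through the association of two variables. It assumes NumPy and scikit-learn are installed and uses scikit-learn's `RandomForestClassifier` with `oob_score=True`. The helpers `oob_error` and `pair_interaction_gain`, and the "gain" contrast itself (best single-variable OOB error minus pair OOB error), are hypothetical. They are not the authors' measure.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def oob_error(X, y, n_trees=300, seed=0):
    """Out-of-bag misclassification rate of a random forest fitted to X (hypothetical helper)."""
    rf = RandomForestClassifier(
        n_estimators=n_trees,
        oob_score=True,   # score each sample only with trees whose bootstrap sample excluded it
        bootstrap=True,   # OOB estimation requires bootstrap resampling
        random_state=seed,
    )
    rf.fit(X, y)
    return 1.0 - rf.oob_score_  # oob_score_ is the OOB accuracy, so 1 - accuracy = OOB error


def pair_interaction_gain(x1, x2, y):
    """Hypothetical contrast, NOT the paper's measure: how much lower the OOB error
    of the pair is than that of the better of the two single variables."""
    err1 = oob_error(x1.reshape(-1, 1), y)
    err2 = oob_error(x2.reshape(-1, 1), y)
    err12 = oob_error(np.column_stack([x1, x2]), y)
    return err1, err2, err12, min(err1, err2) - err12


# Synthetic "differential expression by association" scenario: each variable
# alone carries no class information, but the class depends on whether the
# two variables share the same sign (i.e. on how they co-vary).
rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = (x1 * x2 > 0).astype(int)

err1, err2, err12, gain = pair_interaction_gain(x1, x2, y)
print(f"OOB error: x1={err1:.3f}  x2={err2:.3f}  pair={err12:.3f}  gain={gain:.3f}")
```

In this toy setting, the single-variable forests should sit near chance level (OOB error around 0.5), while the forest on the pair should classify far better. A univariate screen would miss such a pair. Reusing the OOB error avoids a separate validation split, which is presumably why it serves as a convenient by-product. How the published measure normalizes or tests this contrast is not stated in this record.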