A comparative analysis of gene expression profiling by statistical and machine learning approaches

Bibliographic Details
Published in Bioinformatics Advances Vol. 5; no. 1; p. vbae199
Main Authors Bontonou, Myriam; Haget, Anaïs; Boulougouri, Maria; Audit, Benjamin; Borgnat, Pierre; Arbona, Jean-Michel
Format Journal Article
Language English
Published England Oxford University Press 2025
Oxford academic
ISSN 2635-0041
DOI 10.1093/bioadv/vbae199

Abstract

Motivation: Many machine learning (ML) models developed to classify phenotype from gene expression data provide interpretations for their decisions, with the aim of understanding biological processes. For many models, including neural networks, interpretations are lists of genes ranked by their importance for the predictions, with top-ranked genes likely linked to the phenotype. In this article, we discuss the limitations of such approaches using integrated gradients, an explainability method developed for neural networks, as an example.

Results: Experiments are performed on RNA sequencing data from public cancer databases. A collection of ML models, including multilayer perceptrons and graph neural networks, is trained to classify samples by cancer type. Gene rankings from integrated gradients are compared to genes highlighted by statistical feature selection methods such as DESeq2 and by other learning methods that measure global feature contribution. Experiments show that a small set of top-ranked genes is sufficient to achieve good classification. However, similar performance is possible with lower-ranked genes, although larger sets are required. Moreover, significant differences in top-ranked genes, especially between statistical and learning methods, prevent a comprehensive biological understanding. In conclusion, while these methods identify pathology-specific biomarkers, the completeness of gene sets selected by explainability techniques for understanding biological processes remains uncertain.

Availability and implementation: Python code and datasets are available at https://github.com/mbonto/XAI_in_genomics.
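As a hedged illustration of how integrated gradients can turn a trained classifier into a gene ranking, the sketch below applies the method to a toy logistic regression in NumPy. The weights, the input sample, the all-zero baseline, and the number of interpolation steps are assumptions made for the example; this is not the authors' implementation, which is available in the repository named in the abstract.

```python
import numpy as np


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate integrated gradients for f(x) = sigmoid(w . x + b).

    The path integral from `baseline` to `x` is approximated with a
    midpoint Riemann sum over `steps` points on the straight line.
    """
    # Midpoints of `steps` equal sub-intervals of [0, 1].
    alphas = (np.arange(steps) + 0.5) / steps
    # Interpolated inputs, shape (steps, n_genes).
    path = baseline + alphas[:, None] * (x - baseline)
    # Gradient of the sigmoid output with respect to each input feature.
    p = sigmoid(path @ w + b)
    grads = (p * (1.0 - p))[:, None] * w
    # Average gradient along the path, scaled by the input difference.
    return (x - baseline) * grads.mean(axis=0)


# Toy setup (all values are hypothetical, not taken from the article).
rng = np.random.default_rng(0)
n_genes = 20
w = rng.normal(size=n_genes)   # stand-in for trained weights
b = 0.1
x = rng.normal(size=n_genes)   # stand-in for one normalised expression profile
baseline = np.zeros(n_genes)   # assumed reference input

attributions = integrated_gradients(x, baseline, w, b)

# Rank genes by absolute attribution, most important first.
ranking = np.argsort(-np.abs(attributions))

# Completeness check: attributions should sum to f(x) - f(baseline).
completeness_gap = abs(
    attributions.sum() - (sigmoid(x @ w + b) - sigmoid(baseline @ w + b))
)
print(ranking[:5], completeness_gap)
```

The completeness check is a convenient sanity test: if the gap is not small, the number of interpolation steps is probably too low for the model at hand.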
Many machine learning models have been proposed to classify phenotypes from gene expression data. In addition to their good performance, these models can potentially provide some understanding of phenotypes by extracting explanations for their decisions. These explanations often take the form of a list of genes ranked in order of importance for the predictions, with the highest-ranked genes interpreted as linked to the phenotype. We discuss the biological and methodological limitations of such explanations. Experiments are performed on several datasets gathering cancer and healthy tissue samples from the TCGA, GTEx and TARGET databases. A collection of machine learning models, including logistic regression, multilayer perceptrons, and graph neural networks, is trained to classify samples according to their cancer type. Gene rankings are obtained from explainability methods adapted to these models and compared to those from classical statistical feature selection methods such as mutual information, DESeq2, and edgeR. Interestingly, on simple tasks, we observe that the information learned by black-box neural networks is related to the notion of differential expression. In all cases, a small set containing the best-ranked genes is sufficient to achieve a good classification. However, these genes differ significantly between methods, and similar classification performance can be achieved with numerous lower-ranked genes. In conclusion, although these methods enable the identification of biomarkers characteristic of certain pathologies, our results question the completeness of the selected gene sets and thus of explanations that rely on identifying the underlying biological processes.
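The disagreement between top-ranked genes from different methods can be illustrated with a small sketch. It computes the Jaccard overlap between the top-k genes of two score vectors; the synthetic scores, the helper names `top_k` and `jaccard`, and the choice of Jaccard overlap are assumptions for illustration, not the metric or data used in the article.

```python
import numpy as np


def top_k(scores, k):
    """Return the indices of the k largest scores as a set."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    return set(order[:k].tolist())


def jaccard(a, b):
    """Jaccard overlap |a & b| / |a | b| of two sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Synthetic importance scores for two hypothetical ranking methods.
# Both share a common signal plus independent noise.
rng = np.random.default_rng(0)
n_genes = 1000
shared = rng.normal(size=n_genes)
scores_stat = shared + rng.normal(size=n_genes)  # e.g. a statistical score
scores_ml = shared + rng.normal(size=n_genes)    # e.g. an attribution score

# Overlap of the top-k gene sets for increasing k.
overlaps = {
    k: jaccard(top_k(scores_stat, k), top_k(scores_ml, k))
    for k in (10, 100, 1000)
}
print(overlaps)
```

Reporting the overlap at several values of k shows whether two methods disagree only at the very top of the ranking or throughout it.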
Author_xml – sequence: 1
  givenname: Myriam
  orcidid: 0000-0002-0010-5457
  surname: Bontonou
  fullname: Bontonou, Myriam
– sequence: 2
  givenname: Anaïs
  surname: Haget
  fullname: Haget, Anaïs
– sequence: 3
  givenname: Maria
  surname: Boulougouri
  fullname: Boulougouri, Maria
– sequence: 4
  givenname: Benjamin
  orcidid: 0000-0003-2683-9990
  surname: Audit
  fullname: Audit, Benjamin
  email: benjamin.audit@ens-lyon.fr
– sequence: 5
  givenname: Pierre
  orcidid: 0000-0003-4536-8354
  surname: Borgnat
  fullname: Borgnat, Pierre
  email: pierre.borgnat@ens-lyon.fr
– sequence: 6
  givenname: Jean-Michel
  orcidid: 0000-0001-6166-9056
  surname: Arbona
  fullname: Arbona, Jean-Michel
  email: jean-michel.arbona@ens-lyon.fr
BackLink https://www.ncbi.nlm.nih.gov/pubmed/39897946 (view this record in MEDLINE/PubMed)
https://hal.science/hal-04731873 (view record in HAL)
ContentType Journal Article
Copyright The Author(s) 2024. Published by Oxford University Press. 2024
Discipline Biology
Computer Science
EISSN 2635-0041
ExternalDocumentID oai_HAL_hal_04731873v2
39897946
10_1093_bioadv_vbae199
10.1093/bioadv/vbae199
Genre Journal Article
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 1
Keywords Genomics (q-bio.GN)
FOS: Biological sciences
FOS: Computer and information sciences
Machine Learning (cs.LG)
Language English
License This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
OpenAccessLink https://dx.doi.org/10.1093/bioadv/vbae199
PMID 39897946
PublicationCentury 2000
PublicationDate 2025-00-00
PublicationDateYYYYMMDD 2025-01-01
PublicationDate_xml – year: 2025
  text: 2025-00-00
PublicationDecade 2020
PublicationPlace England
PublicationPlace_xml – name: England
PublicationTitle Bioinformatics advances
PublicationTitleAlternate Bioinform Adv
PublicationYear 2025
Publisher Oxford University Press
Oxford academic
Publisher_xml – name: Oxford University Press
– name: Oxford academic
StartPage vbae199
SubjectTerms Artificial Intelligence
Bioinformatics
Computer Science
Life Sciences
Quantitative Methods
Signal and Image Processing
Title A comparative analysis of gene expression profiling by statistical and machine learning approaches
URI https://www.ncbi.nlm.nih.gov/pubmed/39897946
https://www.proquest.com/docview/3162851609
https://hal.science/hal-04731873
Volume 5
linkProvider Oxford University Press