A comparative analysis of gene expression profiling by statistical and machine learning approaches
Published in | Bioinformatics advances Vol. 5; no. 1; p. vbae199 |
---|---|
Main Authors | Bontonou, Myriam; Haget, Anaïs; Boulougouri, Maria; Audit, Benjamin; Borgnat, Pierre; Arbona, Jean-Michel |
Format | Journal Article |
Language | English |
Published | England: Oxford University Press (Oxford Academic), 2025 |
Subjects | |
ISSN | 2635-0041 |
DOI | 10.1093/bioadv/vbae199 |
Abstract | Abstract
Motivation
Many machine learning (ML) models developed to classify phenotype from gene expression data provide interpretations for their decisions, with the aim of understanding biological processes. For many models, including neural networks, interpretations are lists of genes ranked by their importance for the predictions, with top-ranked genes likely linked to the phenotype. In this article, we discuss the limitations of such approaches using integrated gradients, an explainability method developed for neural networks, as an example.
Results
Experiments are performed on RNA sequencing data from public cancer databases. A collection of ML models, including multilayer perceptrons and graph neural networks, are trained to classify samples by cancer type. Gene rankings from integrated gradients are compared to genes highlighted by statistical feature selection methods such as DESeq2 and other learning methods measuring global feature contribution. Experiments show that a small set of top-ranked genes is sufficient to achieve good classification. However, similar performance is possible with lower-ranked genes, although larger sets are required. Moreover, significant differences in top-ranked genes, especially between statistical and learning methods, prevent a comprehensive biological understanding. In conclusion, while these methods identify pathology-specific biomarkers, the completeness of gene sets selected by explainability techniques for understanding biological processes remains uncertain.
Availability and implementation
Python code and datasets are available at https://github.com/mbonto/XAI_in_genomics. |
---|---|
AbstractList | Many machine learning models have been proposed to classify phenotypes from gene expression data. In addition to their good performance, these models can potentially provide some understanding of phenotypes by extracting explanations for their decisions. These explanations often take the form of a list of genes ranked in order of importance for the predictions, the highest-ranked genes being interpreted as linked to the phenotype. We discuss the biological and the methodological limitations of such explanations.
Experiments are performed on several datasets gathering cancer and healthy tissue samples from the TCGA, GTEx and TARGET databases. A collection of machine learning models including logistic regression, multilayer perceptron, and graph neural network are trained to classify samples according to their cancer type. Gene rankings are obtained from explainability methods adapted to these models, and compared to the ones from classical statistical feature selection methods such as mutual information, DESeq2, and edgeR. Interestingly, on simple tasks, we observe that the information learned by black-box neural networks is related to the notion of differential expression. In all cases, a small set containing the best-ranked genes is sufficient to achieve a good classification. However, these genes differ significantly between the methods and similar classification performance can be achieved with numerous lower-ranked genes. In conclusion, although these methods enable the identification of biomarkers characteristic of certain pathologies, our results question the completeness of the selected gene sets and thus of explainability by the identification of the underlying biological processes. |
Author | Haget, Anaïs Bontonou, Myriam Arbona, Jean-Michel Boulougouri, Maria Borgnat, Pierre Audit, Benjamin |
Author_xml | – sequence: 1 givenname: Myriam orcidid: 0000-0002-0010-5457 surname: Bontonou fullname: Bontonou, Myriam – sequence: 2 givenname: Anaïs surname: Haget fullname: Haget, Anaïs – sequence: 3 givenname: Maria surname: Boulougouri fullname: Boulougouri, Maria – sequence: 4 givenname: Benjamin orcidid: 0000-0003-2683-9990 surname: Audit fullname: Audit, Benjamin email: benjamin.audit@ens-lyon.fr – sequence: 5 givenname: Pierre orcidid: 0000-0003-4536-8354 surname: Borgnat fullname: Borgnat, Pierre email: pierre.borgnat@ens-lyon.fr – sequence: 6 givenname: Jean-Michel orcidid: 0000-0001-6166-9056 surname: Arbona fullname: Arbona, Jean-Michel email: jean-michel.arbona@ens-lyon.fr |
BackLink | https://www.ncbi.nlm.nih.gov/pubmed/39897946$$D View this record in MEDLINE/PubMed https://hal.science/hal-04731873$$DView record in HAL |
ContentType | Journal Article |
Copyright | The Author(s) 2024. Published by Oxford University Press. |
DatabaseName | Oxford Journals Open Access Collection CrossRef PubMed MEDLINE - Academic Hyper Article en Ligne (HAL) Hyper Article en Ligne (HAL) (Open Access) |
Discipline | Biology Computer Science |
EISSN | 2635-0041 |
Genre | Journal Article |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 1 |
Keywords | Genomics (q-bio.GN); FOS: Biological sciences; FOS: Computer and information sciences; Machine Learning (cs.LG) |
Language | English |
License | This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
ORCID | 0000-0003-2683-9990 0000-0003-4536-8354 0000-0002-0010-5457 0000-0001-6166-9056 |
OpenAccessLink | https://dx.doi.org/10.1093/bioadv/vbae199 |
PMID | 39897946 |
PublicationCentury | 2000 |
PublicationDate | 2025-00-00 |
PublicationDateYYYYMMDD | 2025-01-01 |
PublicationDecade | 2020 |
PublicationPlace | England |
PublicationTitle | Bioinformatics advances |
PublicationTitleAlternate | Bioinform Adv |
PublicationYear | 2025 |
Publisher | Oxford University Press Oxford academic |
SourceID | hal proquest pubmed crossref oup |
SourceType | Open Access Repository Aggregation Database Index Database Publisher |
StartPage | vbae199 |
SubjectTerms | Artificial Intelligence; Bioinformatics; Computer Science; Life Sciences; Quantitative Methods; Signal and Image Processing |
Title | A comparative analysis of gene expression profiling by statistical and machine learning approaches |
URI | https://www.ncbi.nlm.nih.gov/pubmed/39897946 https://www.proquest.com/docview/3162851609 https://hal.science/hal-04731873 |
Volume | 5 |
linkProvider | Oxford University Press |
openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=A+comparative+analysis+of+gene+expression+profiling+by+statistical+and+machine+learning+approaches&rft.jtitle=Bioinformatics+advances&rft.au=Bontonou%2C+Myriam&rft.au=Haget%2C+Ana%C3%AFs&rft.au=Boulougouri%2C+Maria&rft.au=Audit%2C+Benjamin&rft.date=2025&rft.eissn=2635-0041&rft.volume=5&rft.issue=1&rft.spage=vbae199&rft_id=info:doi/10.1093%2Fbioadv%2Fvbae199&rft_id=info%3Apmid%2F39897946&rft.externalDocID=39897946 |