COMBING: Clustering in Oncology for Mathematical and Biological Identification of Novel Gene Signatures


Bibliographic Details
Published in IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 19, no. 6, pp. 3317-3331
Main Authors Battistella, Enzo; Vakalopoulou, Maria; Sun, Roger; Estienne, Theo; Lerousseau, Marvin; Nikolaev, Sergey; Andres, Emilie Alvarez; Carre, Alexandre; Niyoteka, Stephane; Robert, Charlotte; Paragios, Nikos; Deutsch, Eric
Format Journal Article
Language English
Published United States: IEEE, 01.11.2022
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN 1545-5963
1557-9964
2374-0043
DOI 10.1109/TCBB.2021.3123910

Summary: Precision medicine is a paradigm shift in healthcare relying heavily on genomics data. However, the complexity of biological interactions, the large number of genes as well as the lack of comparisons on the analysis of data, remain a tremendous bottleneck regarding clinical adoption. In this paper, we introduce a novel, automatic and unsupervised framework to discover low-dimensional gene biomarkers. Our method is based on the LP-Stability algorithm, a high dimensional center-based unsupervised clustering algorithm. It offers modularity as concerns metric functions and scalability, while being able to automatically determine the best number of clusters. Our evaluation includes both mathematical and biological criteria to define a quantitative metric. The recovered signature is applied to a variety of biological tasks, including screening of biological pathways and functions, and characterization relevance on tumor types and subtypes. Quantitative comparisons among different distance metrics, commonly used clustering methods and a referential gene signature used in the literature, confirm state of the art performance of our approach. In particular, our signature, based on 27 genes, reports at least 30 times better mathematical significance (average Dunn's Index) and 25% better biological significance (average Enrichment in Protein-Protein Interaction) than those produced by other referential clustering methods. Finally, our signature reports promising results on distinguishing immune inflammatory and immune desert tumors, while reporting a high balanced accuracy of 92% on tumor types classification and averaged balanced accuracy of 68% on tumor subtypes classification, which represents, respectively, 7% and 9% higher performance compared to the referential signature.
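
Note: the summary cites two quantitative criteria, Dunn's Index and balanced accuracy. The following is a minimal sketch of their common textbook definitions, not the authors' implementation. It assumes Euclidean distance and the classic point-based Dunn formulation (smallest between-cluster distance divided by largest within-cluster diameter); the paper may use a different variant or metric. The function names `dunn_index` and `balanced_accuracy` and the toy data are hypothetical.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def dunn_index(points, labels):
    """Classic Dunn index: min inter-cluster distance / max intra-cluster diameter.

    Assumption: Euclidean distance between points; higher values indicate
    compact, well-separated clusters.
    """
    max_diameter = 0.0
    min_separation = float("inf")
    # Visit every unordered pair of labelled points once.
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        d = dist(p, q)
        if lp == lq:
            max_diameter = max(max_diameter, d)      # same cluster: diameter candidate
        else:
            min_separation = min(min_separation, d)  # different clusters: separation candidate
    if min_separation == float("inf") or max_diameter == 0.0:
        raise ValueError("need at least two clusters and one cluster with two distinct points")
    return min_separation / max_diameter


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so that rare classes weigh as much as common ones."""
    recalls = []
    for cls in set(y_true):
        preds = [p for t, p in zip(y_true, y_pred) if t == cls]
        recalls.append(sum(p == cls for p in preds) / len(preds))
    return sum(recalls) / len(recalls)


# Toy data (illustrative only): two tight clusters far apart.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
toy_labels = [0, 0, 1, 1]
toy_dunn = dunn_index(toy_points, toy_labels)

# Toy classification: three samples of class 0, one of class 1.
toy_bacc = balanced_accuracy([0, 0, 0, 1], [0, 0, 1, 1])
```

In the toy data the closest points from different clusters are 10 apart and each cluster has diameter 1, so the index is 10. The toy classification has recalls of 2/3 and 1, giving a balanced accuracy of 5/6.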