Explainable Machine Learning Models for Glioma Subtype Classification and Survival Prediction

Bibliographic Details
Published in: Cancers Vol. 17; no. 16; p. 2614
Main Authors: Vershinina, Olga; Turubanova, Victoria; Krivonosov, Mikhail; Trukhanov, Arseniy; Ivanchenko, Mikhail
Format: Journal Article
Language: English
Published: Switzerland, MDPI AG, 09.08.2025
ISSN: 2072-6694
DOI: 10.3390/cancers17162614

Summary:
Background/Objectives: Gliomas are complex and heterogeneous brain tumors with an unfavorable clinical course and a fatal prognosis, which early determination of the tumor type could help improve. Here, we developed explainable machine learning (ML) models for classifying three major glioma subtypes (astrocytoma, oligodendroglioma, and glioblastoma) and predicting patient survival based on RNA-seq data.
Methods: We analyzed publicly available datasets and applied feature selection techniques to identify key biomarkers. Using various ML models, we performed classification and survival analysis to develop robust predictive models. The best-performing models were then interpreted using Shapley additive explanations (SHAP).
Results: Thirteen key genes (TERT, NOX4, MMP9, TRIM67, ZDHHC18, HDAC1, TUBB6, ADM, NOG, CHEK2, KCNJ11, KCNIP2, and VEGFA) proved to be closely associated with both glioma subtype and survival. A Support Vector Machine (SVM) was the best classification model, with a balanced accuracy of 0.816 and an area under the receiver operating characteristic curve (AUC) of 0.896 on the test datasets. The Case-Control Cox regression model (CoxCC) performed best for survival prediction, with Harrell's C-index values of 0.809 and 0.8 on the two test datasets. Using SHAP, we revealed how the expression of each gene influences the outputs of both models, making the prediction process more transparent.
Conclusions: The results indicate that the developed models could serve as a valuable practical tool for clinicians, helping them diagnose glioma and determine optimal treatment strategies for patients.
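The summary reports classifier quality as balanced accuracy and survival-model quality as Harrell's C-index. As a rough illustration of what these two metrics measure, the sketch below computes both from scratch in plain Python. It is not the authors' code: the function names `balanced_accuracy` and `harrell_c_index` and all toy values are hypothetical, and tied survival times are simply skipped as a simplification. Established implementations exist, for example scikit-learn's `balanced_accuracy_score` and lifelines' `concordance_index` (the latter scores predicted survival times rather than risk scores, so risk scores would need to be negated).

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each glioma subtype counts equally
    regardless of how many samples it contributes (hypothetical helper)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for true_label, pred_label in zip(y_true, y_pred):
        totals[true_label] += 1
        if true_label == pred_label:
            hits[true_label] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def harrell_c_index(times, events, risk_scores):
    """Simplified Harrell's concordance index for right-censored data
    (hypothetical helper, not the paper's implementation).

    A pair (i, j) is comparable when subject i had an observed event and
    a strictly shorter time than subject j. The pair is concordant when
    the model assigns i the higher risk score; tied scores count as 0.5.
    Pairs with tied times are skipped in this sketch.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Toy subtype labels (hypothetical data, not from the study).
    y_true = ["astrocytoma", "astrocytoma", "oligodendroglioma",
              "glioblastoma", "glioblastoma", "glioblastoma"]
    y_pred = ["astrocytoma", "oligodendroglioma", "oligodendroglioma",
              "glioblastoma", "glioblastoma", "astrocytoma"]
    print(round(balanced_accuracy(y_true, y_pred), 4))  # 0.7222

    # Toy survival data: time, event observed (False = censored), risk.
    times = [5, 10, 12, 20]
    events = [True, True, False, True]
    risks = [0.9, 0.6, 0.7, 0.1]
    print(harrell_c_index(times, events, risks))  # 0.8
```

The double loop is O(n²) in the number of patients, which keeps the pair-counting logic explicit; library implementations use faster algorithms and handle tied event times more carefully.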