Explainable Machine Learning Models for Glioma Subtype Classification and Survival Prediction

Bibliographic Details
Published in	Cancers Vol. 17; no. 16; p. 2614
Main Authors	Vershinina, Olga; Turubanova, Victoria; Krivonosov, Mikhail; Trukhanov, Arseniy; Ivanchenko, Mikhail
Format	Journal Article
Language	English
Published	Switzerland: MDPI AG, 09.08.2025
Subjects	Artificial intelligence; Astrocytoma; Brain cancer; Brain tumors; Classification; Datasets; Diagnosis; Gelatinase B; Gene expression; Genes; Glioblastoma; Glioma; Gliomas; Learning algorithms; Machine learning; Medical research; Medicine, Experimental; Neural networks; NOX4 protein; Oligodendroglioma; Patients; Prediction models; RNA; RNA sequencing; Survival; Taiwan; explainable artificial intelligence; gene expression data; subtype classification; glioma; overall survival prognosis; machine learning
ISSN	2072-6694
DOI	10.3390/cancers17162614

Summary:	Background/Objectives: Gliomas are complex, heterogeneous brain tumors with an unfavorable clinical course and a fatal prognosis, which can be improved by early determination of the tumor type. Here, we developed explainable machine learning (ML) models that classify three major glioma subtypes (astrocytoma, oligodendroglioma, and glioblastoma) and predict survival based on RNA-seq data. Methods: We analyzed publicly available datasets and applied feature selection techniques to identify key biomarkers. We then trained and compared several ML models for classification and survival analysis, and interpreted the best-performing models using Shapley additive explanations (SHAP). Results: Thirteen key genes (TERT, NOX4, MMP9, TRIM67, ZDHHC18, HDAC1, TUBB6, ADM, NOG, CHEK2, KCNJ11, KCNIP2, and VEGFA) were closely associated with both glioma subtype and survival. A Support Vector Machine (SVM) was the best classification model, with a balanced accuracy of 0.816 and an area under the receiver operating characteristic curve (AUC) of 0.896 on the test datasets. The Case-Control Cox regression model (CoxCC) was the best survival model, with Harrell’s C-index values of 0.809 and 0.8 on the test datasets. SHAP revealed how gene expression influences the outputs of both models, making the prediction process more transparent. Conclusions: The developed models could serve as a valuable practical tool for clinicians, assisting them in diagnosis and in determining optimal treatment strategies for patients with glioma.
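The summary reports balanced accuracy and Harrell's C-index without defining them. The following is a minimal sketch of both metrics, assuming their standard textbook definitions; it is not the authors' implementation, and the function names and toy data are hypothetical illustrations.

```python
from itertools import combinations


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every subtype counts equally."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def harrell_c_index(times, events, risks):
    """Fraction of comparable patient pairs that the risk score orders correctly.

    A pair is comparable when the patient with the shorter follow-up time
    had an observed event (events[i] == 1). Tied risks count as 0.5.
    Simplification (assumption): pairs with tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier time is censored, so the true order is unknown
        comparable += 1
        if risks[first] > risks[second]:
            concordant += 1.0
        elif risks[first] == risks[second]:
            concordant += 0.5
    return concordant / comparable


# Hypothetical toy data, not taken from the study.
y_true = ["astro", "astro", "oligo", "oligo", "gbm", "gbm"]
y_pred = ["astro", "oligo", "oligo", "oligo", "gbm", "gbm"]
print(round(balanced_accuracy(y_true, y_pred), 3))  # 0.833

times = [5, 10, 12, 20]        # follow-up time
events = [1, 0, 1, 1]          # 1 = death observed, 0 = censored
risks = [0.9, 0.4, 0.1, 0.2]   # higher score = higher predicted risk
print(harrell_c_index(times, events, risks))  # 0.75
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives a scale for reading the reported test values.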