Improving cancer prediction using feature selection in spark environment

Bibliographic Details
Published in	Concurrency and computation Vol. 36; no. 2
Main Authors	Longkumer, Imtisenla; Hussain Mazumder, Dilwar
Format	Journal Article
Language	English
Published	Hoboken: Wiley Subscription Services, Inc., 25.01.2024
Subjects	big data; Cancer; cancer prediction; Classification; Classifiers; Decision trees; Feature selection; Gene expression; machine learning; Selectors; Support vector machines
ISSN	1532-0626, 1532-0634
DOI	10.1002/cpe.7903

Summary:	Cancer prediction from microarray‐based gene expression data has been the subject of much research in recent years. Because such data have a vast number of features and relatively small sample sizes, feature selection is necessary to improve classification performance. Additionally, the characteristics of this malignant condition may often vary, producing large amounts of data that require additional time and resources to process. This research proposes Apache Spark‐based feature selection for microarray cancer classification. The first aim is to select only the optimal and necessary features using two feature selectors: information gain (IG) and correlation‐based feature selection (CFS). The second aim is to employ a distributed framework and observe the efficiency of the different feature selectors for classification. Finally, we evaluated the approach in terms of accuracy, precision, recall and area under the ROC curve (AUC) using three classifiers: support vector machine (SVM), naive Bayes (NB), and decision tree (DT). The results reveal that the NB classifier with IG as the feature selector outperformed the alternatives in all cases.
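The summary names two feature selectors, IG and CFS. As a rough illustration of what each one scores, here is a minimal single-machine Python sketch. It is not the authors' Spark implementation. It assumes gene expression values have already been discretized (here into hypothetical "lo"/"hi" bins). The CFS part assumes Hall's merit heuristic, with symmetrical uncertainty as the correlation measure and a greedy forward search. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter

# Minimal single-machine sketch (not the paper's Spark code). Assumes every
# feature is already discretized, e.g. expression binned into "lo"/"hi".


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def conditional_entropy(feature, labels):
    """H(labels | feature): label entropy within each feature value, weighted."""
    n = len(labels)
    groups = {}
    for f, y in zip(feature, labels):
        groups.setdefault(f, []).append(y)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def information_gain(feature, labels):
    """IG = H(labels) - H(labels | feature), i.e. mutual information."""
    return entropy(labels) - conditional_entropy(feature, labels)


def rank_by_information_gain(features, labels, top_k):
    """Indices of the top_k features by IG (ties broken by lower index)."""
    scores = [(information_gain(f, labels), i) for i, f in enumerate(features)]
    scores.sort(key=lambda t: (-t[0], t[1]))
    return [i for _, i in scores[:top_k]]


def symmetrical_uncertainty(a, b):
    """SU = 2 * I(a; b) / (H(a) + H(b)), normalized to [0, 1]."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a + h_b == 0:
        return 0.0
    return 2.0 * information_gain(a, b) / (h_a + h_b)


def cfs_merit(subset, features, labels):
    """Assumed CFS merit: k * mean(r_cf) / sqrt(k + k(k-1) * mean(r_ff))."""
    k = len(subset)
    r_cf = sum(symmetrical_uncertainty(features[i], labels) for i in subset) / k
    pairs = [(i, j) for n, i in enumerate(subset) for j in subset[n + 1:]]
    r_ff = 0.0
    if pairs:
        r_ff = sum(symmetrical_uncertainty(features[i], features[j])
                   for i, j in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward_search(features, labels):
    """Greedily add the feature that most raises merit; stop when none does."""
    selected, best = [], 0.0
    remaining = list(range(len(features)))
    while remaining:
        best_idx = None
        for i in remaining:
            score = cfs_merit(selected + [i], features, labels)
            if score > best:
                best_idx, best = i, score
        if best_idx is None:
            break
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected


# Hypothetical toy data: three discretized "genes" over four samples.
labels = [0, 0, 1, 1]
features = [
    ["lo", "lo", "hi", "hi"],  # gene 0: perfectly separates the classes
    ["lo", "hi", "lo", "hi"],  # gene 1: unrelated to the class
    ["lo", "lo", "hi", "hi"],  # gene 2: redundant copy of gene 0
]
print(rank_by_information_gain(features, labels, top_k=2))  # [0, 2]
print(cfs_forward_search(features, labels))                  # [0]
```

On this toy data, IG ranks gene 0 and its redundant copy, gene 2, equally high, because it scores each feature in isolation. CFS stops after gene 0 because the copy adds no merit. A Spark version would distribute the per-feature counting across the cluster; this sketch does not attempt that.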