SDPSO: Spark Distributed PSO-based approach for feature selection and cancer disease prognosis

Bibliographic Details
Published in	Journal of Big Data, Vol. 8, no. 1, pp. 1-22
Main Authors	Tadist, Khawla; Mrabti, Fatiha; Nikolov, Nikola S.; Zahi, Azeddine; Najah, Said
Format	Journal Article
Language	English
Published	Cham: Springer International Publishing, 13.01.2021; Springer Nature B.V; SpringerOpen
Subjects	Algorithms; Big Data; Bioinformatics; Cancer; Clustering; Communications Engineering; Comparative studies; Computational Science and Engineering; Computer Science; Data Mining and Knowledge Discovery; Database Management; Feature selection; Genomics; Information Storage and Retrieval; Iterative methods; Mathematical Applications in Computer Science; Medical prognosis; Networks; Optimization; Particle swarm optimization; Prognosis; PSO algorithm; Search algorithms; Spark; State-of-the-art reviews
ISSN	2196-1115
DOI	10.1186/s40537-021-00409-x

Summary:	The curse of dimensionality is one of the most critical issues hindering progress in many fields, and in bioinformatics in particular. Countering it requires a composite solution. Among the techniques that have proved effective, scaling-based dimensionality reduction techniques are the most prevalent. To ensure improved performance and productivity, horizontal scaling functions are combined with Particle Swarm Optimization (PSO) based computational techniques. Optimization algorithms are an interesting alternative to traditional feature selection methods: they are both efficient and relatively easy to scale. PSO is an iterative search algorithm that has been shown to achieve excellent results on feature selection problems. This paper proposes a composite Spark-distributed approach to feature selection for cancer prognosis that combines an integrative feature selection algorithm using Binary Particle Swarm Optimization (BPSO) with the PSO algorithm; hence the name Spark Distributed Particle Swarm Optimization (SDPSO). The effectiveness of the proposed approach is demonstrated on five benchmark genomic datasets and in a comparative study with four state-of-the-art methods. Compared with the four methods, the proposed approach yields the best average results, with purity ranging from 0.78 to 0.97 and F-measure ranging from 0.75 to 0.96.
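
The summary describes PSO as an iterative search over candidate feature subsets. As a rough illustration only, the following is a minimal single-machine sketch of a generic binary PSO with a sigmoid transfer function. It is not the SDPSO method of the paper and involves no Spark; the function names, the parameter values (`w`, `c1`, `c2`, the velocity clamp) and the toy fitness function are all assumptions made for this example.

```python
import math
import random


def binary_pso(fitness, n_features, n_particles=20, n_iters=50,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Generic binary PSO sketch: maximise fitness(mask) over 0/1 masks."""
    rng = random.Random(seed)
    # Each particle is a 0/1 mask: bit d == 1 means feature d is selected.
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                # personal best masks
    pbest_fit = [fitness(p) for p in pos]      # personal best scores
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]  # global best so far

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                # Standard velocity update: inertia + cognitive + social terms.
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                v = max(-6.0, min(6.0, v))     # clamp (assumed bound)
                vel[i][d] = v
                # Sigmoid maps the velocity to the probability of setting the bit.
                prob = 1.0 / (1.0 + math.exp(-v))
                pos[i][d] = 1 if rng.random() < prob else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


def toy_fitness(mask):
    """Hypothetical score: features 0-2 are 'informative'; each selected
    feature costs 0.1. A real fitness would evaluate a model on the data."""
    informative = {0, 1, 2}
    hits = sum(1 for d, bit in enumerate(mask) if bit and d in informative)
    return hits - 0.1 * sum(mask)


if __name__ == "__main__":
    best_mask, best_score = binary_pso(toy_fitness, n_features=10)
    print(best_mask, best_score)
```

In a real feature selection setting the fitness function would score a clustering or classification model trained on the selected columns, and that evaluation is the expensive step a distributed framework could parallelise.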