A new improved filter-based feature selection model for high-dimensional data

Bibliographic Details
Published in: The Journal of Supercomputing, Vol. 76, No. 8, pp. 5745–5762
Main Authors: Munirathinam, Deepak Raj; Ranganadhan, Mohanasundaram
Format: Journal Article
Language: English
Published: New York: Springer US (Springer Nature B.V.), 01.08.2020
ISSN: 0920-8542, 1573-0484
DOI: 10.1007/s11227-019-02975-7


More Information
Summary: Preprocessing of data is ubiquitous, and choosing significant attributes is one of the most important steps prior to processing. Feature selection creates a subset of relevant features for effective classification of data. In the classification of high-dimensional data, the classifier's performance usually depends on the feature subset used for classification. The Relief algorithm is a popular heuristic approach to selecting significant feature subsets: it estimates each feature individually and selects the top-scored features for subset generation. Many extensions of the Relief algorithm have been developed. However, an important defect in Relief-based algorithms has been ignored for years: because of the uncertainty and noise of the instances used to measure feature scores, the outcome vacillates with the chosen instances, which leads to poor classification accuracy. To fix this problem, a novel feature selection algorithm based on a Chebyshev distance-based outlier detection model is proposed, called noisy feature removal-ReliefF (NFR-ReliefF for short). To demonstrate the performance of the NFR-ReliefF algorithm, an extensive experiment, including classification tests, has been carried out on nine benchmark high-dimensional datasets by combining the proposed model with standard classifiers, including naïve Bayes, C4.5 and KNN. The results show that NFR-ReliefF outperforms the other models on most of the tested datasets.
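The abstract only outlines the method, but the general pattern it describes — screen out noisy or outlying instances with a Chebyshev (L-infinity) distance test, then run ReliefF-style scoring on the cleaned data — can be illustrated with a minimal sketch. The standardized-distance outlier criterion, the threshold of 3, and the neighbor count below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def chebyshev_outlier_filter(X, y, threshold=3.0):
    """Drop instances whose Chebyshev (max-coordinate) distance from the
    feature-wise mean, measured in per-feature standard deviations,
    exceeds `threshold`. An assumed stand-in for the paper's
    outlier-detection step."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0) + 1e-12          # avoid division by zero
    dist = np.max(np.abs(X - mu) / sigma, axis=1)
    keep = dist <= threshold
    return X[keep], y[keep]

def relieff_scores(X, y, n_neighbors=2):
    """Basic ReliefF feature weights: reward features that differ across
    classes (nearest misses) and penalize features that differ within a
    class (nearest hits)."""
    n, d = X.shape
    rng = np.ptp(X, axis=0) + 1e-12        # per-feature range for normalization
    w = np.zeros(d)
    for i in range(n):
        diff = np.abs(X - X[i]) / rng      # normalized per-feature differences
        dist = diff.sum(axis=1)            # Manhattan distance on normalized data
        same_idx = np.where((y == y[i]) & (np.arange(n) != i))[0]
        miss_idx = np.where(y != y[i])[0]
        hits = same_idx[np.argsort(dist[same_idx])[:n_neighbors]]
        misses = miss_idx[np.argsort(dist[miss_idx])[:n_neighbors]]
        w -= diff[hits].mean(axis=0)       # same class, different value: bad
        w += diff[misses].mean(axis=0)     # other class, different value: good
    return w / n

# Toy usage: feature 0 separates the two classes, feature 1 is noise.
X = np.array([[0.0, 5.0], [0.1, 1.0], [0.2, 9.0],
              [1.0, 2.0], [1.1, 8.0], [0.9, 4.0]])
y = np.array([0, 0, 0, 1, 1, 1])
Xf, yf = chebyshev_outlier_filter(X, y)
scores = relieff_scores(Xf, yf)
top = np.argsort(scores)[::-1]             # feature indices ranked by relevance
```

In this toy run the discriminative feature 0 receives the higher Relief weight, so `top[0]` is `0`; on noisier data the filtering step would discard instances before scoring, which is the stabilizing effect the abstract attributes to NFR-ReliefF.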