A new improved filter-based feature selection model for high-dimensional data

Bibliographic Details
Published in: The Journal of Supercomputing, Vol. 76, No. 8, pp. 5745–5762
Main Authors: Munirathinam, Deepak Raj; Ranganadhan, Mohanasundaram
Format: Journal Article
Language: English
Published: New York: Springer US (Springer Nature B.V.), 01.08.2020
ISSN: 0920-8542, 1573-0484
DOI: 10.1007/s11227-019-02975-7


More Information
Summary: Preprocessing of data is ubiquitous, and choosing significant attributes is one of the most important steps prior to processing. Feature selection creates a subset of relevant features for effective classification of data. In the classification of high-dimensional data, the classifier's performance usually depends on the feature subset used for classification. The Relief algorithm is a popular heuristic approach to selecting significant feature subsets: it estimates each feature individually and selects the top-scored features for subset generation. Many extensions of the Relief algorithm have been developed. However, an important defect in Relief-based algorithms has been ignored for years: because of the uncertainty and noise of the instances used to measure feature scores, the outcome vacillates with the chosen instances, which leads to poor classification accuracy. To fix this problem, a novel feature selection algorithm based on a Chebyshev distance-based outlier detection model is proposed, called noisy feature removal-ReliefF (NFR-ReliefF for short). To demonstrate the performance of the NFR-ReliefF algorithm, an extensive experiment, including classification tests, has been carried out on nine benchmark high-dimensional datasets by combining the proposed model with standard classifiers, including naïve Bayes, C4.5 and KNN. The results show that NFR-ReliefF outperforms the other models on most of the tested datasets.
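The abstract only outlines the method, but the general pattern it describes — screen out noisy or outlying instances with a Chebyshev (L-infinity) distance test, then run ReliefF-style scoring on the cleaned data — can be illustrated with a minimal sketch. The standardized-distance outlier criterion, the threshold of 3, and the neighbor count below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def chebyshev_outlier_filter(X, y, threshold=3.0):
    """Drop instances whose Chebyshev (max-coordinate) distance from the
    feature-wise mean, measured in per-feature standard deviations,
    exceeds `threshold`. An assumed stand-in for the paper's
    outlier-detection step."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0) + 1e-12          # avoid division by zero
    dist = np.max(np.abs(X - mu) / sigma, axis=1)
    keep = dist <= threshold
    return X[keep], y[keep]

def relieff_scores(X, y, n_neighbors=2):
    """Basic ReliefF feature weights: reward features that differ across
    classes (nearest misses) and penalize features that differ within a
    class (nearest hits)."""
    n, d = X.shape
    rng = np.ptp(X, axis=0) + 1e-12        # per-feature range for normalization
    w = np.zeros(d)
    for i in range(n):
        diff = np.abs(X - X[i]) / rng      # normalized per-feature differences
        dist = diff.sum(axis=1)            # Manhattan distance on normalized data
        same_idx = np.where((y == y[i]) & (np.arange(n) != i))[0]
        miss_idx = np.where(y != y[i])[0]
        hits = same_idx[np.argsort(dist[same_idx])[:n_neighbors]]
        misses = miss_idx[np.argsort(dist[miss_idx])[:n_neighbors]]
        w -= diff[hits].mean(axis=0)       # same class, different value: bad
        w += diff[misses].mean(axis=0)     # other class, different value: good
    return w / n

# Toy usage: feature 0 separates the two classes, feature 1 is noise.
X = np.array([[0.0, 5.0], [0.1, 1.0], [0.2, 9.0],
              [1.0, 2.0], [1.1, 8.0], [0.9, 4.0]])
y = np.array([0, 0, 0, 1, 1, 1])
Xf, yf = chebyshev_outlier_filter(X, y)
scores = relieff_scores(Xf, yf)
top = np.argsort(scores)[::-1]             # feature indices ranked by relevance
```

In this toy run the discriminative feature 0 receives the higher Relief weight, so `top[0]` is `0`; on noisier data the filtering step would discard instances before scoring, which is the stabilizing effect the abstract attributes to NFR-ReliefF.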