Feature Selection Based on Neighborhood Discrimination Index

Bibliographic Details
Published in	IEEE Transactions on Neural Networks and Learning Systems, Vol. 29, no. 7, pp. 2986-2999
Main Authors	Wang, Changzhong; Hu, Qinghua; Wang, Xizhao; Chen, Degang; Qian, Yuhua; Dong, Zhe
Format	Journal Article
Language	English
Published	United States: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.07.2018
Subjects	Algorithm design and analysis; Algorithms; Data mining; Data processing; Discrimination; Discrimination index; distinguishing information; Entropy; Entropy (Information theory); feature selection; Greedy algorithms; Indexes; Learning algorithms; Machine learning; Manganese; Mutual information; neighborhood relation; Neighborhoods; Pattern recognition; Preprocessing; Uncertainty
ISSN	2162-237X; 2162-2388
DOI	10.1109/TNNLS.2017.2710422

Summary:	Feature selection is viewed as an important preprocessing step for pattern recognition, machine learning, and data mining. The neighborhood is one of the most important concepts in classification learning and can be used to distinguish samples with different decisions. In this paper, a neighborhood discrimination index is proposed to characterize the distinguishing information of a neighborhood relation. It reflects the distinguishing ability of a feature subset. The proposed discrimination index is computed from the cardinality of a neighborhood relation rather than from neighborhood similarity classes. Variants of the discrimination index, including the joint, conditional, and mutual discrimination indexes, are introduced to compute the change in distinguishing information caused by combining multiple feature subsets. These variants have properties similar to those of Shannon entropy and its variants. A parameter, named the neighborhood radius, is introduced in these discrimination measures to handle real-valued data. Based on the proposed discrimination measures, the significance of a candidate feature is defined and a greedy forward algorithm for feature selection is designed. Data sets selected from public data sources are used to compare the proposed algorithm with existing algorithms. The experimental results confirm that the discrimination index-based algorithm yields superior performance compared to other classical algorithms.
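
The summary describes the method only in words, so the following Python sketch is an illustration under stated assumptions rather than the authors' implementation. It assumes the index takes the form log(n^2 / |R|), where R is the set of sample pairs lying within the neighborhood radius, and that the conditional index is log(|R_B| / |R_B ∩ R_D|) for a feature subset B and decision D. Neither formula appears in this record. The Chebyshev distance, the stopping tolerance `tol`, and all function names are likewise hypothetical choices.

```python
import numpy as np

def neighborhood_relation(X, radius):
    """Boolean n x n matrix, True where two samples are within `radius`.
    Chebyshev distance is an assumption made for this sketch."""
    X = np.asarray(X, dtype=float)
    diff = np.abs(X[:, None, :] - X[None, :, :])
    return diff.max(axis=2) <= radius

def discrimination_index(relation):
    """Assumed form: log(n^2 / |R|). Fewer related pairs means more discrimination."""
    n = relation.shape[0]
    return np.log(n * n / relation.sum())

def conditional_index(rel_features, rel_decision):
    """Assumed form: log(|R_B| / |R_B intersect R_D|)."""
    return np.log(rel_features.sum() / (rel_features & rel_decision).sum())

def greedy_forward_selection(X, y, radius=0.2, tol=1e-3):
    """Add, one at a time, the feature that most reduces the conditional index."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, m = X.shape
    rel_decision = y[:, None] == y[None, :]      # pairs sharing a decision
    rel_selected = np.ones((n, n), dtype=bool)   # empty subset relates all pairs
    current = conditional_index(rel_selected, rel_decision)
    selected = []
    while len(selected) < m:
        best_j, best_score, best_rel = None, current, None
        for j in range(m):
            if j in selected:
                continue
            # Under Chebyshev distance, the relation of a feature subset is
            # the intersection of the per-feature relations.
            rel_j = rel_selected & neighborhood_relation(X[:, [j]], radius)
            score = conditional_index(rel_j, rel_decision)
            if score < best_score:
                best_j, best_score, best_rel = j, score, rel_j
        if best_j is None or current - best_score < tol:
            break
        selected.append(best_j)
        rel_selected, current = best_rel, best_score
    return selected
```

Features would typically be rescaled to a common range such as [0, 1] before calling `greedy_forward_selection`, since a single radius is applied to every feature.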