Local-Learning-Based Feature Selection for High-Dimensional Data Analysis

Bibliographic Details
Published in	IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 32, No. 9, pp. 1610-1626
Main Authors	Sun, Yijun; Todorovic, Sinisa; Goodison, Steve
Format	Journal Article
Language	English
Published	Los Alamitos, CA: IEEE; 01.09.2010; IEEE Computer Society; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Accuracy | Algorithm design and analysis | Algorithmics. Computability. Computer arithmetics | Algorithms | Applied sciences | Artificial Intelligence | Classification | Complexity | Computational complexity | Computer science; control theory; systems | Computer Simulation | Data analysis | Data processing | Decision Support Techniques | ℓ1 regularization | Exact sciences and technology | Feature selection | Intelligence | Learning | local learning | logistical regression | Machine learning | Machine learning algorithms | Mathematical models | Microcomputers | Models, Theoretical | Numerical analysis | Pattern analysis | Pattern Recognition, Automated - methods | sample complexity | Studies | Sun | Support vector machine classification | Support vector machines | Theoretical computing | Local search | Personal computer | High precision | Nonlinear problems | Relevance | Selection problem | Multidimensional database | Supervised learning | Algorithm complexity | Data distribution | Viability | Algorithm analysis
ISSN	0162-8828, 1939-3539, 2160-9292
DOI	10.1109/TPAMI.2009.190

More Information
Summary:	This paper considers feature selection for data classification in the presence of a huge number of irrelevant features. We propose a new feature-selection algorithm that addresses several major issues with prior work, including problems with algorithm implementation, computational complexity, and solution accuracy. The key idea is to decompose an arbitrarily complex nonlinear problem into a set of locally linear ones through local learning, and then learn feature relevance globally within the large margin framework. The proposed algorithm is based on well-established machine learning and numerical analysis techniques, without making any assumptions about the underlying data distribution. It is capable of processing many thousands of features within minutes on a personal computer while maintaining a very high accuracy that is nearly insensitive to a growing number of irrelevant features. Theoretical analyses of the algorithm's sample complexity suggest that the algorithm has a logarithmical sample complexity with respect to the number of features. Experiments on 11 synthetic and real-world data sets demonstrate the viability of our formulation of the feature-selection problem for supervised learning and the effectiveness of our algorithm.
Bibliography:	For information on obtaining reprints of this article, please send e-mail to: tpami@computer.org, and reference IEEECS Log Number TPAMI-2009-07-0430.
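
Illustrative sketch:	The summary describes the method only at a high level, so the following is a minimal sketch under stated assumptions and not the authors' implementation. It illustrates the general idea of turning a nonlinear problem into local, linear margin terms and then learning one global set of feature weights with a logistic loss and an ℓ1 penalty (both suggested by the subject terms of this record). The assumptions are: a hard nearest-hit/nearest-miss rule for the local step, a weighted Manhattan distance, a reparameterization w = v² to keep weights nonnegative, plain gradient descent, and synthetic toy data. All function names and hyperparameter values are hypothetical.

```python
import math
import random


def make_toy_data(n=60, n_noise=5, seed=0):
    """Synthetic two-class data: feature 0 is relevant, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        relevant = label + rng.gauss(0.0, 0.2)
        noise = [rng.uniform(-1.0, 1.0) for _ in range(n_noise)]
        X.append([relevant] + noise)
        y.append(label)
    return X, y


def weighted_l1(a, b, w):
    """Weighted Manhattan distance between two samples."""
    return sum(wj * abs(aj - bj) for aj, bj, wj in zip(a, b, w))


def margin_vectors(X, y, w):
    """Local step (assumed form): per-feature |x - nearest miss| - |x - nearest hit|."""
    Z = []
    for n, xn in enumerate(X):
        hits = [X[i] for i in range(len(X)) if i != n and y[i] == y[n]]
        misses = [X[i] for i in range(len(X)) if y[i] != y[n]]
        near_hit = min(hits, key=lambda x: weighted_l1(xn, x, w))
        near_miss = min(misses, key=lambda x: weighted_l1(xn, x, w))
        Z.append([abs(a - m) - abs(a - h)
                  for a, h, m in zip(xn, near_hit, near_miss)])
    return Z


def learn_feature_weights(X, y, lam=0.05, lr=0.1, outer=5, inner=100):
    """Global step: minimize mean log(1 + exp(-w.z)) + lam * sum(w), with w = v**2."""
    d, n = len(X[0]), len(X)
    v = [1.0] * d
    for _ in range(outer):
        # Re-select neighbours under the current weights, then hold them fixed.
        Z = margin_vectors(X, y, [vj * vj for vj in v])
        for _ in range(inner):
            w = [vj * vj for vj in v]
            grad_w = [lam] * d  # gradient of the l1 penalty for w >= 0
            for z in Z:
                m = sum(wj * zj for wj, zj in zip(w, z))
                # s = 1 / (1 + exp(m)), written to avoid overflow
                if m >= 0:
                    e = math.exp(-m)
                    s = e / (1.0 + e)
                else:
                    s = 1.0 / (1.0 + math.exp(m))
                for j in range(d):
                    grad_w[j] -= s * z[j] / n
            # Chain rule: d(loss)/dv = d(loss)/dw * 2v
            v = [vj - lr * 2.0 * vj * gw for vj, gw in zip(v, grad_w)]
    return [vj * vj for vj in v]


X, y = make_toy_data()
weights = learn_feature_weights(X, y)
print([round(wj, 4) for wj in weights])
```

On this toy data the weight of feature 0 should come out largest, with the penalty pushing the noise features toward zero. The brute-force neighbour search is quadratic in the number of samples and is meant only to keep the sketch short.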