Sample Weighting: An Inherent Approach for Outlier Suppressing Discriminant Analysis
| Published in | IEEE Transactions on Knowledge and Data Engineering, Vol. 27, no. 11, pp. 3070-3083 |
|---|---|
| Main Authors | , , , |
| Format | Journal Article |
| Language | English |
| Published | New York, IEEE, 01.11.2015, The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Subjects | |
| ISSN | 1041-4347 1558-2191 |
| DOI | 10.1109/TKDE.2015.2448547 |
| Summary: | As data acquisition technologies develop rapidly, both the volume and the variety of data keep growing. However, noise and outliers usually accompany the data and degrade the real-world performance of learning algorithms in data mining and pattern analysis. To address this problem, the importance of each individual sample in building the optimal subspace is explored, and an importance-sampling-inspired method is proposed for outlier-suppressing feature extraction. First, each sample is assigned a weight, estimated via the graph Laplacian, and an approximated mean is then calculated for each subject. By highlighting the most subject-oriented samples, the weighted average and the scatter metrics can be measured with maximum margins and superior classification performance. The supervised information integrates local data structure with the respective contributions of the samples to building the optimal subspace. The linear criterion can be extended to the nonlinear case via the kernel trick. A regularization framework is proposed to deal with the rank-deficiency problem, which is usually induced by the small sample size of the training set. The competitive performance of the algorithm has been validated by extensive experiments on synthetic and benchmark data, including facial images and gene microarray data. |
|---|---|
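
The pipeline outlined in the summary (per-sample weights, weighted class means, weighted scatter, regularization) can be sketched roughly as below. This is a minimal illustration and not the authors' algorithm: the record does not give the weighting formula, so the sketch assumes each sample's weight is its normalized degree in a Gaussian-affinity graph built within its class (the degree is the diagonal of the graph Laplacian L = D - A). The function names `sample_weights` and `weighted_lda` and the parameters `sigma` and `reg` are hypothetical.

```python
import numpy as np

def sample_weights(Xc, sigma=1.0):
    """Assumed weighting: normalized node degree in a Gaussian-affinity graph."""
    d2 = ((Xc[:, None, :] - Xc[None, :, :]) ** 2).sum(axis=-1)
    A = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)          # no self-loops
    deg = A.sum(axis=1)               # diagonal of the Laplacian L = D - A
    if deg.sum() == 0.0:              # degenerate case: fall back to uniform
        return np.full(len(Xc), 1.0 / len(Xc))
    return deg / deg.sum()            # isolated samples (outliers) get ~0 weight

def weighted_lda(X, y, n_components=1, sigma=1.0, reg=1e-3):
    """Sketch of a sample-weighted, regularized linear discriminant projection."""
    dim = X.shape[1]
    Sw = np.zeros((dim, dim))
    means, sizes = [], []
    for c in np.unique(y):
        Xc = X[y == c]
        w = sample_weights(Xc, sigma)
        mu_c = w @ Xc                 # weighted class mean
        diff = Xc - mu_c
        Sw += diff.T @ (w[:, None] * diff)   # weighted within-class scatter
        means.append(mu_c)
        sizes.append(len(Xc))
    means, sizes = np.array(means), np.array(sizes, dtype=float)
    mu = (sizes[:, None] * means).sum(axis=0) / sizes.sum()
    Sb = sum(n * np.outer(m - mu, m - mu) for n, m in zip(sizes, means))
    Sw += reg * np.eye(dim)           # regularization for rank-deficient Sw
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(-evals.real)
    return evecs[:, order[:n_components]].real
```

Calling `weighted_lda(X, y)` on an `(n_samples, n_features)` array returns a projection matrix of shape `(n_features, n_components)`. The `reg` term stands in for the regularization the summary mentions for small training sets. A kernel variant would replace the inner products with kernel evaluations and is not shown here.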