Hybrid huberized support vector machines for microarray classification and gene selection

Bibliographic Details
Published in	Bioinformatics Vol. 24; no. 3; pp. 412 - 419
Main Authors	Wang, Li, Zhu, Ji, Zou, Hui
Format	Journal Article
Language	English
Published	Oxford: Oxford University Press, 01.02.2008; Oxford Publishing Limited (England)
Subjects	Algorithms; Artificial Intelligence; Bioinformatics; Biological and medical sciences; Classification; Fundamental and applied biological sciences. Psychology; Gene Expression Profiling - methods; General aspects; Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects); Oligonucleotide Array Sequence Analysis - methods; Pattern Recognition, Automated - methods; Gene Selection; DNA chip; Hybrid; Microarray; Vector
ISSN	1367-4803, 1367-4811, 1460-2059
DOI	10.1093/bioinformatics/btm579

Summary:	Motivation: The standard L2-norm support vector machine (SVM) is a widely used tool for microarray classification. Previous studies have demonstrated its superior performance in terms of classification accuracy. However, a major limitation of the SVM is that it cannot automatically select relevant genes for the classification. The L1-norm SVM is a variant of the standard L2-norm SVM that constrains the L1-norm of the fitted coefficients. Due to the singularity of the L1-norm, the L1-norm SVM automatically selects relevant genes. However, the L1-norm SVM has two drawbacks: (1) the number of selected genes is bounded above by the size of the training data; (2) when several genes are highly correlated, the L1-norm SVM tends to pick only a few of them and remove the rest. Results: We propose a hybrid huberized support vector machine (HHSVM). The HHSVM combines the huberized hinge loss function and the elastic-net penalty. By doing so, the HHSVM performs automatic gene selection in a way similar to the L1-norm SVM. In addition, the HHSVM encourages highly correlated genes to be selected (or removed) together. We also develop an efficient algorithm to compute the entire solution path of the HHSVM. Numerical results indicate that the HHSVM tends to provide better variable selection results than the L1-norm SVM, especially when variables are highly correlated. Availability: R code is available at http://www.stat.lsa.umich.edu/~jizhu/code/hhsvm/ Contact: jizhu@umich.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Bibliography:	Associate Editor: David Rocke; ArticleID: btm579; istex: 6E4B60015A6174275600E86AF7628A4D04655FAD; ark:/67375/HXZ-SS8BV22D-Q
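
The summary describes the HHSVM as a huberized hinge loss combined with an elastic-net penalty, but it does not state the formulas. The sketch below is one plausible reading, assuming the commonly used piecewise form of the huberized hinge loss (zero above margin 1, quadratic in a band of width delta, linear below it) and an elastic-net penalty of the form lam1 * ||beta||_1 + (lam2 / 2) * ||beta||_2^2. The function names, the parameters `delta`, `lam1`, `lam2`, and the default `delta=2.0` are illustrative assumptions, not taken from the record, and the code only evaluates the objective; it does not implement the authors' solution-path algorithm.

```python
def huberized_hinge(margin, delta=2.0):
    """Assumed huberized hinge loss at margin t = y * f(x).

    Zero for t > 1, quadratic for 1 - delta < t <= 1,
    linear for t <= 1 - delta (continuous at both joins).
    """
    if margin > 1.0:
        return 0.0
    if margin > 1.0 - delta:
        return (1.0 - margin) ** 2 / (2.0 * delta)
    return 1.0 - margin - delta / 2.0


def elastic_net_penalty(beta, lam1, lam2):
    """Assumed elastic-net penalty: lam1 * L1 norm + (lam2 / 2) * squared L2 norm."""
    l1 = sum(abs(b) for b in beta)
    l2_squared = sum(b * b for b in beta)
    return lam1 * l1 + 0.5 * lam2 * l2_squared


def hhsvm_objective(X, y, beta0, beta, lam1, lam2, delta=2.0):
    """Total huberized hinge loss over the samples plus the elastic-net penalty.

    X is a list of feature rows, y a list of labels in {-1, +1}.
    """
    loss = 0.0
    for row, label in zip(X, y):
        score = beta0 + sum(b * x for b, x in zip(beta, row))
        loss += huberized_hinge(label * score, delta)
    return loss + elastic_net_penalty(beta, lam1, lam2)


if __name__ == "__main__":
    # Toy data: two samples, two "genes"; values are made up for illustration.
    X = [[1.0, 0.0], [0.0, 1.0]]
    y = [1, -1]
    print(hhsvm_objective(X, y, beta0=0.0, beta=[1.0, -2.0], lam1=0.5, lam2=2.0))
```

Under these assumptions, the L1 term is what can set coefficients exactly to zero (gene selection), while the squared L2 term is what tends to keep coefficients of highly correlated genes close to each other, which matches the grouping behavior the summary describes.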