A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis

Bibliographic Details
Published in	Biostatistics (Oxford, England) Vol. 10; no. 3; pp. 515 - 534
Main Authors	Witten, Daniela M.; Tibshirani, Robert; Hastie, Trevor
Format	Journal Article
Language	English
Published	England: Oxford University Press, 01.07.2009; Oxford Publishing Limited (England)
Subjects	Algorithms; Biometry - methods; Breast Neoplasms - genetics; Chromosomes, Human, Pair 1 - genetics; Correlation analysis; Data Interpretation, Statistical; DNA, Neoplasm - genetics; Female; Gene Dosage; Gene expression; Genomics - statistics & numerical data; Humans; Mathematics; Matrix; Models, Statistical; Principal Component Analysis - methods; Statistical analysis; SVD; Integrative genomic analysis; Sparse principal component analysis; DNA copy number; Matrix decomposition; Canonical correlation analysis; Principal component analysis
ISSN	1465-4644, 1468-4357
DOI	10.1093/biostatistics/kxp008

Summary:	We present a penalized matrix decomposition (PMD), a new framework for computing a rank-$K$ approximation for a matrix. We approximate the matrix $X$ as $\hat{X} = \sum_{k=1}^{K} d_k u_k v_k^T$, where $d_k$, $u_k$, and $v_k$ minimize the squared Frobenius norm of $X - \hat{X}$, subject to penalties on $u_k$ and $v_k$. This results in a regularized version of the singular value decomposition. Of particular interest is the use of $L_1$-penalties on $u_k$ and $v_k$, which yields a decomposition of $X$ using sparse vectors. We show that when the PMD is applied using an $L_1$-penalty on $v_k$ but not on $u_k$, a method for sparse principal components results. In fact, this yields an efficient algorithm for the "SCoTLASS" proposal (Jolliffe and others 2003) for obtaining sparse principal components. This method is demonstrated on a publicly available gene expression data set. We also establish connections between the SCoTLASS method for sparse principal component analysis and the method of Zou and others (2006). In addition, we show that when the PMD is applied to a cross-products matrix, it results in a method for penalized canonical correlation analysis (CCA). We apply this penalized CCA method to simulated data and to a genomic data set consisting of gene expression and DNA copy number measurements on the same set of samples.
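
The summary does not spell out an algorithm, so the following is only a minimal sketch of how an $L_1$-penalized rank-K decomposition of this kind might be computed, assuming an alternating soft-thresholding scheme with $L_1$ bounds `c1` and `c2` (each at least 1) and deflation of the residual between factors. The function names (`soft_threshold`, `l1_constrained_unit_vector`, `pmd_rank1`, `pmd`) and all parameters are hypothetical and do not come from this record.

```python
import numpy as np


def soft_threshold(a, delta):
    """Elementwise soft-thresholding: sign(a) * max(|a| - delta, 0)."""
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)


def l1_constrained_unit_vector(a, c, n_bisect=60):
    """Unit-L2 vector proportional to soft_threshold(a, delta), using
    (approximately) the smallest delta >= 0 whose L1 norm is <= c."""
    norm = np.linalg.norm(a)
    if norm == 0:
        return a
    u = a / norm
    if np.abs(u).sum() <= c:
        return u  # the L1 bound is inactive, no thresholding needed
    lo, hi = 0.0, np.abs(a).max()
    best = None
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        s = soft_threshold(a, mid)
        s_norm = np.linalg.norm(s)
        if s_norm == 0:
            hi = mid  # threshold too large: everything was zeroed out
        elif np.abs(s).sum() / s_norm <= c:
            hi, best = mid, s / s_norm  # feasible, try a smaller threshold
        else:
            lo = mid
    if best is None:
        raise ValueError("no feasible threshold found; is c >= 1?")
    return best


def pmd_rank1(X, c1, c2, n_iter=100):
    """One sparse factor (d, u, v) via alternating updates of u and v."""
    # Assumption: start from the leading right singular vector of X.
    v = np.linalg.svd(X, full_matrices=False)[2][0]
    for _ in range(n_iter):
        u = l1_constrained_unit_vector(X @ v, c1)
        v = l1_constrained_unit_vector(X.T @ u, c2)
    d = u @ X @ v
    return d, u, v


def pmd(X, K, c1, c2):
    """Rank-K approximation: fit one factor, subtract it, repeat."""
    R = np.array(X, dtype=float)
    factors = []
    for _ in range(K):
        d, u, v = pmd_rank1(R, c1, c2)
        factors.append((d, u, v))
        R = R - d * np.outer(u, v)  # deflate the residual
    return factors
```

In this sketch, setting `c1` to the square root of the number of rows makes the bound on `u` inactive, which corresponds to the case in the summary where only $v_k$ is penalized.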