Clustering threshold gradient descent regularization: with applications to microarray studies

Bibliographic Details
Published in	Bioinformatics Vol. 23; no. 4; pp. 466 - 472
Main Authors	Ma, Shuangge, Huang, Jian
Format	Journal Article
Language	English
Published	Oxford: Oxford University Press, 15.02.2007; Oxford Publishing Limited (England)
Subjects	Algorithms; Bioinformatics; Biological and medical sciences; Biomarkers, Tumor - metabolism; Cluster Analysis; Fundamental and applied biological sciences. Psychology; Gene Expression Profiling - methods; General aspects; Humans; Lymphoma; Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects); Multigene Family; Neoplasm Proteins - metabolism; Neoplasms - diagnosis; Neoplasms - metabolism; Numerical Analysis, Computer-Assisted; Oligonucleotide Array Sequence Analysis - methods; Statistical models; Disease; DNA chip; K means algorithm; Gene expression; Microarray; Clusterin; Gradient descent; Original document; Regulation(control); Computer program; Classification; Comparative study
ISSN	1367-4803, 1367-4811, 1460-2059
DOI	10.1093/bioinformatics/btl632

Summary:	Motivation: An important goal of microarray studies is to discover genes that are associated with clinical outcomes, such as disease status and patient survival. While a typical experiment surveys gene expressions on a global scale, there may be only a small number of genes that have significant influence on a clinical outcome. Moreover, expression data have cluster structures: the genes within a cluster have correlated expressions and coordinated functions, but the effects of individual genes in the same cluster may be different. Accordingly, we seek to build statistical models with the following properties. First, the model is sparse in the sense that only a subset of the parameter vector is non-zero. Second, the cluster structures of gene expressions are properly accounted for. Results: For gene expression data without pathway information, we divide genes into clusters using commonly used methods, such as K-means or hierarchical approaches. The optimal number of clusters is determined using the Gap statistic. We propose a clustering threshold gradient descent regularization (CTGDR) method for simultaneous cluster selection and within-cluster gene selection. We apply this method to binary classification and censored survival analysis. Compared with the standard TGDR and other regularization methods, the CTGDR takes into account the cluster structure and carries out feature selection at both the cluster level and the within-cluster gene level. We demonstrate the CTGDR on two studies of cancer classification and two studies correlating survival of lymphoma patients with microarray expressions. Availability: R code is available upon request. Contact: shuangge.ma@yale.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Bibliography:	ark:/67375/HXZ-J3301M5S-H; istex:7C9E3AE4B594B34E336297914FB6C98B21334139; Associate Editor: Satoru Miyano
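
The two-level selection described in the Summary (choose clusters first, then genes within the chosen clusters) can be illustrated roughly as follows. This is a minimal Python sketch for the binary classification case, not the authors' R code. The function name `ctgdr_logistic`, the thresholds `tau1` and `tau2`, the use of the Euclidean norm as the cluster-level gradient measure, the step size, and the synthetic data are all assumptions made for illustration. Cluster labels are taken as given, for example from K-means.

```python
import numpy as np

def ctgdr_logistic(X, y, clusters, tau1=0.9, tau2=0.9, step=0.01, n_iter=100):
    """Hypothetical two-level thresholded gradient ascent for logistic regression.

    X: (n, p) expression matrix; y: (n,) labels in {0, 1};
    clusters: (p,) cluster label of each gene (assumed precomputed).
    tau1 thresholds clusters, tau2 thresholds genes inside kept clusters.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    clusters = np.asarray(clusters)
    n, p = X.shape
    beta = np.zeros(p)
    labels = np.unique(clusters)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - prob) / n  # gradient of the average log-likelihood
        # Cluster level: keep clusters whose gradient norm is near the largest.
        norms = np.array([np.linalg.norm(grad[clusters == c]) for c in labels])
        if norms.max() == 0.0:
            break  # gradient vanished everywhere; nothing left to update
        mask = np.zeros(p, dtype=bool)
        for c, norm in zip(labels, norms):
            if norm < tau1 * norms.max():
                continue
            # Gene level: inside a kept cluster, keep the largest gradients.
            idx = np.flatnonzero(clusters == c)
            abs_g = np.abs(grad[idx])
            mask[idx[abs_g >= tau2 * abs_g.max()]] = True
        beta += step * grad * mask  # only selected coordinates move
    return beta

# Synthetic illustration (assumed data): 3 clusters of 5 genes each,
# with only genes 0 and 1 of the first cluster affecting the outcome.
rng = np.random.default_rng(0)
n, p = 200, 15
X = rng.standard_normal((n, p))
clusters = np.repeat([0, 1, 2], 5)
true_prob = 1.0 / (1.0 + np.exp(-(2.0 * X[:, 0] + 2.0 * X[:, 1])))
y = (rng.random(n) < true_prob).astype(float)
beta = ctgdr_logistic(X, y, clusters)
print("selected genes:", np.flatnonzero(beta))
```

Coefficients that are never selected stay exactly at zero, which is how a thresholded gradient scheme of this kind yields a sparse model. Thresholds near 1 select few clusters and genes per step, and thresholds near 0 approach ordinary gradient ascent. In practice the thresholds and the number of iterations would need to be tuned, for instance by cross-validation.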