Learning regulatory programs by threshold SVD regression

Bibliographic Details
Published in: Proceedings of the National Academy of Sciences - PNAS, Vol. 111, no. 44, pp. 15675-15680
Main Authors: Ma, Xin; Xiao, Luo; Wong, Wing Hung
Format: Journal Article
Language: English
Published: United States, National Academy of Sciences, 04.11.2014
National Acad Sciences
ISSN: 0027-8424, 1091-6490
DOI: 10.1073/pnas.1417808111

Summary:
Significance: With the increase in high-throughput data in genomic studies, the study of regulatory relationships between multidimensional predictors and responses is becoming a common task. Although high-dimensional data hold promise for revealing rich and complex regulations, it remains challenging to infer the relations between tens of thousands of responses and thousands of predictors, as the desired signal must be searched among an overwhelming number of irrelevant responses. Here we show that by formulating the regulatory programs as hidden-intermediate nodes in a linear network, a sparsity-inducing modeling and inference approach is effective in extracting the regulatory relations among very high-dimensional responses and predictors, even when the sample size is much lower than the number of responses and predictors.
Abstract: We formulate a statistical model for the regulation of global gene expression by multiple regulatory programs and propose a thresholding singular value decomposition (T-SVD) regression method for learning such a model from data. Extensive simulations demonstrate that this method offers improved computational speed and higher sensitivity and specificity over competing approaches. The method is used to analyze microRNA (miRNA) and long noncoding RNA (lncRNA) data from The Cancer Genome Atlas (TCGA) consortium. The analysis yields previously unidentified insights into the combinatorial regulation of gene expression by noncoding RNAs, as well as findings that are supported by evidence from the literature.
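The summary does not spell out the T-SVD algorithm, so the following is only a rough illustration of the general idea it describes: a linear model Y = XB + E whose coefficient matrix B is low rank (each rank-one layer playing the role of one hidden regulatory program) with sparse singular vectors. The sketch assumes a much simpler procedure than the paper's: fit a ridge estimate, take its SVD, and hard-threshold the leading singular vectors. The function names, the ridge penalty, the thresholds `tau_u` and `tau_v`, and the simulated data are all hypothetical choices made for this example, not the authors' method.

```python
import numpy as np


def hard_threshold(a, tau):
    """Zero out every entry whose absolute value is below tau."""
    out = a.copy()
    out[np.abs(out) < tau] = 0.0
    return out


def thresholded_svd_regression(X, Y, rank, tau_u, tau_v, ridge=1e-2):
    """Illustrative sketch only (NOT the published T-SVD procedure).

    1. Ridge estimate of the p x q coefficient matrix B.
    2. SVD of that estimate, truncated to `rank` layers.
    3. Hard-threshold the left/right singular vectors so each layer
       involves few predictors and few responses.
    """
    p = X.shape[1]
    # The ridge penalty keeps the linear system solvable when n < p.
    B0 = np.linalg.solve(X.T @ X + ridge * np.eye(p), X.T @ Y)
    U, d, Vt = np.linalg.svd(B0, full_matrices=False)
    U_s = hard_threshold(U[:, :rank], tau_u)    # p x rank, predictor side
    V_s = hard_threshold(Vt[:rank].T, tau_v)    # q x rank, response side
    B_hat = U_s @ np.diag(d[:rank]) @ V_s.T     # sparse, low-rank estimate
    return U_s, d[:rank], V_s, B_hat


# Simulated data: one hidden program linking 3 predictors to 10 responses.
rng = np.random.default_rng(0)
n, p, q, rank = 40, 60, 300, 1
u_true = np.zeros(p)
u_true[:3] = 1.0
v_true = np.zeros(q)
v_true[:10] = 1.0
B_true = np.outer(u_true, v_true)
X = rng.standard_normal((n, p))
Y = X @ B_true + 0.1 * rng.standard_normal((n, q))

tau_u, tau_v = 0.2, 0.1
U_s, d, V_s, B_hat = thresholded_svd_regression(X, Y, rank, tau_u, tau_v)
print("nonzero predictor loadings:", np.flatnonzero(U_s[:, 0]))
print("number of nonzero response loadings:", np.count_nonzero(V_s[:, 0]))
```

In this sketch the columns of `U_s` indicate which predictors enter each hidden program and the columns of `V_s` indicate which responses it regulates. How well the supports are recovered depends heavily on the thresholds and the initial estimate, which is one reason a dedicated method is needed in practice.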
Bibliography: http://dx.doi.org/10.1073/pnas.1417808111
Author contributions: X.M., L.X., and W.H.W. designed research; X.M. and L.X. performed research; X.M. and L.X. contributed new reagents/analytic tools; X.M. analyzed data; and X.M., L.X., and W.H.W. wrote the paper.
Contributed by Wing Hung Wong, September 18, 2014 (sent for review August 3, 2014; reviewed by Hongyu Zhao)
X.M. and L.X. contributed equally to this work.
Reviewers included: H.Z., Yale University.