Molecular pathway identification using biological network-regularized logistic models

Bibliographic Details
Published in	BMC genomics Vol. 14; no. Suppl 8; p. S7
Main Authors	Zhang, Wen, Wan, Ying-wooi, Allen, Genevera I, Pang, Kaifang, Anderson, Matthew L, Liu, Zhandong
Format	Journal Article
Language	English
Published	London: BioMed Central, 09.12.2013; Springer Nature B.V
Subjects	Algorithms; Animal Genetics and Genomics; Biomarkers, Tumor - metabolism; Biomedical and Life Sciences; Breast cancer; Breast Neoplasms - metabolism; Computational Biology - methods; Computer Simulation; Female; Gene Regulatory Networks; Genomics; Humans; Life Sciences; Logistic Models; Microarrays; Microbial Genetics and Genomics; Models, Biological; Plant Genetics and Genomics; Proteomics; Reproducibility of Results; Studies; Breast Cancer Subtype; Triple Negative Breast Cancer; Normalize Read Count; Breast Cancer Specimen; Lasso
ISSN	1471-2164
DOI	10.1186/1471-2164-14-S8-S7

Summary:	Background: Selecting genes and pathways indicative of disease is a central problem in computational biology. This problem is especially challenging when parsing multi-dimensional genomic data. A number of tools, such as L1-norm based regularization and its extensions, elastic net and fused lasso, have been introduced to deal with this challenge. However, these approaches tend to ignore the vast amount of a priori biological network information curated in the literature. Results: We propose the use of graph Laplacian regularized logistic regression to integrate biological networks into disease classification and pathway association problems. Simulation studies demonstrate that the performance of the proposed algorithm is superior to elastic net and lasso analyses. Utility of this algorithm is also validated by its ability to reliably differentiate breast cancer subtypes using a large breast cancer dataset recently generated by the Cancer Genome Atlas (TCGA) consortium. Many of the protein-protein interaction modules identified by our approach are further supported by evidence published in the literature. Source code of the proposed algorithm is freely available at http://www.github.com/zhandong/Logit-Lapnet. Conclusion: Logistic regression with graph Laplacian regularization is an effective algorithm for identifying key pathways and modules associated with disease subtypes. With the rapid expansion of our knowledge of biological regulatory networks, this approach will become more accurate and increasingly useful for mining transcriptomic, epi-genomic, and other types of genome-wide association studies.
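
The summary names the technique but does not state the objective, so the following is only a minimal sketch of one common form of graph Laplacian regularized logistic regression: mean logistic loss plus an L1 penalty plus a network smoothness term beta' L beta, with L = D - A. It is not the authors' Logit-Lapnet code. The function names, the penalty weights `lam1` and `lam2`, the proximal gradient solver, and the toy network are all assumptions made for illustration.

```python
import numpy as np

def graph_laplacian(adj):
    """Unnormalized Laplacian L = D - A of a symmetric adjacency matrix."""
    return np.diag(adj.sum(axis=1)) - adj

def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1 (elementwise shrinkage toward zero)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(X, y, L, beta, b0, lam1, lam2):
    """Assumed objective: mean logistic loss + lam1*||beta||_1 + lam2*beta'L beta."""
    z = X @ beta + b0
    loss = np.mean(np.logaddexp(0.0, z) - y * z)
    return loss + lam1 * np.abs(beta).sum() + lam2 * beta @ L @ beta

def fit_laplacian_logistic(X, y, L, lam1=0.01, lam2=0.05, n_iter=2000):
    """Proximal gradient descent on the assumed objective (hypothetical solver)."""
    n, p = X.shape
    beta, b0 = np.zeros(p), 0.0
    # Step size 1/Lipschitz, from an upper bound on the smooth part's curvature.
    Xa = np.hstack([X, np.ones((n, 1))])
    lip = np.linalg.norm(Xa, 2) ** 2 / (4 * n) + 2 * lam2 * np.linalg.norm(L, 2)
    step = 1.0 / lip
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ beta + b0)))
        resid = prob - y
        grad = X.T @ resid / n + 2 * lam2 * (L @ beta)  # smooth terms only
        b0 -= step * resid.mean()                        # intercept is unpenalized
        beta = soft_threshold(beta - step * grad, step * lam1)
    return beta, b0

# Toy usage: 6 "genes"; genes 0-2 form a connected module that drives the label.
rng = np.random.default_rng(0)
adj = np.zeros((6, 6))
adj[0, 1] = adj[1, 0] = adj[1, 2] = adj[2, 1] = 1.0
L = graph_laplacian(adj)
X = rng.normal(size=(200, 6))
true_beta = np.array([2.0, 2.0, 2.0, 0.0, 0.0, 0.0])
y = (X @ true_beta + 0.5 * rng.normal(size=200) > 0).astype(float)
beta_hat, b0_hat = fit_laplacian_logistic(X, y, L)
print(np.round(beta_hat, 2))
```

The Laplacian term equals the sum of squared coefficient differences across network edges, so connected genes are pushed toward similar coefficients, while the L1 term drives coefficients of uninformative genes toward zero.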