Mixture Network Regularization of Generalized Linear Model With Application in Genomics Data
| Published in | bioRxiv |
|---|---|
| Main Authors | , , , | 
| Format | Paper | 
| Language | English | 
| Published | Cold Spring Harbor, Cold Spring Harbor Laboratory Press, 05.10.2022, Cold Spring Harbor Laboratory |
| Edition | 1.2 | 
| ISSN | 2692-8205 |
| DOI | 10.1101/678029 | 
| Summary: | High-dimensional genomics data in the biomedical sciences are an invaluable resource for constructing statistical prediction models. With increasing knowledge of gene networks and pathways, such information can be incorporated into statistical models to improve prediction accuracy and enhance model interpretability. However, in certain scenarios the network structure may be only partially known or inaccurate, which can compromise the performance of statistical models that incorporate it. To address this issue, we propose a weighted sparse network learning method that optimally combines a sparse data-driven network with a prior known or partially known network. We show that the proposed model attains the oracle property and achieves a parsimonious structure in high-dimensional settings for different types of outcomes, including continuous, binary, and survival data. Simulation studies show that the proposed model is robust and outperforms existing methods. A case study on melanoma gene expression further demonstrates that the proposed model achieves good operating characteristics in identifying informative genes and predicting survival risk. An R package, glmaag, implementing our method is available on the Comprehensive R Archive Network (CRAN). Competing Interest Statement: The authors have declared no competing interest. Footnotes: The case study on Platinum Therapy in Ovarian Cancer has been removed. The case study on Skin Cutaneous Melanoma Prediction has been updated to include application of the proposed model to predicting Breslow's depth. The Supplementary Materials are now appended to the end of the manuscript. |
|---|---|
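
The summary describes combining a prior network with a data-driven network inside a sparse regression penalty. The sketch below illustrates that general idea for a continuous outcome only, and it is not the glmaag implementation or the paper's exact estimator. Everything here is an assumption made for illustration: the objective (squared-error loss plus an L1 term plus a quadratic penalty on a weighted mix of two graph Laplacians), the thresholded-correlation rule for the data-driven network, the proximal-gradient solver, and all function and parameter names (`fit_mixture_network`, `w`, `lam1`, `lam2`, `threshold`).

```python
import numpy as np


def laplacian(adj):
    """Unnormalized graph Laplacian D - A of a symmetric adjacency matrix."""
    adj = np.asarray(adj, dtype=float)
    return np.diag(adj.sum(axis=1)) - adj


def data_driven_adjacency(X, threshold=0.5):
    """Hypothetical data-driven network: keep |correlation| >= threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    adj = np.where(corr >= threshold, corr, 0.0)
    np.fill_diagonal(adj, 0.0)
    return adj


def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (elementwise shrinkage toward zero)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def objective(X, y, beta, L, lam1, lam2):
    """Assumed objective: squared error + L1 + Laplacian quadratic penalty."""
    n = X.shape[0]
    resid = y - X @ beta
    return (resid @ resid) / (2 * n) + lam1 * np.abs(beta).sum() \
        + 0.5 * lam2 * beta @ L @ beta


def fit_mixture_network(X, y, adj_prior, w=0.5, lam1=0.05, lam2=0.1,
                        threshold=0.5, n_iter=2000):
    """Proximal gradient (ISTA) on the assumed objective.

    w in [0, 1] weights the prior network against the data-driven one.
    Returns the coefficient vector and the mixed Laplacian.
    """
    n, p = X.shape
    L = w * laplacian(adj_prior) \
        + (1.0 - w) * laplacian(data_driven_adjacency(X, threshold))
    H = X.T @ X / n + lam2 * L               # Hessian of the smooth part
    step = 1.0 / np.linalg.eigvalsh(H)[-1]   # 1 / largest eigenvalue
    Xty = X.T @ y / n
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = H @ beta - Xty
        beta = soft_threshold(beta - step * grad, step * lam1)
    return beta, L


# Usage on simulated data: three linked informative features out of ten.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
beta_true = np.zeros(10)
beta_true[:3] = 1.0
y = X @ beta_true + 0.1 * rng.standard_normal(200)
adj_prior = np.zeros((10, 10))
adj_prior[:3, :3] = 1.0
np.fill_diagonal(adj_prior, 0.0)
beta_hat, L_mix = fit_mixture_network(X, y, adj_prior, w=0.7)
```

In this sketch the weight `w` would normally be chosen by cross-validation along with `lam1` and `lam2`; setting `w = 1` trusts the prior network entirely and `w = 0` ignores it. Binary and survival outcomes would need a different loss and are not covered here.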