A hybrid approach for predicting transcription factors
Transcription factors (TFs) are essential DNA-binding proteins that regulate the rate of transcription of several genes and control the expression of genes inside a cell. Predicting TFs with high precision is important for understanding a number of biological processes such as cell-differentia...
| Published in | bioRxiv |
|---|---|
| Main Authors | , , , , , |
| Format | Paper |
| Language | English |
| Published | Cold Spring Harbor Laboratory, 14.07.2022 |
| Edition | 1.1 |
| Subjects | |
| ISSN | 2692-8205 |
| DOI | 10.1101/2022.07.13.499865 |
| Summary: | Transcription factors (TFs) are essential DNA-binding proteins that regulate the rate of transcription of several genes and control the expression of genes inside a cell. Predicting TFs with high precision is important for understanding a number of biological processes, such as cell differentiation, intracellular signaling, and cell-cycle control. In this study, we developed a hybrid method that combines alignment-based and alignment-free methods to predict transcription factors with higher accuracy. All models were trained, tested, and evaluated on a large dataset containing 19406 TF and 523560 non-TF protein sequences. To avoid bias in evaluation, the dataset was divided into a training set and a validation/independent set: 80% of the data was used for training and the remaining 20% for external validation. The alignment-free models were developed with machine learning techniques using compositional features of a protein. Our best alignment-free model obtained an AUC of 0.97 on the independent dataset. For the alignment-based method, we used BLAST at different cut-offs to predict transcription factors. Although the alignment-based method shows excellent performance, it cannot cover all transcription factors because some sequences return no hits. To combine the power of both, we developed a hybrid method that combines the alignment-free and alignment-based methods; it achieved a maximum AUC of 0.99 on the independent dataset. The method proposed in this study performs better than existing methods. We incorporated the best models in the webserver/standalone package “TransFacPred” (https://webs.iiitd.edu.in/raghava/transfacpred). Transcription factors (TFs) are vital DNA-binding proteins. A hybrid method for the prediction of TFs using sequence information. Computer-aided models were developed using machine-learning algorithms to predict TFs. Alignment-based and alignment-free approaches were used for the prediction. A user-friendly webserver and a Python- and Perl-based standalone package are available. |
|---|---|
| Bibliography: | Competing Interest Statement: The authors have declared no competing interest. |
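
The summary describes combining an alignment-free score (a machine learning model on compositional features) with alignment-based BLAST evidence, but it does not say how the two are merged. The following is a minimal Python sketch of one plausible scheme, not the TransFacPred implementation. The function names, the additive rule, and the `bonus` value of 0.5 are all assumptions made for illustration. Amino acid composition stands in here as one example of a compositional feature.

```python
from typing import List, Optional

# The 20 standard amino acids, in a fixed order for the feature vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> List[float]:
    """Return the fraction of each standard residue in a protein sequence.

    Non-standard characters (e.g. X, B, Z) are ignored.
    """
    residues = [r for r in sequence.upper() if r in AMINO_ACIDS]
    total = len(residues)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(aa) / total for aa in AMINO_ACIDS]


def hybrid_score(ml_score: float, blast_hit: Optional[str],
                 bonus: float = 0.5) -> float:
    """Combine an alignment-free score with BLAST evidence.

    ml_score  -- probability-like score from the alignment-free model
    blast_hit -- "TF" if the top hit is a known TF, "non-TF" if it is a
                 known non-TF, or None when BLAST returns no hit
    bonus     -- assumed adjustment size; not taken from the source

    Assumed rule: shift the score up for a TF hit, down for a non-TF hit,
    and fall back to the alignment-free score alone when there is no hit.
    """
    if blast_hit == "TF":
        return ml_score + bonus
    if blast_hit == "non-TF":
        return ml_score - bonus
    return ml_score
```

With a rule like this, sequences that BLAST cannot cover still receive a prediction from the alignment-free model, which is the gap the summary says the hybrid method is meant to close.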