A hybrid approach for predicting transcription factors

Bibliographic Details
Published in: bioRxiv
Main Authors: Patiyal, Sumeet; Tiwari, Palak; Ghai, Mohit; Dhapola, Aman; Dhall, Anjali; Raghava, Gajendra P. S.
Format: Paper
Language: English
Published: Cold Spring Harbor Laboratory, 14.07.2022
Edition: 1.1
ISSN: 2692-8205
DOI: 10.1101/2022.07.13.499865

Summary: Transcription factors (TFs) are essential DNA-binding proteins that regulate the rate of transcription of several genes and control the expression of genes inside a cell. Predicting TFs with high precision is important for understanding a number of biological processes, such as cell differentiation, intracellular signaling, and cell-cycle control. In this study, we developed a hybrid method that combines alignment-based and alignment-free methods to predict transcription factors with higher accuracy. All models were trained, tested, and evaluated on a large dataset that contains 19406 TF and 523560 non-TF protein sequences. To avoid bias in evaluation, the dataset was divided into a training set and a validation/independent set: 80% of the data was used for training and the remaining 20% for external validation. For the alignment-free methods, models were developed with machine learning techniques using compositional features of a protein. Our best alignment-free model obtained an AUC of 0.97 on the independent dataset. For the alignment-based method, we used BLAST at different cut-offs to predict transcription factors. Although the alignment-based method shows excellent performance, it cannot cover all transcription factors because some query sequences return no hits. To combine the strengths of both, we developed a hybrid method that combines the alignment-free and alignment-based methods; it achieved a maximum AUC of 0.99 on the independent dataset. The method proposed in this study performs better than existing methods. We incorporated the best models in the webserver/standalone package “TransFacPred” (https://webs.iiitd.edu.in/raghava/transfacpred).
Transcription factors (TFs) are vital DNA-binding proteins. A hybrid method is proposed for the prediction of TFs using sequence information. Computer-aided models were developed using machine-learning algorithms to predict TFs. Alignment-based and alignment-free approaches were used for the prediction. A user-friendly webserver and Python- and Perl-based standalone packages are available.
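The summary does not say which compositional features were used or how the BLAST and machine-learning outputs are merged, so the following Python sketch is only illustrative. It assumes amino acid composition as the alignment-free feature, and a hypothetical combination rule in which a BLAST hit shifts the machine-learning score by a fixed weight while a query with no hit falls back to the machine-learning score alone. The function names, the `blast_weight` parameter, and its value of 0.5 are assumptions, not details from the paper or from TransFacPred.

```python
from collections import Counter
from typing import List, Optional

# The 20 standard amino acids, in a fixed order so feature vectors are comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> List[float]:
    """Return the fraction of each standard residue in a protein sequence.

    Non-standard characters (e.g. 'X') are ignored when computing fractions.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def hybrid_score(
    ml_score: float,
    blast_hit_label: Optional[str],
    blast_weight: float = 0.5,  # hypothetical weight, chosen for illustration
) -> float:
    """Combine an alignment-free score with alignment-based evidence.

    blast_hit_label is "TF" or "non-TF" for the top BLAST hit, or None when
    BLAST returns no hit. This rule is an assumed scheme, not the paper's.
    """
    if blast_hit_label is None:
        # No hit: only the alignment-free model can make a call.
        return ml_score
    if blast_hit_label == "TF":
        return ml_score + blast_weight
    if blast_hit_label == "non-TF":
        return ml_score - blast_weight
    raise ValueError(f"unexpected BLAST label: {blast_hit_label!r}")
```

In this sketch, a sequence would be labelled a TF when `hybrid_score(...)` exceeds a threshold tuned on the training data. The no-hit branch is what lets a hybrid scheme cover sequences that BLAST alone cannot classify.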
Bibliography: Competing Interest Statement: The authors have declared no competing interest.