Prediction of RNA–protein interactions using a nucleotide language model
| Published in | Bioinformatics advances Vol. 2; no. 1; p. vbac023 |
|---|---|
| Main Authors | , |
| Format | Journal Article |
| Language | English |
| Published | England: Oxford University Press, 2022 |
| Subjects | |
| ISSN | 2635-0041 |
| DOI | 10.1093/bioadv/vbac023 |
| Summary: | Motivation
The accumulation of sequencing data has enabled researchers to predict interactions between RNA sequences and RNA-binding proteins (RBPs) using novel machine learning techniques. However, existing models are often difficult to interpret and require information in addition to the sequences. Bidirectional Encoder Representations from Transformers (BERT) is a language-based deep learning model that is highly interpretable. A model based on the BERT architecture can therefore potentially overcome these limitations.
Results
Here, we propose BERT-RBP, a model that predicts RNA–RBP interactions by adapting the BERT architecture pretrained on a human reference genome. Our model outperformed state-of-the-art prediction models on the eCLIP-seq data of 154 RBPs. Detailed analysis further revealed that BERT-RBP could recognize both the transcript region type and RNA secondary structure based on sequence information alone. Overall, the results provide insights into the fine-tuning mechanism of BERT in biological contexts and provide evidence of the applicability of the model to other RNA-related problems.
Availability and implementation
Python source code is freely available at https://github.com/kkyamada/bert-rbp. The datasets underlying this article were derived from sources in the public domain: RBPsuite (http://www.csbio.sjtu.edu.cn/bioinf/RBPsuite/) and Ensembl Biomart (http://asia.ensembl.org/biomart/martview/).
Supplementary information
Supplementary data are available at Bioinformatics Advances online. |
|---|---|