BERT-Promoter: An improved sequence-based predictor of DNA promoter using BERT pre-trained model and SHAP feature selection
A promoter is a DNA sequence that initiates transcription and regulates when and where genes are expressed in the organism. Because of their importance in molecular biology, identifying DNA promoters is a challenging task that can provide useful information about their functions and...
Published in | Computational biology and chemistry Vol. 99; p. 107732 |
---|---|
Main Authors | Le, Nguyen Quoc Khanh; Ho, Quang-Thai; Nguyen, Van-Nui; Chang, Jung-Su |
Format | Journal Article |
Language | English |
Published | Elsevier Ltd, 01.08.2022 |
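Subjects | BERT multilingual cases; Contextualized word embedding; Explainable artificial intelligence; EXtreme Gradient Boosting; Promoter region; SHAP |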
ISSN | 1476-9271 1476-928X |
DOI | 10.1016/j.compbiolchem.2022.107732 |
Abstract | A promoter is a DNA sequence that initiates transcription and regulates when and where genes are expressed in the organism. Because of their importance in molecular biology, identifying DNA promoters is a challenging task that can provide useful information about their functions and related diseases. Over the past decade, several computational models have been developed to predict promoters from high-throughput sequencing data. Although some useful predictors have been proposed, shortfalls remain in those models, and there is an urgent need to improve predictive performance to meet practical requirements. In this study, we proposed a novel architecture that combines transformer-based natural language processing (NLP) and explainable machine learning to address this problem. More specifically, a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model was employed to encode DNA sequences, and SHapley Additive exPlanations (SHAP) analysis served as a feature selection step to identify the top-ranked BERT encodings. In the last stage, different machine learning classifiers were trained on the top features to produce the prediction outcomes. This study predicted not only DNA promoters but also their activities (strong or weak promoters). Overall, experiments showed accuracies of 85.5 % and 76.9 % for these two levels, respectively. BERT-Promoter outperformed previously published predictors on the same dataset on most evaluation metrics. We named our predictor BERT-Promoter; it is freely available at https://github.com/khanhlee/bert-promoter. |
---|---|
ArticleNumber | 107732 |
Author | Ho, Quang-Thai Chang, Jung-Su Nguyen, Van-Nui Le, Nguyen Quoc Khanh |
Author_xml | – sequence: 1 givenname: Nguyen Quoc Khanh surname: Le fullname: Le, Nguyen Quoc Khanh email: khanhlee@tmu.edu.tw organization: Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei City 106, Taiwan – sequence: 2 givenname: Quang-Thai surname: Ho fullname: Ho, Quang-Thai organization: College of Information & Communication Technology, Can Tho University, Viet Nam – sequence: 3 givenname: Van-Nui surname: Nguyen fullname: Nguyen, Van-Nui organization: University of Information and Communication Technology, Thai Nguyen University, Thai Nguyen, Viet Nam – sequence: 4 givenname: Jung-Su surname: Chang fullname: Chang, Jung-Su organization: School of Nutrition and Health Sciences, College of Nutrition, Taipei Medical University, Taipei 110, Taiwan |
CitedBy_id | crossref_primary_10_1038_s41598_024_73342_7 crossref_primary_10_1002_2211_5463_70003 crossref_primary_10_3390_electronics11213577 crossref_primary_10_1002_prot_26536 crossref_primary_10_1093_femsre_fuad030 crossref_primary_10_1155_2022_3265212 crossref_primary_10_1186_s12880_023_01129_9 crossref_primary_10_2174_0115748936264316230926073231 crossref_primary_10_1016_j_bspc_2023_104593 crossref_primary_10_1109_JBHI_2023_3286917 crossref_primary_10_1016_j_ymeth_2024_05_007 crossref_primary_10_3934_mbe_2023586 crossref_primary_10_1109_JBHI_2023_3309840 crossref_primary_10_2174_0115748936285544231221113226 crossref_primary_10_1007_s00521_024_10663_8 crossref_primary_10_1111_bcp_16032 crossref_primary_10_1109_ACCESS_2023_3285197 crossref_primary_10_3389_fgene_2023_1233657 crossref_primary_10_1109_JBHI_2023_3299042 crossref_primary_10_3390_a15110410 crossref_primary_10_1016_j_neucom_2024_128829 crossref_primary_10_1186_s12915_024_01923_z crossref_primary_10_1109_ACCESS_2023_3272056 crossref_primary_10_3390_biomedicines11020581 crossref_primary_10_1186_s13059_023_02955_4 crossref_primary_10_1109_ACCESS_2023_3324061 crossref_primary_10_1007_s10462_023_10692_0 crossref_primary_10_1007_s10722_024_01879_7 crossref_primary_10_3389_frai_2022_1040295 crossref_primary_10_3934_era_2023335 crossref_primary_10_1093_jamia_ocaf029 crossref_primary_10_1109_ACCESS_2023_3297207 crossref_primary_10_1111_1751_7915_70121 crossref_primary_10_3389_fgene_2023_1232038 crossref_primary_10_1109_ACCESS_2023_3326337 crossref_primary_10_2174_1574893618666230316151648 crossref_primary_10_1021_acs_jcim_4c01415 crossref_primary_10_1093_bib_bbad058 crossref_primary_10_1016_j_ymeth_2024_08_005 crossref_primary_10_1109_JBHI_2023_3292299 crossref_primary_10_1186_s12864_023_09796_2 crossref_primary_10_1371_journal_pone_0287031 crossref_primary_10_1155_2023_8862598 crossref_primary_10_1108_K_03_2024_0554 crossref_primary_10_1016_j_compbiomed_2022_106375 crossref_primary_10_1016_j_csbj_2025_03_024 
crossref_primary_10_1007_s44174_024_00197_x crossref_primary_10_1016_j_knosys_2023_111316 crossref_primary_10_1109_ACCESS_2022_3233768 crossref_primary_10_1016_j_ab_2024_115495 crossref_primary_10_1007_s00521_023_08706_7 crossref_primary_10_1016_j_compbiolchem_2024_108040 crossref_primary_10_1186_s12938_024_01219_x crossref_primary_10_1109_ACCESS_2023_3280123 crossref_primary_10_3390_ijms242015447 crossref_primary_10_1016_j_heliyon_2024_e28443 crossref_primary_10_1186_s12859_023_05543_2 crossref_primary_10_1038_s41598_024_84105_9 crossref_primary_10_1186_s12859_024_05849_9 crossref_primary_10_3390_biomedicines11051323 crossref_primary_10_2166_hydro_2023_046 |
Cites_doi | 10.3389/fbioe.2019.00305 10.1007/978-1-61779-376-9_6 10.1093/bioinformatics/btz682 10.1016/j.chemolab.2020.104034 10.1371/journal.pone.0171410 10.1093/bib/bbz041 10.1016/j.gene.2021.145643 10.1371/journal.pone.0137950 10.1016/j.ygeno.2018.12.001 10.1186/gb-2006-7-s1-s3 10.1073/pnas.91.4.1460 10.1164/ajrccm.158.6.9804011 10.1016/j.ygeno.2020.01.017 10.1093/bioinformatics/btaa1087 10.1016/j.ygeno.2019.08.009 10.1038/ng780 10.1093/bioinformatics/btg265 10.1093/bioinformatics/btab133 10.1093/nar/gkv1156 10.1002/bies.20734 10.3389/fgene.2019.00286 10.1093/nar/gkg525 10.1093/bib/bbab005 10.1101/2020.09.17.301879 10.1186/1471-2105-6-1 |
ContentType | Journal Article |
Copyright | 2022 Elsevier Ltd Copyright © 2022 Elsevier Ltd. All rights reserved. |
Copyright_xml | – notice: 2022 Elsevier Ltd – notice: Copyright © 2022 Elsevier Ltd. All rights reserved. |
DBID | AAYXX CITATION 7X8 |
DOI | 10.1016/j.compbiolchem.2022.107732 |
DatabaseName | CrossRef MEDLINE - Academic |
DatabaseTitle | CrossRef MEDLINE - Academic |
DatabaseTitleList | MEDLINE - Academic |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Chemistry Biology |
EISSN | 1476-928X |
ExternalDocumentID | 10_1016_j_compbiolchem_2022_107732 S1476927122001128 |
ID | FETCH-LOGICAL-c357t-d8e2f007ccdc7ea6e4766c520ad5fd5590f55122332851af2243c3e13f78e6aa3 |
IEDL.DBID | AIKHN |
ISSN | 1476-9271 1476-928X |
IngestDate | Fri Sep 05 14:28:07 EDT 2025 Tue Jul 01 02:02:16 EDT 2025 Thu Apr 24 22:57:16 EDT 2025 Fri Feb 23 02:40:44 EST 2024 |
IsPeerReviewed | true |
IsScholarly | true |
Keywords | Explainable artificial intelligence SHAP Promoter region BERT multilingual cases EXtreme Gradient Boosting Contextualized word embedding |
Language | English |
LinkModel | DirectLink |
MergedId | FETCHMERGED-LOGICAL-c357t-d8e2f007ccdc7ea6e4766c520ad5fd5590f55122332851af2243c3e13f78e6aa3 |
Notes | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 23 |
PQID | 2693779432 |
PQPubID | 23479 |
ParticipantIDs | proquest_miscellaneous_2693779432 crossref_citationtrail_10_1016_j_compbiolchem_2022_107732 crossref_primary_10_1016_j_compbiolchem_2022_107732 elsevier_sciencedirect_doi_10_1016_j_compbiolchem_2022_107732 |
ProviderPackageCode | CITATION AAYXX |
PublicationCentury | 2000 |
PublicationDate | August 2022 2022-08-00 20220801 |
PublicationDateYYYYMMDD | 2022-08-01 |
PublicationDate_xml | – month: 08 year: 2022 text: August 2022 |
PublicationDecade | 2020 |
PublicationTitle | Computational biology and chemistry |
PublicationYear | 2022 |
Publisher | Elsevier Ltd |
Publisher_xml | – name: Elsevier Ltd |
SSID | ssj0023005 |
Score | 2.5679345 |
Snippet | A promoter is a DNA sequence that initiates transcription and regulates when and where genes are expressed in the organism. Because... |
SourceID | proquest crossref elsevier |
SourceType | Aggregation Database Enrichment Source Index Database Publisher |
StartPage | 107732 |
SubjectTerms | BERT multilingual cases Contextualized word embedding Explainable artificial intelligence EXtreme Gradient Boosting Promoter region SHAP |
Title | BERT-Promoter: An improved sequence-based predictor of DNA promoter using BERT pre-trained model and SHAP feature selection |
URI | https://dx.doi.org/10.1016/j.compbiolchem.2022.107732 https://www.proquest.com/docview/2693779432 |
Volume | 99 |
hasFullText | 1 |
inHoldings | 1 |
linkProvider | Elsevier |
openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=BERT-Promoter%3A+An+improved+sequence-based+predictor+of+DNA+promoter+using+BERT+pre-trained+model+and+SHAP+feature+selection&rft.jtitle=Computational+biology+and+chemistry&rft.au=Le%2C+Nguyen+Quoc+Khanh&rft.au=Ho%2C+Quang-Thai&rft.au=Nguyen%2C+Van-Nui&rft.au=Chang%2C+Jung-Su&rft.date=2022-08-01&rft.issn=1476-9271&rft.volume=99&rft.spage=107732&rft_id=info:doi/10.1016%2Fj.compbiolchem.2022.107732&rft.externalDBID=n%2Fa&rft.externalDocID=10_1016_j_compbiolchem_2022_107732 |
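
The abstract in this record describes three stages: encode each DNA sequence, rank the encodings with SHAP and keep the top features, then train a classifier on those features. The sketch below illustrates only that overall shape and is not the authors' implementation (their code is at the GitHub URL given in the abstract). Everything in it is an assumption made for illustration: dinucleotide frequencies stand in for BERT encodings, a toy difference-of-class-means linear scorer stands in for the paper's classifiers, and the toy sequences and labels are invented. The SHAP step uses the closed form that holds for a linear model with independent features, where the SHAP value of feature i on sample x is w_i (x_i - mean_i).

```python
from itertools import product
from statistics import mean

# All 16 dinucleotides, used only by the placeholder encoder below.
KMERS = ["".join(p) for p in product("ACGT", repeat=2)]

# Invented toy data (not from the paper): 1 = promoter-like, 0 = non-promoter.
TOY_SEQS = ["TATAATGCTATAAT", "TTGACATATAATAG", "GCGCGGCCGCGGCC", "CCGGCGCGCCGGCG"]
TOY_LABELS = [1, 1, 0, 0]


def encode(seq):
    """Stage 1 stand-in: dinucleotide frequencies instead of BERT encodings."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    return [pairs.count(k) / max(len(pairs), 1) for k in KMERS]


def fit_linear(X, y):
    """Toy linear scorer: weights are the difference of the class means."""
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    mu_pos = [mean(col) for col in zip(*pos)]
    mu_neg = [mean(col) for col in zip(*neg)]
    w = [p - n for p, n in zip(mu_pos, mu_neg)]
    # Bias places the decision boundary midway between the two class means.
    b = -sum(wi * (p + n) / 2 for wi, p, n in zip(w, mu_pos, mu_neg))
    return w, b


def mean_abs_shap(X, w):
    """Stage 2: mean |SHAP| per feature, using the linear-model closed form
    phi_i(x) = w[i] * (x[i] - mean_i), which assumes independent features."""
    means = [mean(col) for col in zip(*X)]
    return [mean(abs(wi * (x[i] - means[i])) for x in X) for i, wi in enumerate(w)]


def top_k(scores, k):
    """Indices of the k features with the largest mean |SHAP|."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def train(seqs, labels, k=4):
    """Encode, rank features by mean |SHAP|, then refit on the top-k features."""
    X = [encode(s) for s in seqs]
    w, _ = fit_linear(X, labels)
    keep = top_k(mean_abs_shap(X, w), k)
    w_sel, b_sel = fit_linear([[x[i] for i in keep] for x in X], labels)
    return keep, w_sel, b_sel


def predict(seq, model):
    """Stage 3: classify a sequence using only the selected features."""
    keep, w, b = model
    x = encode(seq)
    return int(sum(wi * x[i] for wi, i in zip(w, keep)) + b > 0)


if __name__ == "__main__":
    model = train(TOY_SEQS, TOY_LABELS)
    print("selected features:", [KMERS[i] for i in model[0]])
    print("predictions:", [predict(s, model) for s in TOY_SEQS])
```

A closer reproduction would presumably replace `encode` with embeddings from a pre-trained BERT model and the linear scorer with a gradient-boosted classifier (the record's keywords mention EXtreme Gradient Boosting), with SHAP values computed by a dedicated library rather than the closed form used here.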