Large-scale protein-protein post-translational modification extraction with distant supervision and confidence calibrated BioBERT
| Published in | BMC bioinformatics Vol. 23; no. 1; pp. 4 - 23 |
|---|---|
| Main Authors | Elangovan, Aparna; Li, Yuan; Pires, Douglas E. V.; Davis, Melissa J.; Verspoor, Karin |
| Format | Journal Article | 
| Language | English | 
| Published | London: BioMed Central, 04.01.2022 (BioMed Central Ltd; Springer Nature B.V; BMC) |
| ISSN | 1471-2105 |
| DOI | 10.1186/s12859-021-04504-x | 
| Abstract | Motivation: Protein-protein interactions (PPIs) are critical to normal cellular function and are related to many disease pathways. A range of protein functions are mediated and regulated by protein interactions through post-translational modifications (PTMs). However, only 4% of PPIs are annotated with PTMs in biological knowledge databases such as IntAct, and this annotation is mainly performed through manual curation, which is neither time- nor cost-effective. Here we aim to facilitate annotation by using deep learning, trained on distantly supervised data, to extract PPIs along with their pairwise PTMs from the literature to aid human curation. Method: We use the IntAct PPI database to create a distantly supervised dataset annotated with interacting protein pairs, their corresponding PTM type, and associated abstracts from the PubMed database. We train an ensemble of BioBERT models, dubbed PPI-BioBERT-x10, to improve confidence calibration. We extend the ensemble average confidence approach with confidence variation to counteract the effects of class imbalance when extracting high-confidence predictions. Results and conclusion: The PPI-BioBERT-x10 model evaluated on the test set resulted in a modest F1-micro of 41.3 (P = 58.1, R = 32.1). However, by combining high confidence and low variation to identify high-quality predictions, tuning the predictions for precision, we retained 19% of the test predictions with 100% precision. We evaluated PPI-BioBERT-x10 on 18 million PubMed abstracts and extracted 1.6 million PTM-PPI predictions (546507 unique PTM-PPI triplets), and filtered ≈ 5700 (4584 unique) high-confidence predictions. Of the 5700, human evaluation on a small randomly sampled subset shows that the precision drops to 33.7% despite confidence calibration, which highlights the challenges of generalisability beyond the test set even with confidence calibration. We circumvent the problem by only including predictions associated with multiple papers, improving the precision to 58.8%. In this work, we highlight the benefits and challenges of deep learning-based text mining in practice, and the need for increased emphasis on confidence calibration to facilitate human curation efforts. |
    
|---|---|
| ArticleNumber | 4 | 
    
| Audience | Academic | 
    
| Author | Li, Yuan Pires, Douglas E. V. Elangovan, Aparna Verspoor, Karin Davis, Melissa J.  | 
    
| Author_xml | – sequence: 1 givenname: Aparna surname: Elangovan fullname: Elangovan, Aparna organization: School of Computing and Information Systems, The University of Melbourne – sequence: 2 givenname: Yuan surname: Li fullname: Li, Yuan organization: School of Computing and Information Systems, The University of Melbourne – sequence: 3 givenname: Douglas E. V. surname: Pires fullname: Pires, Douglas E. V. organization: School of Computing and Information Systems, The University of Melbourne – sequence: 4 givenname: Melissa J. surname: Davis fullname: Davis, Melissa J. organization: The Walter and Eliza Hall Institute of Medical Research, Department of Clinical Pathology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne – sequence: 5 givenname: Karin surname: Verspoor fullname: Verspoor, Karin email: karin.verspoor@rmit.edu.au organization: School of Computing and Information Systems, The University of Melbourne, School of Computing Technologies, RMIT University  | 
    
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/34983371$$D View this record in MEDLINE/PubMed | 
    
| CitedBy_id | crossref_primary_10_3390_jpm14121157 crossref_primary_10_1186_s13643_024_02470_y crossref_primary_10_1038_s41568_024_00784_6 crossref_primary_10_1007_s44163_024_00197_2 crossref_primary_10_1016_j_mcpro_2023_100682  | 
    
| Cites_doi | 10.1093/database/bav009 10.1186/1471-2105-8-50 10.1093/bioinformatics/btz682 10.18653/v1/W17-2323 10.1093/nar/gkj141 10.1093/nar/gku1267 10.18653/v1/2020.blackboxnlp-1.21 10.1109/TCBB.2014.2372765 10.1016/S2589-7500(20)30186-2 10.1109/ACCESS.2019.2927253 10.1093/nar/gku1055 10.18653/v1/N19-1423 10.1093/nar/gks1094 10.1093/database/bav020 10.1093/database/bax040 10.1093/nar/gkx1104 10.1016/j.artmed.2004.07.016 10.1016/j.ijhcs.2019.05.008 10.1016/j.knosys.2018.11.020 10.1093/nar/gkt1115 10.1155/2015/918710 10.1093/nar/gky1131 10.1007/s10994-021-05946-3 10.1093/database/bay122 10.3115/1690219.1690287 10.1038/nmeth.1931 10.18653/v1/2021.eacl-main.113 10.18653/v1/W17-2304 10.1093/nar/gky1049  | 
    
| ContentType | Journal Article | 
    
| Copyright | The Author(s) 2021 2021. The Author(s). COPYRIGHT 2022 BioMed Central Ltd. 2022. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.  | 
    
| Copyright_xml | – notice: The Author(s) 2021 – notice: 2021. The Author(s). – notice: COPYRIGHT 2022 BioMed Central Ltd. – notice: 2022. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.  | 
    
| DBID | C6C AAYXX CITATION CGR CUY CVF ECM EIF NPM ISR 3V. 7QO 7SC 7X7 7XB 88E 8AL 8AO 8FD 8FE 8FG 8FH 8FI 8FJ 8FK ABUWG AEUYN AFKRA ARAPS AZQEC BBNVY BENPR BGLVJ BHPHI CCPQU DWQXO FR3 FYUFA GHDGH GNUQQ HCIFZ JQ2 K7- K9. L7M LK8 L~C L~D M0N M0S M1P M7P P5Z P62 P64 PHGZM PHGZT PIMPY PJZUB PKEHL PPXIY PQEST PQGLB PQQKQ PQUKI Q9U 7X8 5PM ADTOC UNPAY DOA  | 
    
| DOI | 10.1186/s12859-021-04504-x | 
    
| DatabaseName | Springer Nature OA Free Journals CrossRef Medline MEDLINE MEDLINE (Ovid) MEDLINE MEDLINE PubMed Gale In Context: Science ProQuest Central (Corporate) Biotechnology Research Abstracts Computer and Information Systems Abstracts Health & Medical Collection ProQuest Central (purchase pre-March 2016) Medical Database (Alumni Edition) Computing Database (Alumni Edition) ProQuest Pharma Collection Technology Research Database ProQuest SciTech Collection ProQuest Technology Collection ProQuest Natural Science Journals Hospital Premium Collection Hospital Premium Collection (Alumni Edition) ProQuest Central (Alumni) (purchase pre-March 2016) ProQuest Central (Alumni) ProQuest One Sustainability ProQuest Central UK/Ireland Advanced Technologies & Computer Science Collection ProQuest Central Essentials Biological Science Collection ProQuest Central ProQuest Technology Collection Natural Science Collection ProQuest One Community College ProQuest Central Engineering Research Database Health Research Premium Collection Health Research Premium Collection (Alumni) ProQuest Central Student SciTech Premium Collection (Proquest) ProQuest Computer Science Collection Computer Science Database ProQuest Health & Medical Complete (Alumni) Advanced Technologies Database with Aerospace Biological Sciences Computer and Information Systems Abstracts  Academic Computer and Information Systems Abstracts Professional Computing Database Health & Medical Collection (Alumni Edition) Medical Database ProQuest Biological Science Database Advanced Technologies & Aerospace Collection ProQuest Advanced Technologies & Aerospace Collection Biotechnology and BioEngineering Abstracts ProQuest Central Premium ProQuest One Academic (New) Publicly Available Content Database (Proquest) ProQuest Health & Medical Research Collection ProQuest One Academic Middle East (New) ProQuest One Health & Nursing ProQuest One Academic Eastern Edition (DO NOT USE) ProQuest One Applied & Life Sciences ProQuest 
One Academic ProQuest One Academic UKI Edition ProQuest Central Basic MEDLINE - Academic PubMed Central (Full Participant titles) Unpaywall for CDI: Periodical Content Unpaywall DOAJ Directory of Open Access Journals  | 
    
| DatabaseTitle | CrossRef MEDLINE Medline Complete MEDLINE with Full Text PubMed MEDLINE (Ovid) Publicly Available Content Database Computer Science Database ProQuest Central Student ProQuest Advanced Technologies & Aerospace Collection ProQuest Central Essentials ProQuest Computer Science Collection Computer and Information Systems Abstracts SciTech Premium Collection ProQuest One Applied & Life Sciences ProQuest One Sustainability Health Research Premium Collection Natural Science Collection Health & Medical Research Collection Biological Science Collection ProQuest Central (New) ProQuest Medical Library (Alumni) Advanced Technologies & Aerospace Collection ProQuest Biological Science Collection ProQuest One Academic Eastern Edition ProQuest Hospital Collection ProQuest Technology Collection Health Research Premium Collection (Alumni) Biological Science Database ProQuest Hospital Collection (Alumni) Biotechnology and BioEngineering Abstracts ProQuest Health & Medical Complete ProQuest One Academic UKI Edition Engineering Research Database ProQuest One Academic ProQuest One Academic (New) Technology Collection Technology Research Database Computer and Information Systems Abstracts – Academic ProQuest One Academic Middle East (New) ProQuest Health & Medical Complete (Alumni) ProQuest Central (Alumni Edition) ProQuest One Community College ProQuest One Health & Nursing ProQuest Natural Science Collection ProQuest Pharma Collection ProQuest Central ProQuest Health & Medical Research Collection Biotechnology Research Abstracts Health and Medicine Complete (Alumni Edition) ProQuest Central Korea Advanced Technologies Database with Aerospace ProQuest Computing ProQuest Central Basic ProQuest Computing (Alumni Edition) ProQuest SciTech Collection Computer and Information Systems Abstracts Professional Advanced Technologies & Aerospace Database ProQuest Medical Library ProQuest Central (Alumni) MEDLINE - Academic  | 
    
| DatabaseTitleList | Publicly Available Content Database MEDLINE MEDLINE - Academic  | 
    
| Database_xml | – sequence: 1 dbid: C6C name: Springer Nature OA Free Journals url: http://www.springeropen.com/ sourceTypes: Publisher – sequence: 2 dbid: DOA name: DOAJ Directory of Open Access Journals url: https://www.doaj.org/ sourceTypes: Open Website – sequence: 3 dbid: NPM name: PubMed url: https://proxy.k.utb.cz/login?url=http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database – sequence: 4 dbid: EIF name: MEDLINE url: https://proxy.k.utb.cz/login?url=https://www.webofscience.com/wos/medline/basic-search sourceTypes: Index Database – sequence: 5 dbid: UNPAY name: Unpaywall url: https://proxy.k.utb.cz/login?url=https://unpaywall.org/ sourceTypes: Open Access Repository – sequence: 6 dbid: 8FG name: ProQuest Technology Collection url: https://search.proquest.com/technologycollection1 sourceTypes: Aggregation Database  | 
    
| DeliveryMethod | fulltext_linktorsrc | 
    
| Discipline | Biology | 
    
| EISSN | 1471-2105 | 
    
| EndPage | 23 | 
    
| ExternalDocumentID | oai_doaj_org_article_38e7c141a7094506ad1a87c2f9ad4512 10.1186/s12859-021-04504-x PMC8729035 A693692648 34983371 10_1186_s12859_021_04504_x  | 
    
| Genre | Journal Article | 
    
| GeographicLocations | Australia | 
    
| GeographicLocations_xml | – name: Australia | 
    
| GroupedDBID | --- 0R~ 23N 2WC 53G 5VS 6J9 7X7 88E 8AO 8FE 8FG 8FH 8FI 8FJ AAFWJ AAJSJ AAKPC AASML ABDBF ABUWG ACGFO ACGFS ACIHN ACIWK ACPRK ACUHS ADBBV ADMLS ADUKV AEAQA AENEX AEUYN AFKRA AFPKN AFRAH AHBYD AHMBA AHYZX ALMA_UNASSIGNED_HOLDINGS AMKLP AMTXH AOIJS ARAPS AZQEC BAPOH BAWUL BBNVY BCNDV BENPR BFQNJ BGLVJ BHPHI BMC BPHCQ BVXVI C6C CCPQU CS3 DIK DU5 DWQXO E3Z EAD EAP EAS EBD EBLON EBS EMB EMK EMOBN ESX F5P FYUFA GNUQQ GROUPED_DOAJ GX1 HCIFZ HMCUK HYE IAO ICD IHR INH INR ISR ITC K6V K7- KQ8 LK8 M1P M48 M7P MK~ ML0 M~E O5R O5S OK1 OVT P2P P62 PGMZT PHGZM PHGZT PIMPY PJZUB PPXIY PQGLB PQQKQ PROAC PSQYO PUEGO RBZ RNS ROL RPM RSV SBL SOJ SV3 TR2 TUS UKHRP W2D WOQ WOW XH6 XSB AAYXX CITATION ALIPV CGR CUY CVF ECM EIF NPM 3V. 7QO 7SC 7XB 8AL 8FD 8FK FR3 JQ2 K9. L7M L~C L~D M0N P64 PKEHL PQEST PQUKI Q9U 7X8 5PM 123 2VQ 4.4 ADRAZ ADTOC AHSBF C1A EJD H13 IPNFZ RIG UNPAY  | 
    
| ID | FETCH-LOGICAL-c641t-7f499a56ad90ccd03a10eb210bf4728c0719b6ccfff4d1fc3d3d6e20a34499ea3 | 
    
| IEDL.DBID | M48 | 
    
| ISSN | 1471-2105 | 
    
| IngestDate | Fri Oct 03 12:50:29 EDT 2025 Sun Oct 26 03:34:24 EDT 2025 Tue Sep 30 16:37:31 EDT 2025 Sun Aug 24 04:13:01 EDT 2025 Mon Oct 06 18:39:27 EDT 2025 Mon Oct 20 22:02:25 EDT 2025 Mon Oct 20 16:33:37 EDT 2025 Thu Oct 16 14:42:52 EDT 2025 Mon Jul 21 06:06:02 EDT 2025 Wed Oct 01 04:15:38 EDT 2025 Thu Apr 24 23:11:50 EDT 2025 Sat Sep 06 07:27:22 EDT 2025  | 
    
| IsDoiOpenAccess | true | 
    
| IsOpenAccess | true | 
    
| IsPeerReviewed | true | 
    
| IsScholarly | true | 
    
| Issue | 1 | 
    
| Keywords | Deep learning Distant supervision Natural language processing Post-translational modifications Protein-protein interaction BioBERT  | 
    
| Language | English | 
    
| License | 2021. The Author(s). Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. cc-by |
    
| PMID | 34983371 | 
    
| PageCount | 23 | 
    
| PublicationCentury | 2000 | 
    
| PublicationDate | 2022-01-04 | 
    
| PublicationDateYYYYMMDD | 2022-01-04 | 
    
| PublicationDecade | 2020 | 
    
| PublicationPlace | London | 
    
| PublicationTitle | BMC bioinformatics | 
    
| PublicationTitleAbbrev | BMC Bioinformatics | 
    
| PublicationTitleAlternate | BMC Bioinformatics | 
    
| PublicationYear | 2022 | 
    
| Publisher | BioMed Central; BioMed Central Ltd; Springer Nature B.V; BMC |
    
| StartPage | 4 | 
    
| SubjectTerms | Algorithms; Analysis; Annotations; Automation; BioBERT; Bioinformatics; Biomedical and Life Sciences; Calibration; Computational Biology/Bioinformatics; Computational linguistics; Computer Appl. in Life Sciences; Data Mining; Datasets; Deep learning; Distant supervision; Enzymes; Humans; Language processing; Life Sciences; Machine learning; Methods; Microarrays; Natural language interfaces; Natural language processing; Noise; Phosphorylation; Post-translation; Post-translational modification; Post-translational modifications; Predictions; Protein interaction; Protein Processing, Post-Translational; Protein-protein interaction; Protein-protein interactions; Proteins; PubMed; Supervision; Test sets; Translation; Uniqueness |
    
| Title | Large-scale protein-protein post-translational modification extraction with distant supervision and confidence calibrated BioBERT | 
    
| URI | https://link.springer.com/article/10.1186/s12859-021-04504-x https://www.ncbi.nlm.nih.gov/pubmed/34983371 https://www.proquest.com/docview/2620938921 https://www.proquest.com/docview/2616963049 https://pubmed.ncbi.nlm.nih.gov/PMC8729035 https://bmcbioinformatics.biomedcentral.com/counter/pdf/10.1186/s12859-021-04504-x https://doaj.org/article/38e7c141a7094506ad1a87c2f9ad4512  | 
    
| UnpaywallVersion | publishedVersion | 
    
| Volume | 23 | 
    