Large-scale protein-protein post-translational modification extraction with distant supervision and confidence calibrated BioBERT

Bibliographic Details
Published in	BMC bioinformatics Vol. 23; no. 1; pp. 4 - 23
Main Authors	Elangovan, Aparna; Li, Yuan; Pires, Douglas E. V.; Davis, Melissa J.; Verspoor, Karin
Format	Journal Article
Language	English
Published	London: BioMed Central (BioMed Central Ltd; Springer Nature B.V; BMC), 04.01.2022
Subjects	Algorithms; Analysis; Annotations; Automation; BioBERT; Bioinformatics; Biomedical and Life Sciences; Calibration; Computational Biology/Bioinformatics; Computational linguistics; Computer Appl. in Life Sciences; Data Mining; Datasets; Deep learning; Distant supervision; Enzymes; Humans; Language processing; Life Sciences; Machine learning; Methods; Microarrays; Natural language interfaces; Natural language processing; Noise; Phosphorylation; Post-translation; Post-translational modification; Post-translational modifications; Predictions; Protein interaction; Protein Processing, Post-Translational; Protein-protein interaction; Protein-protein interactions; Proteins; PubMed; Supervision; Test sets; Translation; Uniqueness; Australia
ISSN	1471-2105
DOI	10.1186/s12859-021-04504-x

Abstract	Motivation: Protein-protein interactions (PPIs) are critical to normal cellular function and are related to many disease pathways. A range of protein functions are mediated and regulated by protein interactions through post-translational modifications (PTMs). However, only 4% of PPIs are annotated with PTMs in biological knowledge databases such as IntAct, and annotation is mainly performed through manual curation, which is neither time- nor cost-effective. Here we aim to facilitate annotation by extracting PPIs along with their pairwise PTM from the literature, using deep learning on distantly supervised training data to aid human curation.
	Method: We use the IntAct PPI database to create a distantly supervised dataset annotated with interacting protein pairs, their corresponding PTM type, and associated abstracts from the PubMed database. We train an ensemble of BioBERT models (dubbed PPI-BioBERT-x10) to improve confidence calibration. We extend the ensemble average confidence approach with confidence variation to counteract the effects of class imbalance when extracting high confidence predictions.
	Results and conclusion: The PPI-BioBERT-x10 model evaluated on the test set achieved a modest micro-averaged F1 of 41.3 (P = 58.1, R = 32.1). However, by combining high confidence and low variation to identify high quality predictions, tuning the predictions for precision, we retained 19% of the test predictions with 100% precision. We evaluated PPI-BioBERT-x10 on 18 million PubMed abstracts and extracted 1.6 million PTM-PPI predictions (546507 unique PTM-PPI triplets), which we filtered to ≈ 5700 (4584 unique) high confidence predictions. Human evaluation on a small randomly sampled subset of the 5700 shows that precision drops to 33.7% despite confidence calibration, which highlights the challenges of generalisability beyond the test set. We circumvent the problem by only including predictions associated with multiple papers, improving the precision to 58.8%. In this work, we highlight the benefits and challenges of deep learning-based text mining in practice, and the need for increased emphasis on confidence calibration to facilitate human curation efforts.
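The filtering idea described in the abstract (keep a prediction only when the ensemble's average confidence is high and the variation across ensemble members is low) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `select_high_quality`, the use of population standard deviation as the measure of variation, and the threshold values are all assumptions, since the abstract does not state them.

```python
from statistics import mean, pstdev


def select_high_quality(ensemble_probs, min_conf=0.9, max_std=0.05):
    """Keep predictions with high mean confidence and low ensemble disagreement.

    ensemble_probs: one entry per example; each entry is a list with one
    per-class probability list per ensemble member.
    min_conf, max_std: hypothetical thresholds (not taken from the paper).
    Returns (example_index, predicted_label, mean_confidence, std) tuples.
    """
    selected = []
    for idx, per_model in enumerate(ensemble_probs):
        n_classes = len(per_model[0])
        # Average each class probability over the ensemble members.
        avg = [mean(m[c] for m in per_model) for c in range(n_classes)]
        # Predicted label is the class with the highest average probability.
        label = max(range(n_classes), key=avg.__getitem__)
        # Variation: spread of the members' probabilities for that class.
        spread = pstdev(m[label] for m in per_model)
        if avg[label] >= min_conf and spread <= max_std:
            selected.append((idx, label, avg[label], spread))
    return selected


# Toy data: 3 examples, 3 ensemble members, 2 classes (illustrative only).
example_probs = [
    [[0.02, 0.98], [0.03, 0.97], [0.01, 0.99]],  # confident and consistent
    [[0.05, 0.95], [0.40, 0.60], [0.01, 0.99]],  # members disagree
    [[0.55, 0.45], [0.60, 0.40], [0.50, 0.50]],  # low confidence
]

for row in select_high_quality(example_probs, min_conf=0.8, max_std=0.05):
    print(row)
```

With these toy thresholds only the first example survives: the second has an acceptable average confidence but too much disagreement between members, and the third is simply not confident. Tightening both thresholds trades recall for precision, which is the trade-off the abstract reports on the test set.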
ArticleNumber	4
Audience	Academic
Author	Li, Yuan; Pires, Douglas E. V.; Elangovan, Aparna; Verspoor, Karin; Davis, Melissa J.
Author_xml	1. Elangovan, Aparna. School of Computing and Information Systems, The University of Melbourne
	2. Li, Yuan. School of Computing and Information Systems, The University of Melbourne
	3. Pires, Douglas E. V. School of Computing and Information Systems, The University of Melbourne
	4. Davis, Melissa J. The Walter and Eliza Hall Institute of Medical Research, Department of Clinical Pathology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne
	5. Verspoor, Karin (karin.verspoor@rmit.edu.au). School of Computing and Information Systems, The University of Melbourne; School of Computing Technologies, RMIT University
BackLink	https://www.ncbi.nlm.nih.gov/pubmed/34983371 (View this record in MEDLINE/PubMed)
CitedBy_id	crossref_primary_10_3390_jpm14121157 crossref_primary_10_1186_s13643_024_02470_y crossref_primary_10_1038_s41568_024_00784_6 crossref_primary_10_1007_s44163_024_00197_2 crossref_primary_10_1016_j_mcpro_2023_100682
Cites_doi	10.1093/database/bav009 10.1186/1471-2105-8-50 10.1093/bioinformatics/btz682 10.18653/v1/W17-2323 10.1093/nar/gkj141 10.1093/nar/gku1267 10.18653/v1/2020.blackboxnlp-1.21 10.1109/TCBB.2014.2372765 10.1016/S2589-7500(20)30186-2 10.1109/ACCESS.2019.2927253 10.1093/nar/gku1055 10.18653/v1/N19-1423 10.1093/nar/gks1094 10.1093/database/bav020 10.1093/database/bax040 10.1093/nar/gkx1104 10.1016/j.artmed.2004.07.016 10.1016/j.ijhcs.2019.05.008 10.1016/j.knosys.2018.11.020 10.1093/nar/gkt1115 10.1155/2015/918710 10.1093/nar/gky1131 10.1007/s10994-021-05946-3 10.1093/database/bay122 10.3115/1690219.1690287 10.1038/nmeth.1931 10.18653/v1/2021.eacl-main.113 10.18653/v1/W17-2304 10.1093/nar/gky1049
ContentType	Journal Article
Copyright	The Author(s) 2021; 2021. The Author(s).; COPYRIGHT 2022 BioMed Central Ltd.; 2022. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Discipline	Biology
ExternalDocumentID	oai_doaj_org_article_38e7c141a7094506ad1a87c2f9ad4512 10.1186/s12859-021-04504-x PMC8729035 A693692648 34983371 10_1186_s12859_021_04504_x
Genre	Journal Article
GeographicLocations	Australia
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
Issue	1
Keywords	Deep learning; Distant supervision; Natural language processing; Post-translational modifications; Protein-protein interaction; BioBERT
License	2021. The Author(s). Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License (CC BY), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
OpenAccessLink	https://proxy.k.utb.cz/login?url=https://bmcbioinformatics.biomedcentral.com/counter/pdf/10.1186/s12859-021-04504-x
PMID	34983371
PageCount	23
PublicationCentury	2000
PublicationDate	2022-01-04
PublicationDateYYYYMMDD	2022-01-04
PublicationDecade	2020
PublicationPlace	London
PublicationTitle	BMC bioinformatics
PublicationTitleAbbrev	BMC Bioinformatics
PublicationTitleAlternate	BMC Bioinformatics
PublicationYear	2022
Publisher	BioMed Central; BioMed Central Ltd; Springer Nature B.V; BMC
Snippet	Motivation Protein-protein interactions (PPIs) are critical to normal cellular function and are related to many disease pathways. A range of protein functions...
StartPage	4
SubjectTerms	Algorithms; Analysis; Annotations; Automation; BioBERT; Bioinformatics; Biomedical and Life Sciences; Calibration; Computational Biology/Bioinformatics; Computational linguistics; Computer Appl. in Life Sciences; Data Mining; Datasets; Deep learning; Distant supervision; Enzymes; Humans; Language processing; Life Sciences; Machine learning; Methods; Microarrays; Natural language interfaces; Natural language processing; Noise; Phosphorylation; Post-translation; Post-translational modification; Post-translational modifications; Predictions; Protein interaction; Protein Processing, Post-Translational; Protein-protein interaction; Protein-protein interactions; Proteins; PubMed; Supervision; Test sets; Translation; Uniqueness
Title	Large-scale protein-protein post-translational modification extraction with distant supervision and confidence calibrated BioBERT
URI	https://link.springer.com/article/10.1186/s12859-021-04504-x https://www.ncbi.nlm.nih.gov/pubmed/34983371 https://www.proquest.com/docview/2620938921 https://www.proquest.com/docview/2616963049 https://pubmed.ncbi.nlm.nih.gov/PMC8729035 https://bmcbioinformatics.biomedcentral.com/counter/pdf/10.1186/s12859-021-04504-x https://doaj.org/article/38e7c141a7094506ad1a87c2f9ad4512
UnpaywallVersion	publishedVersion
Volume	23