Large-scale protein-protein post-translational modification extraction with distant supervision and confidence calibrated BioBERT

Bibliographic Details
Published in BMC Bioinformatics, Vol. 23, no. 1, article 4 (23 pages)
Main Authors Elangovan, Aparna; Li, Yuan; Pires, Douglas E. V.; Davis, Melissa J.; Verspoor, Karin
Format Journal Article
Language English
Published London: BioMed Central, 4 January 2022
BioMed Central Ltd
Springer Nature B.V
BMC
ISSN 1471-2105
DOI 10.1186/s12859-021-04504-x

Abstract
Motivation: Protein-protein interactions (PPIs) are critical to normal cellular function and are related to many disease pathways. A range of protein functions are mediated and regulated by protein interactions through post-translational modifications (PTMs). However, only 4% of PPIs are annotated with PTMs in biological knowledge databases such as IntAct, and this annotation is mainly performed through manual curation, which is neither time- nor cost-effective. Here we aim to facilitate annotation by using deep learning, trained on distantly supervised data, to extract PPIs along with their pairwise PTMs from the literature and so aid human curation.
Method: We use the IntAct PPI database to create a distantly supervised dataset annotated with interacting protein pairs, their corresponding PTM type, and associated abstracts from the PubMed database. We train an ensemble of BioBERT models, dubbed PPI-BioBERT-x10, to improve confidence calibration. We extend the ensemble average confidence approach with confidence variation to counteract the effects of class imbalance and extract high-confidence predictions.
Results and conclusion: Evaluated on the test set, PPI-BioBERT-x10 achieved a modest micro-averaged F1 of 41.3 (P = 58.1, R = 32.1). However, by combining high confidence and low variation to identify high-quality predictions, tuning for precision, we retained 19% of the test predictions with 100% precision. We applied PPI-BioBERT-x10 to 18 million PubMed abstracts and extracted 1.6 million PTM-PPI predictions (546507 unique PTM-PPI triplets), from which we filtered ≈ 5700 (4584 unique) high-confidence predictions. Human evaluation of a small random sample of these ≈ 5700 predictions shows that precision drops to 33.7% despite confidence calibration, highlighting the challenge of generalising beyond the test set. We mitigate the problem by including only predictions associated with multiple papers, which improves precision to 58.8%. In this work, we highlight the benefits and challenges of deep learning-based text mining in practice, and the need for increased emphasis on confidence calibration to facilitate human curation efforts.
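The abstract describes keeping predictions that combine high ensemble-average confidence with low confidence variation, and then keeping only predictions supported by multiple papers. The Python sketch below illustrates that filtering logic under stated assumptions; it is not the authors' implementation. It assumes each ensemble member (ten, if the x10 suffix is read as the ensemble size) returns a probability vector over PTM classes for one protein pair in one abstract, and it measures variation as the population standard deviation of the winning class's probability across members. The thresholds `min_mean` and `max_std`, the class labels, the function names and the PMIDs are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical layout of an extracted fact: (protein_a, protein_b, ptm_type)
Triplet = Tuple[str, str, str]


def ensemble_filter(
    member_probs: Sequence[Sequence[float]],
    labels: Sequence[str],
    min_mean: float = 0.9,  # hypothetical confidence threshold
    max_std: float = 0.05,  # hypothetical variation threshold
) -> Tuple[str, float, float, bool]:
    """Average class probabilities over ensemble members, pick the top class,
    and flag it as high quality only if its mean confidence is high AND its
    spread (population standard deviation) across members is low."""
    n_classes = len(labels)
    # Ensemble-average confidence for every class.
    class_means = [mean(p[c] for p in member_probs) for c in range(n_classes)]
    best = max(range(n_classes), key=lambda c: class_means[c])
    # Confidence variation of the winning class across ensemble members.
    spread = pstdev([p[best] for p in member_probs])
    keep = class_means[best] >= min_mean and spread <= max_std
    return labels[best], class_means[best], spread, keep


def multi_paper_filter(
    predictions: List[Tuple[Triplet, str]], min_papers: int = 2
) -> Dict[Triplet, Set[str]]:
    """Keep only triplets predicted from at least `min_papers` distinct PMIDs."""
    support: Dict[Triplet, Set[str]] = defaultdict(set)
    for triplet, pmid in predictions:
        support[triplet].add(pmid)  # repeated PMIDs for a triplet count once
    return {t: pmids for t, pmids in support.items() if len(pmids) >= min_papers}


if __name__ == "__main__":
    labels = ["phosphorylation", "ubiquitination", "other"]  # hypothetical classes
    agreeing = [[0.95, 0.03, 0.02]] * 10
    noisy = [[0.99, 0.01, 0.0]] * 5 + [[0.85, 0.10, 0.05]] * 5
    print(ensemble_filter(agreeing, labels))
    print(ensemble_filter(noisy, labels))
```

In the `noisy` example the ensemble-average confidence of the top class (0.92) clears the threshold, but the members disagree (a spread of 0.07), so the prediction is rejected; this is the kind of case a variation criterion is meant to catch where average confidence alone would not.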
ArticleNumber 4
Audience Academic
Author Li, Yuan
Pires, Douglas E. V.
Elangovan, Aparna
Verspoor, Karin
Davis, Melissa J.
Author_xml – sequence: 1
  givenname: Aparna
  surname: Elangovan
  fullname: Elangovan, Aparna
  organization: School of Computing and Information Systems, The University of Melbourne
– sequence: 2
  givenname: Yuan
  surname: Li
  fullname: Li, Yuan
  organization: School of Computing and Information Systems, The University of Melbourne
– sequence: 3
  givenname: Douglas E. V.
  surname: Pires
  fullname: Pires, Douglas E. V.
  organization: School of Computing and Information Systems, The University of Melbourne
– sequence: 4
  givenname: Melissa J.
  surname: Davis
  fullname: Davis, Melissa J.
  organization: The Walter and Eliza Hall Institute of Medical Research, Department of Clinical Pathology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne
– sequence: 5
  givenname: Karin
  surname: Verspoor
  fullname: Verspoor, Karin
  email: karin.verspoor@rmit.edu.au
  organization: School of Computing and Information Systems, The University of Melbourne, School of Computing Technologies, RMIT University
BackLink https://www.ncbi.nlm.nih.gov/pubmed/34983371 (View this record in MEDLINE/PubMed)
CitedBy_id crossref_primary_10_3390_jpm14121157
crossref_primary_10_1186_s13643_024_02470_y
crossref_primary_10_1038_s41568_024_00784_6
crossref_primary_10_1007_s44163_024_00197_2
crossref_primary_10_1016_j_mcpro_2023_100682
Cites_doi 10.1093/database/bav009
10.1186/1471-2105-8-50
10.1093/bioinformatics/btz682
10.18653/v1/W17-2323
10.1093/nar/gkj141
10.1093/nar/gku1267
10.18653/v1/2020.blackboxnlp-1.21
10.1109/TCBB.2014.2372765
10.1016/S2589-7500(20)30186-2
10.1109/ACCESS.2019.2927253
10.1093/nar/gku1055
10.18653/v1/N19-1423
10.1093/nar/gks1094
10.1093/database/bav020
10.1093/database/bax040
10.1093/nar/gkx1104
10.1016/j.artmed.2004.07.016
10.1016/j.ijhcs.2019.05.008
10.1016/j.knosys.2018.11.020
10.1093/nar/gkt1115
10.1155/2015/918710
10.1093/nar/gky1131
10.1007/s10994-021-05946-3
10.1093/database/bay122
10.3115/1690219.1690287
10.1038/nmeth.1931
10.18653/v1/2021.eacl-main.113
10.18653/v1/W17-2304
10.1093/nar/gky1049
ContentType Journal Article
Copyright The Author(s) 2021
2021. The Author(s).
COPYRIGHT 2022 BioMed Central Ltd.
2022. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DatabaseName Springer Nature OA Free Journals
CrossRef
Medline
MEDLINE
MEDLINE (Ovid)
PubMed
Gale In Context: Science
ProQuest Central (Corporate)
Biotechnology Research Abstracts
Computer and Information Systems Abstracts
Health & Medical Collection
ProQuest Central (purchase pre-March 2016)
Medical Database (Alumni Edition)
Computing Database (Alumni Edition)
ProQuest Pharma Collection
Technology Research Database
ProQuest SciTech Collection
ProQuest Technology Collection
ProQuest Natural Science Journals
Hospital Premium Collection
Hospital Premium Collection (Alumni Edition)
ProQuest Central (Alumni) (purchase pre-March 2016)
ProQuest Central (Alumni)
ProQuest One Sustainability
ProQuest Central UK/Ireland
Advanced Technologies & Computer Science Collection
ProQuest Central Essentials
Biological Science Collection
ProQuest Central
Natural Science Collection
ProQuest One Community College
Engineering Research Database
Health Research Premium Collection
Health Research Premium Collection (Alumni)
ProQuest Central Student
SciTech Premium Collection (Proquest)
ProQuest Computer Science Collection
Computer Science Database
ProQuest Health & Medical Complete (Alumni)
Advanced Technologies Database with Aerospace
Biological Sciences
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
Computing Database
Health & Medical Collection (Alumni Edition)
Medical Database
ProQuest Biological Science Database
Advanced Technologies & Aerospace Collection
ProQuest Advanced Technologies & Aerospace Collection
Biotechnology and BioEngineering Abstracts
ProQuest Central Premium
ProQuest One Academic (New)
Publicly Available Content Database (Proquest)
ProQuest Health & Medical Research Collection
ProQuest One Academic Middle East (New)
ProQuest One Health & Nursing
ProQuest One Academic Eastern Edition
ProQuest One Applied & Life Sciences
ProQuest One Academic
ProQuest One Academic UKI Edition
ProQuest Central Basic
MEDLINE - Academic
PubMed Central (Full Participant titles)
Unpaywall for CDI: Periodical Content
Unpaywall
DOAJ Directory of Open Access Journals
DatabaseTitle CrossRef
MEDLINE
Medline Complete
MEDLINE with Full Text
PubMed
MEDLINE (Ovid)
Publicly Available Content Database
Computer Science Database
ProQuest Central Student
ProQuest Advanced Technologies & Aerospace Collection
ProQuest Central Essentials
ProQuest Computer Science Collection
Computer and Information Systems Abstracts
SciTech Premium Collection
ProQuest One Applied & Life Sciences
ProQuest One Sustainability
Health Research Premium Collection
Natural Science Collection
Health & Medical Research Collection
Biological Science Collection
ProQuest Central (New)
ProQuest Medical Library (Alumni)
Advanced Technologies & Aerospace Collection
ProQuest Biological Science Collection
ProQuest One Academic Eastern Edition
ProQuest Hospital Collection
ProQuest Technology Collection
Health Research Premium Collection (Alumni)
Biological Science Database
ProQuest Hospital Collection (Alumni)
Biotechnology and BioEngineering Abstracts
ProQuest Health & Medical Complete
ProQuest One Academic UKI Edition
Engineering Research Database
ProQuest One Academic
ProQuest One Academic (New)
Technology Collection
Technology Research Database
Computer and Information Systems Abstracts – Academic
ProQuest One Academic Middle East (New)
ProQuest Health & Medical Complete (Alumni)
ProQuest Central (Alumni Edition)
ProQuest One Community College
ProQuest One Health & Nursing
ProQuest Natural Science Collection
ProQuest Pharma Collection
ProQuest Central
ProQuest Health & Medical Research Collection
Biotechnology Research Abstracts
Health and Medicine Complete (Alumni Edition)
ProQuest Central Korea
Advanced Technologies Database with Aerospace
ProQuest Computing
ProQuest Central Basic
ProQuest Computing (Alumni Edition)
ProQuest SciTech Collection
Computer and Information Systems Abstracts Professional
Advanced Technologies & Aerospace Database
ProQuest Medical Library
ProQuest Central (Alumni)
MEDLINE - Academic
Database_xml – sequence: 1
  dbid: C6C
  name: Springer Nature OA Free Journals
  url: http://www.springeropen.com/
  sourceTypes: Publisher
– sequence: 2
  dbid: DOA
  name: DOAJ Directory of Open Access Journals
  url: https://www.doaj.org/
  sourceTypes: Open Website
– sequence: 3
  dbid: NPM
  name: PubMed
  url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
  sourceTypes: Index Database
– sequence: 4
  dbid: EIF
  name: MEDLINE
  url: https://www.webofscience.com/wos/medline/basic-search
  sourceTypes: Index Database
– sequence: 5
  dbid: UNPAY
  name: Unpaywall
  url: https://unpaywall.org/
  sourceTypes: Open Access Repository
– sequence: 6
  dbid: 8FG
  name: ProQuest Technology Collection
  url: https://search.proquest.com/technologycollection1
  sourceTypes: Aggregation Database
DeliveryMethod fulltext_linktorsrc
Discipline Biology
EISSN 1471-2105
EndPage 23
ExternalDocumentID oai_doaj_org_article_38e7c141a7094506ad1a87c2f9ad4512
10.1186/s12859-021-04504-x
PMC8729035
A693692648
34983371
10_1186_s12859_021_04504_x
Genre Journal Article
GeographicLocations Australia
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 1
Keywords Deep learning
Distant supervision
Natural language processing
Post-translational modifications
Protein-protein interaction
BioBERT
Language English
License 2021. The Author(s).
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
cc-by
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-c641t-7f499a56ad90ccd03a10eb210bf4728c0719b6ccfff4d1fc3d3d6e20a34499ea3
Notes ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
content type line 23
OpenAccessLink https://bmcbioinformatics.biomedcentral.com/counter/pdf/10.1186/s12859-021-04504-x
PMID 34983371
PQID 2620938921
PQPubID 44065
PageCount 23
ParticipantIDs doaj_primary_oai_doaj_org_article_38e7c141a7094506ad1a87c2f9ad4512
unpaywall_primary_10_1186_s12859_021_04504_x
pubmedcentral_primary_oai_pubmedcentral_nih_gov_8729035
proquest_miscellaneous_2616963049
proquest_journals_2620938921
gale_infotracmisc_A693692648
gale_infotracacademiconefile_A693692648
gale_incontextgauss_ISR_A693692648
pubmed_primary_34983371
crossref_citationtrail_10_1186_s12859_021_04504_x
crossref_primary_10_1186_s12859_021_04504_x
springer_journals_10_1186_s12859_021_04504_x
ProviderPackageCode CITATION
AAYXX
PublicationCentury 2000
PublicationDate 2022-01-04
PublicationDateYYYYMMDD 2022-01-04
PublicationDate_xml – month: 01
  year: 2022
  text: 2022-01-04
  day: 04
PublicationDecade 2020
PublicationPlace London
PublicationPlace_xml – name: London
– name: England
PublicationTitle BMC bioinformatics
PublicationTitleAbbrev BMC Bioinformatics
PublicationTitleAlternate BMC Bioinformatics
PublicationYear 2022
Publisher BioMed Central
BioMed Central Ltd
Springer Nature B.V
BMC
Publisher_xml – name: BioMed Central
– name: BioMed Central Ltd
– name: Springer Nature B.V
– name: BMC
References 4504_CR30
H Zhang (4504_CR26) 2019; 7
The UniProt Consortium (4504_CR5) 2018; 47
R Bunescu (4504_CR11) 2005; 33
4504_CR14
R Raisamo (4504_CR37) 2019; 131
4504_CR13
H Huang (4504_CR20) 2017; 46
4504_CR35
4504_CR32
J Futoma (4504_CR24) 2020; 2
4504_CR33
S Orchard (4504_CR3) 2013; 42
D Szklarczyk (4504_CR18) 2018; 47
S Orchard (4504_CR27) 2012; 9
S Orchard (4504_CR1) 2013; 42
S Orchard (4504_CR4) 2015
E Hüllermeier (4504_CR34) 2021; 110
GR Brown (4504_CR29) 2014; 43
S Pyysalo (4504_CR12) 2007; 8
M Torii (4504_CR17) 2015; 12
4504_CR9
4504_CR28
4504_CR25
N Srivastava (4504_CR31) 2014; 15
L Mottin (4504_CR36) 2017
4504_CR23
4504_CR22
J Lee (4504_CR10) 2019; 36
4504_CR6
4504_CR7
4504_CR8
Q Chen (4504_CR19) 2018
A Franceschini (4504_CR15) 2012; 41
CO Tudor (4504_CR16) 2015
PV Hornbeck (4504_CR21) 2014; 43
GR Mishra (4504_CR2) 2006; 34
References_xml – year: 2015
  ident: 4504_CR4
  publication-title: Database
  doi: 10.1093/database/bav009
– volume: 8
  start-page: 50
  issue: 1
  year: 2007
  ident: 4504_CR12
  publication-title: BMC Bioinf
  doi: 10.1186/1471-2105-8-50
– ident: 4504_CR6
– volume: 36
  start-page: 1234
  issue: 4
  year: 2019
  ident: 4504_CR10
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btz682
– ident: 4504_CR25
  doi: 10.18653/v1/W17-2323
– volume: 34
  start-page: 411
  year: 2006
  ident: 4504_CR2
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gkj141
– volume: 43
  start-page: 512
  issue: D1
  year: 2014
  ident: 4504_CR21
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gku1267
– ident: 4504_CR35
– ident: 4504_CR33
  doi: 10.18653/v1/2020.blackboxnlp-1.21
– volume: 12
  start-page: 17
  issue: 1
  year: 2015
  ident: 4504_CR17
  publication-title: IEEE/ACM Trans Comput Biol Bioinf
  doi: 10.1109/TCBB.2014.2372765
– volume: 2
  start-page: 489
  issue: 9
  year: 2020
  ident: 4504_CR24
  publication-title: Lancet Digital Health
  doi: 10.1016/S2589-7500(20)30186-2
– volume: 7
  start-page: 89354
  year: 2019
  ident: 4504_CR26
  publication-title: IEEE Access
  doi: 10.1109/ACCESS.2019.2927253
– volume: 43
  start-page: 36
  issue: D1
  year: 2014
  ident: 4504_CR29
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gku1055
– ident: 4504_CR30
  doi: 10.18653/v1/N19-1423
– ident: 4504_CR7
– ident: 4504_CR22
– volume: 15
  start-page: 1929
  issue: 56
  year: 2014
  ident: 4504_CR31
  publication-title: J Mach Learn Res
– volume: 41
  start-page: 808
  issue: D1
  year: 2012
  ident: 4504_CR15
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gks1094
– year: 2015
  ident: 4504_CR16
  publication-title: Database
  doi: 10.1093/database/bav020
– year: 2017
  ident: 4504_CR36
  publication-title: Database
  doi: 10.1093/database/bax040
– ident: 4504_CR13
– volume: 46
  start-page: 542
  issue: D1
  year: 2017
  ident: 4504_CR20
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gkx1104
– volume: 33
  start-page: 139
  issue: 2
  year: 2005
  ident: 4504_CR11
  publication-title: Artif Intell Med
  doi: 10.1016/j.artmed.2004.07.016
– volume: 131
  start-page: 131
  year: 2019
  ident: 4504_CR37
  publication-title: Int J Human-Comput Stud
  doi: 10.1016/j.ijhcs.2019.05.008
– ident: 4504_CR32
  doi: 10.1016/j.knosys.2018.11.020
– volume: 42
  start-page: 358
  issue: D1
  year: 2013
  ident: 4504_CR3
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gkt1115
– ident: 4504_CR28
  doi: 10.1155/2015/918710
– volume: 47
  start-page: 607
  issue: D1
  year: 2018
  ident: 4504_CR18
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gky1131
– volume: 110
  start-page: 457
  issue: 3
  year: 2021
  ident: 4504_CR34
  publication-title: Mach Learn
  doi: 10.1007/s10994-021-05946-3
– year: 2018
  ident: 4504_CR19
  publication-title: Database
  doi: 10.1093/database/bay122
– ident: 4504_CR9
  doi: 10.3115/1690219.1690287
– volume: 9
  start-page: 345
  issue: 4
  year: 2012
  ident: 4504_CR27
  publication-title: Nature Methods
  doi: 10.1038/nmeth.1931
– ident: 4504_CR8
– ident: 4504_CR23
  doi: 10.18653/v1/2021.eacl-main.113
– ident: 4504_CR14
  doi: 10.18653/v1/W17-2304
– volume: 42
  start-page: 358
  issue: D1
  year: 2013
  ident: 4504_CR1
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gkt1115
– volume: 47
  start-page: 506
  issue: D1
  year: 2018
  ident: 4504_CR5
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gky1049
SSID ssj0017805
Score 2.4519792
SourceID doaj
unpaywall
pubmedcentral
proquest
gale
pubmed
crossref
springer
SourceType Open Website
Open Access Repository
Aggregation Database
Index Database
Enrichment Source
Publisher
StartPage 4
SubjectTerms Algorithms
Analysis
Annotations
Automation
BioBERT
Bioinformatics
Biomedical and Life Sciences
Calibration
Computational Biology/Bioinformatics
Computational linguistics
Computer Appl. in Life Sciences
Data Mining
Datasets
Deep learning
Distant supervision
Enzymes
Humans
Language processing
Life Sciences
Machine learning
Methods
Microarrays
Natural language interfaces
Natural language processing
Noise
Phosphorylation
Post-translation
Post-translational modification
Post-translational modifications
Predictions
Protein interaction
Protein Processing, Post-Translational
Protein-protein interaction
Protein-protein interactions
Proteins
PubMed
Supervision
Test sets
Translation
Uniqueness
SummonAdditionalLinks – databaseName: DOAJ Directory of Open Access Journals
  dbid: DOA
  link: http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwrV1Lb9QwELZQJQQcEG8CBRmExIFajWOv7Rxb1Kog4FBaqTfL8QMqLdlVkxXtkX_OTJINDUiFA6do43E2mZnMwxl_Q8grD15UJMS30zox6aNipUqaaQ--jMcy8W4vzMdP6uBYvj-ZnVxq9YU1YT08cM-4bWGi9lxypyERmeXKBe4MXCmVLshZ11-4yE25TqaG7weI1L_eImPUdsMRp41hOQKEMLlk5xM31KH1_2mTLzml3wsmx6-mt8iNVb10F9_dfH7JMe3fIbeHiJLu9E9yl1yL9T1yve8xeXGf_PiAtd6sAVlE2qEynNZsONLlomlZi-5qPiwK0m-LgNVD3U8Klvus3_lAccGWBow265Y2qyXaGFxpo64OFJLq1HcnpfA_mIFDIEvhHnb3Do8ekOP9vaO3B2zou8C8krxlOkEa5GbA6jL3PuTC8RwScJ5XSerCeIhKykp5n1KSgScvgggqFrkTEiZGJx6SjXpRx8eEVlHo4JPhDi6tY6hM4XLnimiM11WaZYSvxWD9AEqOvTHmtktOjLK96CyIznais-cZeTPOWfaQHFdS76J0R0qE0-5OgJLZQcns35QsIy9RNywCZtRYkfPFrZrGvvt8aHcUtkTEOsGMvB6I0gKF44YNDsAJxNiaUG5OKOGN9tPhtQrawaI0FhsHlBBdFjwjL8ZhnIlVcnVcrJCGKzCokPRl5FGvseNzC1kaITTM1hNdnjBmOlKffu3wxg0kYLkAWW2ttf7XbV3F-K3xzfgHOT35H3J6Sm4WuD0Fl8jkJtloz1bxGQSNbfW8sw8_AYCEaGs
  priority: 102
  providerName: Directory of Open Access Journals
– databaseName: ProQuest Central
  dbid: BENPR
  link: http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwjV3db9MwELdGJwQ8IL4GgYEMQuKBRYvjNE4eEFpRp4GgQmWT9mY5_hiTSlKaVGyP_Ofc5WsLSBVPVWu7iX3n-7DvfkfIaw1alDvEtxPC-ZG2sZ_GTvhCgy5jNnWszoX5MouPTqJPp-PTLTLrcmEwrLKTibWgNoXGM_J9BE5PQbuG7P3yp49Vo_B2tSuhodrSCuZdDTF2g2yHiIw1ItuT6ezrvL9XQAT_LnUmifdLhvhtPoYpgGkTRP7FQD3VKP7_yupryurvQMr-NvUOubXOl-ryl1osrimsw3vkbmtp0oOGNe6TLZs_IDeb2pOXD8nvzxgD7pdAI0trtIbz3G8_6bIoK79CNbZoDwvpj8JgVFH9lYJEXzUZERQPcqlBKzSvaLleouzBEziqckPB2XZN1VIKz0HPHAxcCu8wmc6PH5GTw-nxhyO_rcfg6zhilS8cuEdqHCuTBlqbgCsWgGPOgsxFIkw0WCtpFmvtnIsMc5obbmIbBopHMNAqvkNGeZHbJ4RmlgujXcIU_LWwJktCFSgV2iTRInNjj7CODFK3YOVYM2Mha6cliWVDOgmkkzXp5IVH3vZjlg1Ux8beE6Ru3xNhtusfitWZbHet5IkVmkVMCfCCxwHMnKkE2NilykRgKnnkFfKGRCCNHCN1ztS6LOXHb3N5EGOpRIwf9MibtpMrkDiqTXyAlUDsrUHP3UFP2Ol62NyxoGwlTSmv9oVHXvbNOBKj53JbrLEPi0HQgjPokccNx_bz5lGacC5gtBjw8mBhhi35-fcahzwBxyzgQKu9juuvXmvTwu_1O-M_6PR086SfkdshJqTgoVi0S0bVam2fg5lYZS_avf8HtcBnIg
  priority: 102
  providerName: ProQuest
– databaseName: Springer Nature OA Free Journals
  dbid: C6C
  link: http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV1Lb9QwELZQEQIOiDeBggxC4kAt4thrO8e2alUQcCit1Jvl-AGVluyqyQp65J8zk2RDA6iC02rX42zimczDnvmGkJcerKhIiG-ndWLSR8VKlTTTHmwZj2XiXS3Mh4_q4Fi-O5mdDDA5WAtz8fyeG_Wm4YiwxjCRAJyPXDLwF6-CkVLdwazaHU8MEJt_XRTz13kTw9Ph8_-phS-Yod9TJMdz0pvk-qpeuvNvbj6_YIr2b5Nbgw9Jt3um3yFXYn2XXOu7Sp7fIz_eY3Y3a2D1I-1wGE5rNnzS5aJpWYsGaj5sA9Kvi4D5Qt1XCrr6rK91oLhFSwP6l3VLm9UStQrurVFXBwphdOr7kVL4H4y5wXWlcA87e4dH98nx_t7R7gEbOi0wryRvmU4Q-LiZcqHMvQ-5cDyHkJvnVZK6MB78kLJS3qeUZODJiyCCikXuhISJ0YkHZKNe1PERoVUUOvhkuINL6xgqU7jcuSIa43WVZhnhazZYP8CQYzeMue3CEaNszzoLrLMd6-z3jLwe5yx7EI5LqXeQuyMlAmh3P4Bc2eF9tMJE7bnkTkN8O8vhybkzIKCpdEGCfGXkBcqGRYiMGnNwPrtV09i3nw7ttsImiJgZmJFXA1FaIHPcUNIAK4GoWhPKzQklvMN-OrwWQTvokMZiq4AS_MmCZ-T5OIwzMS-ujosV0nAFKhTCvIw87CV2fG4hSyOEhtl6IsuThZmO1KdfOoRxAyFXLoBXW2up_3Vbly381vhm_AOfHv_f1Z-QGwWWnuD2l9wkG-3ZKj4Fh7CtnnWa4CcaeVn6
  priority: 102
  providerName: Springer Nature
– databaseName: Unpaywall
  dbid: UNPAY
  link: http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwrV3db9MwELemTQh44PsjMJBBSDywdHGdxMljhzYNBBMqqzSeLMexR0WXVE0iGG_859zli2agCSSeqtbnNj77vty73xHyQoMV5Rbx7YSwrq9N6MahFa7QYMuYiS2ra2HeH4WHM__tSXCyQaZdLUxyppN53oKGIlDxaL0MfdFUOWAXBbPaXaa2Efoo3C0YIrG5mHAATornu-BXboUB-OebZGt29GHyqS4zEsyFGCfoqmf-OHFgoWog_9_V9Zq9uphL2f-hep1crbKlOv-qFos1m3VwkxTdaptUlS-jqkxG-vsFIMj_y45b5Ebr4tJJcyZvkw2T3SFXmqaX53fJj3eYfO4WcDgMrWEi5pnbvtJlXpRuifZz0d5S0rM8xXSm-i0FU7JqSjEo3iDTFN3frKRFtUSlh1d_VGUphSjfNu1SKfwOXgmAZ03hGfb2p8f3yOxg__j1ods2gnB16LPSFRbiMhWEKo09rVOPK-aZBDYysb4YRxrcpDgJtbbW-imzmqc8Dc3YU9yHiUbx-2QzyzPzkNDEcJFqGzEFXy1MmkRj5Sk1NlGkRWIDh7Bu86VuUdKxWcdC1tFSFMqGtRJYK2vWym8OedXPWTYYIZdS7-GZ6ikR37v-IF-dylZdSB4ZoZnPlIDwO_Bg5UxFID82VqkPPppDnuOJlIjgkWGK0KmqikK--TiVkxB7NGLiokNetkQ2x81RbcUFcAJBvwaU2wNKUDF6ONwdfNmquEJiJ4MY3N0xc8izfhhnYtpeZvIKaVgIGh6iUIc8aOSkXzf344hzAbPFQIIGjBmOZPPPNQB6BBGhx2GvdjpZ-_VYlzF-p5fHv9inR_9G_phcG2NlDN7O-dtks1xV5gn4q2XytFVAPwF4JpK6
  priority: 102
  providerName: Unpaywall
Title Large-scale protein-protein post-translational modification extraction with distant supervision and confidence calibrated BioBERT
URI https://link.springer.com/article/10.1186/s12859-021-04504-x
https://www.ncbi.nlm.nih.gov/pubmed/34983371
https://www.proquest.com/docview/2620938921
https://www.proquest.com/docview/2616963049
https://pubmed.ncbi.nlm.nih.gov/PMC8729035
https://bmcbioinformatics.biomedcentral.com/counter/pdf/10.1186/s12859-021-04504-x
https://doaj.org/article/38e7c141a7094506ad1a87c2f9ad4512
UnpaywallVersion publishedVersion
Volume 23
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
journalDatabaseRights – providerCode: PRVADU
  databaseName: BioMedCentral
  customDbUrl:
  eissn: 1471-2105
  dateEnd: 99991231
  omitProxy: true
  ssIdentifier: ssj0017805
  issn: 1471-2105
  databaseCode: RBZ
  dateStart: 20000101
  isFulltext: true
  titleUrlDefault: https://www.biomedcentral.com/search/
  providerName: BioMedCentral
– providerCode: PRVAFT
  databaseName: Open Access Digital Library
  customDbUrl:
  eissn: 1471-2105
  dateEnd: 99991231
  omitProxy: true
  ssIdentifier: ssj0017805
  issn: 1471-2105
  databaseCode: KQ8
  dateStart: 20000101
  isFulltext: true
  titleUrlDefault: http://grweb.coalliance.org/oadl/oadl.html
  providerName: Colorado Alliance of Research Libraries
– providerCode: PRVAFT
  databaseName: Open Access Digital Library
  customDbUrl:
  eissn: 1471-2105
  dateEnd: 99991231
  omitProxy: true
  ssIdentifier: ssj0017805
  issn: 1471-2105
  databaseCode: KQ8
  dateStart: 20000701
  isFulltext: true
  titleUrlDefault: http://grweb.coalliance.org/oadl/oadl.html
  providerName: Colorado Alliance of Research Libraries
– providerCode: PRVAON
  databaseName: DOAJ Directory of Open Access Journals
  customDbUrl:
  eissn: 1471-2105
  dateEnd: 99991231
  omitProxy: true
  ssIdentifier: ssj0017805
  issn: 1471-2105
  databaseCode: DOA
  dateStart: 20000101
  isFulltext: true
  titleUrlDefault: https://www.doaj.org/
  providerName: Directory of Open Access Journals
– providerCode: PRVEBS
  databaseName: EBSCOhost Academic Search Ultimate - TFS
  customDbUrl: https://search.ebscohost.com/login.aspx?authtype=ip,shib&custid=s3936755&profile=ehost&defaultdb=asn
  eissn: 1471-2105
  dateEnd: 99991231
  omitProxy: true
  ssIdentifier: ssj0017805
  issn: 1471-2105
  databaseCode: ABDBF
  dateStart: 20000101
  isFulltext: true
  titleUrlDefault: https://search.ebscohost.com/direct.asp?db=asn
  providerName: EBSCOhost
– providerCode: PRVEBS
  databaseName: Inspec with Full Text
  customDbUrl:
  eissn: 1471-2105
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0017805
  issn: 1471-2105
  databaseCode: ADMLS
  dateStart: 20000101
  isFulltext: true
  titleUrlDefault: https://www.ebsco.com/products/research-databases/inspec-full-text
  providerName: EBSCOhost
– providerCode: PRVBFR
  databaseName: Free Medical Journals
  customDbUrl:
  eissn: 1471-2105
  dateEnd: 99991231
  omitProxy: true
  ssIdentifier: ssj0017805
  issn: 1471-2105
  databaseCode: DIK
  dateStart: 20000101
  isFulltext: true
  titleUrlDefault: http://www.freemedicaljournals.com
  providerName: Flying Publisher
– providerCode: PRVFQY
  databaseName: GFMER Free Medical Journals
  customDbUrl:
  eissn: 1471-2105
  dateEnd: 99991231
  omitProxy: true
  ssIdentifier: ssj0017805
  issn: 1471-2105
  databaseCode: GX1
  dateStart: 0
  isFulltext: true
  titleUrlDefault: http://www.gfmer.ch/Medical_journals/Free_medical.php
  providerName: Geneva Foundation for Medical Education and Research
– providerCode: PRVHPJ
  databaseName: ROAD: Directory of Open Access Scholarly Resources
  customDbUrl:
  eissn: 1471-2105
  dateEnd: 99991231
  omitProxy: true
  ssIdentifier: ssj0017805
  issn: 1471-2105
  databaseCode: M~E
  dateStart: 20000101
  isFulltext: true
  titleUrlDefault: https://road.issn.org
  providerName: ISSN International Centre
– providerCode: PRVAQN
  databaseName: PubMed Central
  customDbUrl:
  eissn: 1471-2105
  dateEnd: 99991231
  omitProxy: true
  ssIdentifier: ssj0017805
  issn: 1471-2105
  databaseCode: RPM
  dateStart: 20000101
  isFulltext: true
  titleUrlDefault: https://www.ncbi.nlm.nih.gov/pmc/
  providerName: National Library of Medicine
– providerCode: PRVPQU
  databaseName: Health & Medical Collection
  customDbUrl:
  eissn: 1471-2105
  dateEnd: 99991231
  omitProxy: true
  ssIdentifier: ssj0017805
  issn: 1471-2105
  databaseCode: 7X7
  dateStart: 20090101
  isFulltext: true
  titleUrlDefault: https://search.proquest.com/healthcomplete
  providerName: ProQuest
– providerCode: PRVPQU
  databaseName: ProQuest Central
  customDbUrl: http://www.proquest.com/pqcentral?accountid=15518
  eissn: 1471-2105
  dateEnd: 99991231
  omitProxy: true
  ssIdentifier: ssj0017805
  issn: 1471-2105
  databaseCode: BENPR
  dateStart: 20090101
  isFulltext: true
  titleUrlDefault: https://www.proquest.com/central
  providerName: ProQuest
– providerCode: PRVPQU
  databaseName: ProQuest Technology Collection
  customDbUrl:
  eissn: 1471-2105
  dateEnd: 99991231
  omitProxy: true
  ssIdentifier: ssj0017805
  issn: 1471-2105
  databaseCode: 8FG
  dateStart: 20090101
  isFulltext: true
  titleUrlDefault: https://search.proquest.com/technologycollection1
  providerName: ProQuest
– providerCode: PRVFZP
  databaseName: Scholars Portal Journals: Open Access
  customDbUrl:
  eissn: 1471-2105
  dateEnd: 20250131
  omitProxy: true
  ssIdentifier: ssj0017805
  issn: 1471-2105
  databaseCode: M48
  dateStart: 20000701
  isFulltext: true
  titleUrlDefault: http://journals.scholarsportal.info
  providerName: Scholars Portal
– providerCode: PRVAVX
  databaseName: Springer Nature HAS Fully OA
  customDbUrl:
  eissn: 1471-2105
  dateEnd: 99991231
  omitProxy: true
  ssIdentifier: ssj0017805
  issn: 1471-2105
  databaseCode: AAJSJ
  dateStart: 20001201
  isFulltext: true
  titleUrlDefault: https://www.springernature.com
  providerName: Springer Nature
– providerCode: PRVAVX
  databaseName: Springer Nature OA Free Journals
  customDbUrl:
  eissn: 1471-2105
  dateEnd: 99991231
  omitProxy: true
  ssIdentifier: ssj0017805
  issn: 1471-2105
  databaseCode: C6C
  dateStart: 20000112
  isFulltext: true
  titleUrlDefault: http://www.springeropen.com/
  providerName: Springer Nature
link http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwjR1db9Mw0NqHEPCA-KYwKoMQPDBD3KRx8oBQW7WMilVTt0rdk-U69phU0q5JRfvIP-cuSbMFpmkvjRqfk9h3vg_7Pgh5p0GKuhbz2wlhmaeNz0LfCiY0yDJuQsuzWJjDgX8w8vrj5niLbNxtiwlMrjXtsJ7UaDH9tLpYf4UF_yVb8IH_OeGYhY2hswEoKI7HVu_nFwwLS-EBbFFlY5vsgvAKsbrDoXd50IAp_bMAJMEZWD_NTVzNtY-tyK4sxf__jPyKJPvXy7I8ar1P7i7juVr_VtPpFWnWe0geFGoobeV084hsmfgxuZMXplw_IX9-oIM4SwCBhmapHM5jVlzpfJakLEUZNy12EumvWYQuR9lfCux-kYdLUNzlpRGqqHFKk-UcGRNuz1EVRxQscZuXNKXwHjTbQful8A3t7vDkKRn1uiedA1YUa2Da93jKhAXbSTV9FYWO1pHjKu6A1c6difVEI9CgyoQTX2trrRdxq93IjXzTcJTrQUej3GdkJ57F5gWhE-OKSNuAK3i0MNEkaChHqYYJAi0mtlkjfIMGqYtM5lhQYyoziybwZY46CaiTGerkqkY-ln3meR6PG6HbiN0SEnNwZzdmizNZLGnpBkZo7nElwERuOjByrgKgcRuqyAM9qkbeIm1IzLIRoxvPmVomifx-PJQtH-soonNhjXwogOwMkaOKqAiYCUzMVYHcq0ACG9DV5g0Jys0qklhtIASVtMFr5E3ZjD3RtS42syXCcB-4MFiKNfI8p9hy3K4XBq4roLeo0HJlYqot8fnPLEl5AFab4wKu9jdUf_lZN038frkyboGnl7cY1Styr4EhK7ht5u2RnXSxNK9BkUwndbItxgJ-g963OtlttfrHfbi2u4OjIdzt-J16tkVTz1gGtIwGR63Tv0NOd-Y
linkProvider Scholars Portal
linkToHtml http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwtR1Nb9Mw1BpDaHBAfBMYYBCIwxYtjtM4OSC0waaWdTuMTerNOI49JpWkNK22HvlD_Ebey9cWkCouO0WNn9PY7_l95X0Q8laDFOUW69sJYd1Am9CNQytcoUGWMRNbVubCHByG_ZPgy6g3WiG_m1wYDKtseGLJqNNco498CwunxyBdffZx8tPFrlH4dbVpoVGRxb5ZnIPJVnwYfAb8vvP9vd3jT3237irg6jBgM1dYUPJVL1Rp7GmdelwxD8xL5iU2EH6kQebGSai1tTZImdU85WlofE_xACYaxeG5N8jNgAMvgfMjRq2Bx7A_QJOYE4VbBcPqcC4GQYDi5AXuRUf4lT0C_pUEV0Th32Ga7bfaO2Rtnk3U4lyNx1fE4d49crfWY-l2RXj3yYrJHpBbVWfLxUPya4gR5m4BFGBoWQviLHPrK53kxcydoZAc165I-iNPMWap_ElBXkyrfAuKbmKaoo6bzWgxnyBnQ_8eVVlKwZS3VU9UCv-Ddj-ozxTeYWf36PgRObkWvDwmq1memaeEJoaLVNuIKXi0MGkS-cpTyjdRpEView5hDRqkrkuhY0eOsSxNoiiUFeokoE6WqJMXDtlo50yqQiBLoXcQuy0kFvEub-TTU1nzBMkjIzQLmBJgY_c8WDlTERwSG6s0AEXMIW-QNiSW6cgwDuhUzYtCDr4eye0QGzFidKJD3tdANkfkqDqtAnYCK3t1INc7kMBHdHe4IUFZ87FCXp46h7xuh3EmxuZlJp8jDAuBjYOp6ZAnFcW26-ZBHHEuYLbo0HJnY7oj2dn3ssp5BGafxwFXmw3VX77Wso3fbE_Gf-Dp2fJFvyJr_eODoRwODvefk9s-pr6g-y1YJ6uz6dy8AIV0lrwsuQAl366b7fwBNCKemw
linkToPdf http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV3db9MwELfQEF8PiG8KAwxC4oFFi2PXTh63smmDMaGxSXuzHH-MSSWpmlSwR_5z7pI0NIAmeKpan9PEd74P5-53hLy2YEV5QHw7pUIkrJdRJoOKlAVbxnwWWFML8_FQ7p2I96fj05Uq_ibbfflKsq1pQJSmot6cudBu8VRuVgxx1yJMLwCXJBYReJFXBVg37GEwkZP-PQIi9i9LZf46b2COGtT-P3XzinH6PXGyf3t6i9xYFDNz8c1MpysGavcOud15lnSrFYW75Iov7pFrba_Ji_vkxwHmfEcV8MTTBp3hvIi6Tzorqzqq0WxNu8NB-rV0mEXUfKWgwedtBQTFg1vq0OssalotZqhr8MSNmsJRWMXQdiml8D8YiYNDS-EetneOjh-Qk92d48le1PVfiKwUrI5UgHDIjKVxWWyti7lhMQTiLM6DUElqwTvJcmltCEE4Fix33EmfxIYLmOgNf0jWirLwjwnNPVfOhpQZuLTyLk8TExuT-DS1Kg_jEWFLNmjbgZNjj4ypboKUVOqWdRpYpxvW6e8j8rafM2uhOS6l3kbu9pQIq938UM7PdLdLNU-9skwwoyDqHcfw5MykILYhM06AazQir1A2NAJnFJiZc2YWVaX3Px_pLYmtETFfcETedEShROaYrtABVgKxtgaU6wNK2Nl2OLwUQd1plkpjA4EMvMyEjcjLfhhnYrZc4csF0jAJihWCvxF51Eps_9xcZCnnCmargSwPFmY4Upx_aXDHUwjEYg682lhK_a_bumzhN_qd8Q98evJ_V39Brn96t6sP9g8_PCU3E6xNwfMxsU7W6vnCPwOPsc6fN0rhJ7nEZTA
linkToUnpaywall http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwrV3db9MwELemTQh44PsjMJBBSDywdHGdxMljhzYNBBMqqzSeLMexR0WXVE0iGG_859zli2agCSSeqtbnNj77vty73xHyQoMV5Rbx7YSwrq9N6MahFa7QYMuYiS2ra2HeH4WHM__tSXCyQaZdLUxyppN53oKGIlDxaL0MfdFUOWAXBbPaXaa2Efoo3C0YIrG5mHAATornu-BXboUB-OebZGt29GHyqS4zEsyFGCfoqmf-OHFgoWog_9_V9Zq9uphL2f-hep1crbKlOv-qFos1m3VwkxTdaptUlS-jqkxG-vsFIMj_y45b5Ebr4tJJcyZvkw2T3SFXmqaX53fJj3eYfO4WcDgMrWEi5pnbvtJlXpRuifZz0d5S0rM8xXSm-i0FU7JqSjEo3iDTFN3frKRFtUSlh1d_VGUphSjfNu1SKfwOXgmAZ03hGfb2p8f3yOxg__j1ods2gnB16LPSFRbiMhWEKo09rVOPK-aZBDYysb4YRxrcpDgJtbbW-imzmqc8Dc3YU9yHiUbx-2QzyzPzkNDEcJFqGzEFXy1MmkRj5Sk1NlGkRWIDh7Bu86VuUdKxWcdC1tFSFMqGtRJYK2vWym8OedXPWTYYIZdS7-GZ6ikR37v-IF-dylZdSB4ZoZnPlIDwO_Bg5UxFID82VqkPPppDnuOJlIjgkWGK0KmqikK--TiVkxB7NGLiokNetkQ2x81RbcUFcAJBvwaU2wNKUDF6ONwdfNmquEJiJ4MY3N0xc8izfhhnYtpeZvIKaVgIGh6iUIc8aOSkXzf344hzAbPFQIIGjBmOZPPPNQB6BBGhx2GvdjpZ-_VYlzF-p5fHv9inR_9G_phcG2NlDN7O-dtks1xV5gn4q2XytFVAPwF4JpK6
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Large-scale+protein-protein+post-translational+modification+extraction+with+distant+supervision+and+confidence+calibrated+BioBERT&rft.jtitle=BMC+bioinformatics&rft.au=Elangovan%2C+Aparna&rft.au=Li%2C+Yuan&rft.au=Pires%2C+Douglas+E+V&rft.au=Davis%2C+Melissa+J&rft.date=2022-01-04&rft.issn=1471-2105&rft.eissn=1471-2105&rft.volume=23&rft.issue=1&rft.spage=4&rft_id=info:doi/10.1186%2Fs12859-021-04504-x&rft.externalDBID=NO_FULL_TEXT