COLLAPSE: A representation learning framework for identification and characterization of protein structural sites

Bibliographic Details
Published in Protein Science Vol. 32; no. 2; e4541
Main Authors Derry, Alexander; Altman, Russ B.
Format Journal Article
Language English
Published Hoboken, USA: John Wiley & Sons, Inc, 01.02.2023
Wiley Subscription Services, Inc
ISSN 0961-8368
1469-896X
DOI 10.1002/pro.4541

Abstract The identification and characterization of the structural sites which contribute to protein function are crucial for understanding biological mechanisms, evaluating disease risk, and developing targeted therapies. However, the quantity of known protein structures is rapidly outpacing our ability to functionally annotate them. Existing methods for function prediction either do not operate on local sites, suffer from high false positive or false negative rates, or require large site‐specific training datasets, necessitating the development of new computational methods for annotating functional sites at scale. We present COLLAPSE (Compressed Latents Learned from Aligned Protein Structural Environments), a framework for learning deep representations of protein sites. COLLAPSE operates directly on the 3D positions of atoms surrounding a site and uses evolutionary relationships between homologous proteins as a self‐supervision signal, enabling learned embeddings to implicitly capture structure–function relationships within each site. Our representations generalize across disparate tasks in a transfer learning context, achieving state‐of‐the‐art performance on standardized benchmarks (protein–protein interactions and mutation stability) and on the prediction of functional sites from the Prosite database. We use COLLAPSE to search for similar sites across large protein datasets and to annotate proteins based on a database of known functional sites. These methods demonstrate that COLLAPSE is computationally efficient, tunable, and interpretable, providing a general‐purpose platform for computational protein analysis.
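The site search mentioned in the abstract can be pictured as a nearest-neighbour lookup over embedding vectors. The following is a minimal sketch under stated assumptions, not the authors' implementation: the use of cosine similarity, the function names `cosine_similarity` and `rank_similar_sites`, the site identifiers, and the three-dimensional toy vectors are all hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_similar_sites(query, database, top_k=3):
    """Return the top_k (site_id, similarity) pairs, most similar first."""
    scored = [
        (site_id, cosine_similarity(query, embedding))
        for site_id, embedding in database.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Made-up embeddings standing in for learned site representations.
database = {
    "site_A": [1.0, 0.0, 0.0],
    "site_B": [0.9, 0.1, 0.0],
    "site_C": [0.0, 1.0, 0.0],
}
query = [1.0, 0.05, 0.0]
hits = rank_similar_sites(query, database, top_k=2)
for site_id, score in hits:
    print(site_id, round(score, 4))
```

In this toy setup the query is closest to `site_A`, then `site_B`. A real search over a large structure database would more likely use an indexed vector search library than a linear scan.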
Author Derry, Alexander (ORCID 0000-0003-2076-1184)
Altman, Russ B. (russ.altman@stanford.edu)
AuthorAffiliation 1 Department of Biomedical Data Science, Stanford University, Stanford, California, USA
2 Departments of Bioengineering, Genetics, and Medicine, Stanford University, Stanford, California, USA
Copyright 2022 The Authors. Protein Science published by Wiley Periodicals LLC on behalf of The Protein Society.
2022. This article is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Discipline Anatomy & Physiology
Chemistry
DocumentTitleAlternate Derry and Altman
EISSN 1469-896X
ExternalDocumentID PMC9847082
Genre article
Research Support, Non-U.S. Gov't
Journal Article
Research Support, N.I.H., Extramural
GrantInformation_xml – fundername: U.S. National Library of Medicine
  funderid: LM012409
– fundername: Chan Zuckerberg Initiative
– fundername: National Institutes of Health
  funderid: GM102365
– fundername: NLM NIH HHS
  grantid: T32 LM012409
– fundername: NIGMS NIH HHS
  grantid: R01 GM102365
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 2
Keywords functional site annotation
deep learning
protein structure analysis
structural informatics
representation learning
License Attribution-NonCommercial-NoDerivs
This is an open access article under the terms of the http://creativecommons.org/licenses/by-nc-nd/4.0/ License, which permits use and distribution in any medium, provided the original work is properly cited, the use is non‐commercial and no modifications or adaptations are made.
Notes Review Editor: Nir Ben‐Tal
Funding information: Chan Zuckerberg Initiative; National Institutes of Health, Grant/Award Number: GM102365; U.S. National Library of Medicine, Grant/Award Number: LM012409
OpenAccessLink https://onlinelibrary.wiley.com/doi/abs/10.1002%2Fpro.4541
PMID 36519247
PageCount 15
PublicationDate February 2023
  publication-title: Curr Protein Pept Sci
– ident: e_1_2_8_45_1
  doi: 10.1101/2022.04.18.488641
– year: 2015
  ident: e_1_2_8_16_1
  article-title: Convolutional networks on graphs for learning molecular fingerprints
  publication-title: Advances in Neural Information Processing Systems
– volume-title: Neural Information Processing Systems Track on Datasets and Benchmarks
  year: 2021
  ident: e_1_2_8_58_1
– ident: e_1_2_8_20_1
  doi: 10.1038/s41592-019-0666-6
– ident: e_1_2_8_7_1
  doi: 10.1002/pro.5560040404
– volume: 33
  start-page: 21271
  year: 2020
  ident: e_1_2_8_21_1
  article-title: Bootstrap your own latent ‐ a new approach to self‐supervised learning
  publication-title: Adv Neural Inf Process Syst
– ident: e_1_2_8_17_1
  doi: 10.1093/nar/gky995
– ident: e_1_2_8_32_1
  doi: 10.1109/TBDATA.2019.2921572
– ident: e_1_2_8_5_1
  doi: 10.1038/75556
– ident: e_1_2_8_65_1
  doi: 10.48550/arXiv.2203.06125
– ident: e_1_2_8_50_1
  doi: 10.1371/journal.pcbi.1000605
– ident: e_1_2_8_27_1
  doi: 10.1093/nar/gkq366
– ident: e_1_2_8_18_1
  doi: 10.1006/jmbi.1998.1993
StartPage e4541
SubjectTerms Benchmarks
Collapse
Computer applications
Datasets
deep learning
functional site annotation
Health risks
Learning
Mutation
Protein Conformation
Protein interaction
protein structure analysis
Proteins
Proteins - chemistry
representation learning
Representations
Software
Structural analysis
structural informatics
Structure-function relationships
Tools for Protein Science
Transfer learning
Title COLLAPSE: A representation learning framework for identification and characterization of protein structural sites
URI https://onlinelibrary.wiley.com/doi/abs/10.1002%2Fpro.4541
https://www.ncbi.nlm.nih.gov/pubmed/36519247
https://www.proquest.com/docview/2770612537
https://www.proquest.com/docview/2754858030
https://pubmed.ncbi.nlm.nih.gov/PMC9847082
Volume 32