COLLAPSE: A representation learning framework for identification and characterization of protein structural sites
Published in | Protein science Vol. 32; no. 2; pp. e4541 - n/a |
---|---|
Main Authors | Derry, Alexander; Altman, Russ B. |
Format | Journal Article |
Language | English |
Published | Hoboken, USA; John Wiley & Sons, Inc; 01.02.2023; Wiley Subscription Services, Inc |
ISSN | 0961-8368; 1469-896X |
DOI | 10.1002/pro.4541 |
Abstract | The identification and characterization of the structural sites which contribute to protein function are crucial for understanding biological mechanisms, evaluating disease risk, and developing targeted therapies. However, the quantity of known protein structures is rapidly outpacing our ability to functionally annotate them. Existing methods for function prediction either do not operate on local sites, suffer from high false positive or false negative rates, or require large site‐specific training datasets, necessitating the development of new computational methods for annotating functional sites at scale. We present COLLAPSE (Compressed Latents Learned from Aligned Protein Structural Environments), a framework for learning deep representations of protein sites. COLLAPSE operates directly on the 3D positions of atoms surrounding a site and uses evolutionary relationships between homologous proteins as a self‐supervision signal, enabling learned embeddings to implicitly capture structure–function relationships within each site. Our representations generalize across disparate tasks in a transfer learning context, achieving state‐of‐the‐art performance on standardized benchmarks (protein–protein interactions and mutation stability) and on the prediction of functional sites from the Prosite database. We use COLLAPSE to search for similar sites across large protein datasets and to annotate proteins based on a database of known functional sites. These methods demonstrate that COLLAPSE is computationally efficient, tunable, and interpretable, providing a general‐purpose platform for computational protein analysis. |
---|---|
Author | Altman, Russ B.; Derry, Alexander |
AuthorAffiliation | 1 Department of Biomedical Data Science, Stanford University, Stanford, California, USA; 2 Departments of Bioengineering, Genetics, and Medicine, Stanford University, Stanford, California, USA |
Author_xml | 1. Derry, Alexander (Stanford University; ORCID: 0000-0003-2076-1184); 2. Altman, Russ B. (Stanford University; email: russ.altman@stanford.edu) |
BackLink | https://www.ncbi.nlm.nih.gov/pubmed/36519247 (record in MEDLINE/PubMed) |
CitedBy_id | crossref_primary_10_3390_molecules30020214 crossref_primary_10_1016_j_jsb_2024_108118 crossref_primary_10_1021_acs_jcim_3c00722 crossref_primary_10_3390_molecules28135169 |
ContentType | Journal Article |
Copyright | 2022 The Authors. Protein Science published by Wiley Periodicals LLC on behalf of The Protein Society. This article is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
Discipline | Anatomy & Physiology Chemistry |
DocumentTitleAlternate | Derry and Altman |
EISSN | 1469-896X |
EndPage | n/a |
ExternalDocumentID | PMC9847082 36519247 10_1002_pro_4541 PRO4541 |
Genre | article Research Support, Non-U.S. Gov't Journal Article Research Support, N.I.H., Extramural |
GrantInformation_xml | U.S. National Library of Medicine (LM012409); Chan Zuckerberg Initiative; National Institutes of Health (GM102365); NLM NIH HHS (T32 LM012409); NIGMS NIH HHS (R01 GM102365) |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 2 |
Keywords | functional site annotation; deep learning; protein structure analysis; structural informatics; representation learning |
License | Attribution-NonCommercial-NoDerivs 2022 The Authors. Protein Science published by Wiley Periodicals LLC on behalf of The Protein Society. This is an open access article under the terms of the http://creativecommons.org/licenses/by-nc-nd/4.0/ License, which permits use and distribution in any medium, provided the original work is properly cited, the use is non‐commercial and no modifications or adaptations are made. |
Notes | Review Editor: Nir Ben‐Tal. Funding information: Chan Zuckerberg Initiative; National Institutes of Health, Grant/Award Number: GM102365; U.S. National Library of Medicine, Grant/Award Number: LM012409 |
ORCID | 0000-0003-2076-1184 |
OpenAccessLink | https://onlinelibrary.wiley.com/doi/abs/10.1002%2Fpro.4541 |
PMID | 36519247 |
PQID | 2770612537 |
PQPubID | 1016442 |
PageCount | 15 |
PublicationDate | February 2023 (2023-02-01) |
PublicationPlace | Hoboken, USA |
PublicationPlace_xml | – name: Hoboken, USA – name: United States – name: Bethesda |
PublicationTitle | Protein science |
PublicationTitleAlternate | Protein Sci |
PublicationYear | 2023 |
Publisher | John Wiley & Sons, Inc Wiley Subscription Services, Inc |
SourceID | pubmedcentral; proquest; pubmed; crossref; wiley |
SourceType | Open Access Repository; Aggregation Database; Index Database; Enrichment Source; Publisher |
StartPage | e4541 |
SubjectTerms | Benchmarks; Collapse; Computer applications; Datasets; deep learning; functional site annotation; Health risks; Learning; Mutation; Protein Conformation; Protein interaction; protein structure analysis; Proteins; Proteins - chemistry; representation learning; Representations; Software; Structural analysis; structural informatics; Structure-function relationships; Tools for Protein Science; Transfer learning |
Title | COLLAPSE: A representation learning framework for identification and characterization of protein structural sites |
URI | https://onlinelibrary.wiley.com/doi/abs/10.1002%2Fpro.4541 https://www.ncbi.nlm.nih.gov/pubmed/36519247 https://www.proquest.com/docview/2770612537 https://www.proquest.com/docview/2754858030 https://pubmed.ncbi.nlm.nih.gov/PMC9847082 |
Volume | 32 |
linkProvider | Wiley-Blackwell |
openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=COLLAPSE%3A+A+representation+learning+framework+for+identification+and+characterization+of+protein+structural+sites&rft.jtitle=Protein+science&rft.au=Derry%2C+Alexander&rft.au=Altman%2C+Russ+B&rft.date=2023-02-01&rft.issn=1469-896X&rft.eissn=1469-896X&rft.volume=32&rft.issue=2&rft.spage=e4541&rft_id=info:doi/10.1002%2Fpro.4541&rft.externalDBID=NO_FULL_TEXT |