Usage of a dataset of NMR resolved protein structures to test aggregation versus solubility prediction algorithms

Bibliographic Details
Published in	Protein science Vol. 26; no. 9; pp. 1864 - 1869
Main Authors	Roche, Daniel B., Villain, Etienne, Kajava, Andrey V.
Format	Journal Article
Language	English
Published	United States: Wiley Subscription Services, Inc; John Wiley and Sons Inc, 01.09.2017
Subjects	3D structure; Agglomeration; Aggregates; aggregation; Algorithms; Amyloid; Amyloid - analysis; Amyloid - chemistry; Amyloid - metabolism; amyloid fibrils; computational approaches; Computer applications; Cutting tools; database; Databases, Protein; Forming; In vivo methods and tests; Models, Statistical; NMR; Nuclear magnetic resonance; Nuclear Magnetic Resonance, Biomolecular; Predictions; Proteins; Sequences; Solubility; soluble
ISSN	0961-8368, 1469-896X
DOI	10.1002/pro.3225

Abstract	Interest in computational methods for predicting amyloids and/or aggregates has increased, owing to the prevalence of these aggregates in numerous diseases and their recently discovered functional importance. To evaluate these methods, several datasets have been compiled. Typically, aggregation-prone regions of proteins, which form aggregates or amyloids in vivo, are more than 15 residues long and intrinsically disordered. However, the number of such experimentally established amyloid-forming and non-forming sequences is limited, not exceeding one hundred entries in existing databases. In this work, we parsed all available NMR-resolved protein structures from the PDB and assembled a new, sevenfold larger dataset of unfolded sequences that are soluble at high concentrations. We propose using these sequences as a negative set for evaluating methods that predict aggregation in vivo. We also present the results of benchmarking cutting-edge tools for predicting aggregation versus solubility propensity.
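The role of a negative set in such a benchmark can be illustrated with a minimal sketch. It assumes that a predictor can be wrapped as a function returning True for a sequence it considers aggregation-prone. The `toy_predictor` below is a hypothetical stand-in and is not one of the tools benchmarked in the article, and the example sequences, hydrophobic residue set, window, and threshold are invented for illustration only.

```python
from typing import Callable, Dict, Sequence

# Illustrative residue set; an assumption, not taken from the article.
HYDROPHOBIC = set("VILFWYM")


def toy_predictor(seq: str, window: int = 6, threshold: float = 0.8) -> bool:
    """Hypothetical predictor: flag a sequence as aggregation-prone if any
    window of residues is at least `threshold` hydrophobic."""
    seq = seq.upper()
    # Sequences shorter than the window are scored as a single chunk.
    for i in range(max(1, len(seq) - window + 1)):
        chunk = seq[i:i + window]
        if chunk and sum(c in HYDROPHOBIC for c in chunk) / len(chunk) >= threshold:
            return True
    return False


def benchmark(
    predict: Callable[[str], bool],
    positives: Sequence[str],
    negatives: Sequence[str],
) -> Dict[str, float]:
    """Count confusion-matrix cells and derive sensitivity and specificity.

    `positives` are sequences known to aggregate; `negatives` are sequences
    known to stay soluble (the role proposed for the NMR-derived set).
    """
    tp = sum(1 for s in positives if predict(s))
    fn = len(positives) - tp
    fp = sum(1 for s in negatives if predict(s))
    tn = len(negatives) - fp
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return {
        "tp": tp, "fn": fn, "fp": fp, "tn": tn,
        "sensitivity": sensitivity, "specificity": specificity,
    }


if __name__ == "__main__":
    # Made-up sequences, used only to exercise the functions.
    positives = ["VVIIFFLL", "GSGSVILFWYGS"]
    negatives = ["GSGSGSGSGS", "KKEEDDKKEE", "VILVILGGGG"]
    print(benchmark(toy_predictor, positives, negatives))
```

A larger negative set mainly tightens the specificity estimate: every soluble sequence that a tool flags counts as a false positive, so a sevenfold increase in negatives gives a more reliable view of how often a tool over-predicts aggregation.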
AuthorAffiliation	1 Centre de Recherche en Biologie cellulaire de Montpellier, CNRS-UMR 5237, Montpellier, France; 2 Institut de Biologie Computationnelle, Université de Montpellier, Montpellier, France; 3 University ITMO, 49 Kronverksky Pr, 197101, St. Petersburg, Russia
Author_xml	1. Roche, Daniel B. (ORCID 0000-0002-9204-1840; daniel.roche@crbm.cnrs.fr), Institut de Biologie Computationnelle, Université de Montpellier; 2. Villain, Etienne, Institut de Biologie Computationnelle, Université de Montpellier; 3. Kajava, Andrey V. (andrey.kajava@crbm.cnrs.fr), University ITMO, 49 Kronverksky Pr, 197101, St. Petersburg, Russia
Copyright	2017 The Protein Society
Discipline	Anatomy & Physiology Chemistry
DocumentTitleAlternate	Prediction of Aggregation Versus Solubility Propensity
ExternalDocumentID	PMCID: PMC5563137; PMID: 28685932
GrantInformation	Ministère de l'Éducation nationale, de l'Enseignement supérieur et de la Recherche (MEESR); Institut de Biologie Computationnelle, Université de Montpellier (ANR Investissements D'Avenir Bio-informatique: projet IBC); COST Action BM1405 (Non-Globular Protein Network)
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
Keywords	NMR database computational approaches amyloid fibrils soluble 3D structure aggregation
License	http://onlinelibrary.wiley.com/termsAndConditions#vor 2017 The Protein Society.
ORCID	0000-0002-9204-1840
OpenAccessLink	https://proxy.k.utb.cz/login?url=https://onlinelibrary.wiley.com/doi/pdfdirect/10.1002/pro.3225
PMID	28685932
PageCount	6
SSID	ssj0004123
Score	2.3010006
Snippet	There has been an increased interest in computational methods for amyloid and (or) aggregate prediction, due to the prevalence of these aggregates in numerous...
SourceID	unpaywall pubmedcentral proquest pubmed crossref wiley
SourceType	Open Access Repository Aggregation Database Index Database Enrichment Source Publisher
StartPage	1864
SubjectTerms	3D structure Agglomeration Aggregates aggregation Algorithms Amyloid Amyloid - analysis Amyloid - chemistry Amyloid - metabolism amyloid fibrils computational approaches Computer applications Cutting tools Databases, Protein Forming In vivo methods and tests Models, Statistical NMR Nuclear magnetic resonance Nuclear Magnetic Resonance, Biomolecular Predictions Proteins Sequences Solubility soluble
Title	Usage of a dataset of NMR resolved protein structures to test aggregation versus solubility prediction algorithms
URI	https://onlinelibrary.wiley.com/doi/abs/10.1002%2Fpro.3225 https://www.ncbi.nlm.nih.gov/pubmed/28685932 https://www.proquest.com/docview/1930111699 https://www.proquest.com/docview/1917362238 https://pubmed.ncbi.nlm.nih.gov/PMC5563137 https://onlinelibrary.wiley.com/doi/pdfdirect/10.1002/pro.3225
UnpaywallVersion	publishedVersion
Volume	26
hasFullText	1
inHoldings	1
isFullTextHit
isPrint
journalDatabaseRights	– providerCode: PRVFSB databaseName: Free Full-Text Journals in Chemistry customDbUrl: eissn: 1469-896X dateEnd: 20241102 omitProxy: true ssIdentifier: ssj0004123 issn: 0961-8368 databaseCode: HH5 dateStart: 19920101 isFulltext: true titleUrlDefault: http://abc-chemistry.org/ providerName: ABC ChemistRy – providerCode: PRVEBS databaseName: EBSCOhost Food Science Source customDbUrl: eissn: 1469-896X dateEnd: 20241102 omitProxy: false ssIdentifier: ssj0004123 issn: 0961-8368 databaseCode: A8Z dateStart: 20100101 isFulltext: true titleUrlDefault: https://search.ebscohost.com/login.aspx?authtype=ip,uid&profile=ehost&defaultdb=fsr providerName: EBSCOhost – providerCode: PRVBFR databaseName: Free Medical Journals customDbUrl: eissn: 1469-896X dateEnd: 20241102 omitProxy: true ssIdentifier: ssj0004123 issn: 0961-8368 databaseCode: DIK dateStart: 19920101 isFulltext: true titleUrlDefault: http://www.freemedicaljournals.com providerName: Flying Publisher – providerCode: PRVFQY databaseName: GFMER Free Medical Journals customDbUrl: eissn: 1469-896X dateEnd: 99991231 omitProxy: true ssIdentifier: ssj0004123 issn: 0961-8368 databaseCode: GX1 dateStart: 0 isFulltext: true titleUrlDefault: http://www.gfmer.ch/Medical_journals/Free_medical.php providerName: Geneva Foundation for Medical Education and Research – providerCode: PRVAQN databaseName: PubMed Central customDbUrl: eissn: 1469-896X dateEnd: 20241102 omitProxy: true ssIdentifier: ssj0004123 issn: 0961-8368 databaseCode: RPM dateStart: 19920101 isFulltext: true titleUrlDefault: https://www.ncbi.nlm.nih.gov/pmc/ providerName: National Library of Medicine