Usage of a dataset of NMR resolved protein structures to test aggregation versus solubility prediction algorithms

Bibliographic Details
Published in Protein science Vol. 26; no. 9; pp. 1864-1869
Main Authors Roche, Daniel B., Villain, Etienne, Kajava, Andrey V.
Format Journal Article
Language English
Published United States Wiley Subscription Services, Inc 01.09.2017
John Wiley and Sons Inc
ISSN 0961-8368
1469-896X
DOI 10.1002/pro.3225

Abstract There has been an increased interest in computational methods for amyloid and/or aggregate prediction, due to the prevalence of these aggregates in numerous diseases and their recently discovered functional importance. To evaluate these methods, several datasets have been compiled. Typically, aggregation‐prone regions of proteins, which form aggregates or amyloids in vivo, are more than 15 residues long and intrinsically disordered. However, the number of such experimentally established amyloid‐forming and non‐forming sequences is limited, not exceeding one hundred entries in existing databases. In this work, we parsed all available NMR‐resolved protein structures from the PDB and assembled a new, sevenfold larger dataset of unfolded sequences that are soluble at high concentrations. We propose to use these sequences as a negative set for evaluating methods for predicting aggregation in vivo. We also present the results of benchmarking cutting‐edge tools for the prediction of aggregation versus solubility propensity.
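The two ideas in the abstract, collecting long unfolded stretches and using them as a negative set, can be illustrated with a minimal Python sketch. It is not the authors' pipeline, which the abstract does not describe. The per-residue `disordered` mask, the `predicts_aggregation` callable, and both function names are hypothetical stand-ins introduced here for illustration; only the "more than 15 residues" length cut-off comes from the abstract.

```python
from typing import Callable, Iterable, List, Tuple

# "More than 15 residues long" (from the abstract) means at least 16.
MIN_LENGTH = 16


def unfolded_segments(sequence: str, disordered: List[bool],
                      min_length: int = MIN_LENGTH) -> List[Tuple[int, str]]:
    """Return (start_index, subsequence) for every contiguous run of
    residues flagged as disordered that is at least `min_length` long.

    `disordered` is a hypothetical per-residue mask; how it is derived
    from an NMR structure is outside the scope of this sketch.
    """
    if len(sequence) != len(disordered):
        raise ValueError("sequence and mask must have the same length")
    segments = []
    start = None
    # A trailing False sentinel closes a run that reaches the chain end.
    for i, flag in enumerate(list(disordered) + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_length:
                segments.append((start, sequence[start:i]))
            start = None
    return segments


def false_positive_rate(negatives: Iterable[str],
                        predicts_aggregation: Callable[[str], bool]) -> float:
    """Fraction of negative-set sequences that a predictor wrongly flags.

    `predicts_aggregation` is a placeholder for any aggregation predictor
    reduced to a yes/no call per sequence.
    """
    negatives = list(negatives)
    if not negatives:
        raise ValueError("negative set is empty")
    flagged = sum(1 for seq in negatives if predicts_aggregation(seq))
    return flagged / len(negatives)
```

On a set made only of soluble sequences, every positive call is a false positive, so a lower rate from `false_positive_rate` would indicate a more specific predictor under these assumptions.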
AuthorAffiliation 1 Centre de Recherche en Biologie cellulaire de Montpellier, CNRS‐UMR 5237, Montpellier, France
2 Institut de Biologie Computationnelle, Université de Montpellier, Montpellier, France
3 University ITMO, 49 Kronverksky Pr, 197101, St. Petersburg, Russia
Author_xml – sequence: 1
  givenname: Daniel B.
  orcidid: 0000-0002-9204-1840
  surname: Roche
  fullname: Roche, Daniel B.
  email: daniel.roche@crbm.cnrs.fr
  organization: Institut de Biologie Computationnelle, Université de Montpellier
– sequence: 2
  givenname: Etienne
  surname: Villain
  fullname: Villain, Etienne
  organization: Institut de Biologie Computationnelle, Université de Montpellier
– sequence: 3
  givenname: Andrey V.
  surname: Kajava
  fullname: Kajava, Andrey V.
  email: andrey.kajava@crbm.cnrs.fr
  organization: University ITMO, 49 Kronverksky Pr, 197101, St. Petersburg, Russia
BackLink https://www.ncbi.nlm.nih.gov/pubmed/28685932 (MEDLINE/PubMed record)
CitedBy_id crossref_primary_10_3390_ijms21145038
crossref_primary_10_1016_j_jsb_2017_09_006
crossref_primary_10_1371_journal_pone_0193726
crossref_primary_10_3390_ijms24108571
crossref_primary_10_1016_j_molliq_2020_113618
crossref_primary_10_1093_bioinformatics_btx629
crossref_primary_10_1093_femsre_fuy038
crossref_primary_10_3389_fnmol_2019_00274
crossref_primary_10_1093_nar_gkz758
crossref_primary_10_3390_biomedicines9101451
crossref_primary_10_3390_ijms23169102
crossref_primary_10_3390_ijms22095002
ContentType Journal Article
Copyright 2017 The Protein Society
DatabaseName CrossRef
MEDLINE
MEDLINE (Ovid)
PubMed
Biotechnology Research Abstracts
Immunology Abstracts
Nucleic Acids Abstracts
Virology and AIDS Abstracts
Technology Research Database
Engineering Research Database
AIDS and Cancer Research Abstracts
ProQuest Health & Medical Complete (Alumni)
Biotechnology and BioEngineering Abstracts
Genetics Abstracts
MEDLINE - Academic
PubMed Central (Full Participant titles)
Unpaywall for CDI: Periodical Content
Unpaywall
Discipline Anatomy & Physiology
Chemistry
DocumentTitleAlternate Prediction of Aggregation Versus Solubility Propensity
EISSN 1469-896X
EndPage 1869
ExternalDocumentID 10.1002/pro.3225
PMC5563137
28685932
10_1002_pro_3225
PRO3225
Genre article
Journal Article
GrantInformation_xml – fundername: Ministère de l'Éducation nationale, de l'Enseignement supérieur et de la Recherche (MEESR)
– fundername: Institut de Biologie Computationnelle, Université de Montpellier (ANR Investissements D'Avenir Bio‐informatique: projet IBC)
– fundername: COST Action BM1405 (Non‐Globular Protein Network)
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 9
Keywords NMR
database
computational approaches
amyloid fibrils
soluble
3D structure
aggregation
Language English
License http://onlinelibrary.wiley.com/termsAndConditions#vor
2017 The Protein Society.
ORCID 0000-0002-9204-1840
OpenAccessLink https://proxy.k.utb.cz/login?url=https://onlinelibrary.wiley.com/doi/pdfdirect/10.1002/pro.3225
PMID 28685932
PQID 1930111699
PQPubID 1016442
PageCount 6
PublicationDate September 2017
PublicationDateYYYYMMDD 2017-09-01
PublicationDate_xml – month: 09
  year: 2017
  text: September 2017
PublicationDecade 2010
PublicationPlace United States
PublicationPlace_xml – name: United States
– name: Bethesda
– name: Hoboken
PublicationTitle Protein science
PublicationTitleAlternate Protein Sci
PublicationYear 2017
Publisher Wiley Subscription Services, Inc
John Wiley and Sons Inc
Publisher_xml – name: Wiley Subscription Services, Inc
– name: John Wiley and Sons Inc
StartPage 1864
SubjectTerms 3D structure
Agglomeration
Aggregates
aggregation
Algorithms
Amyloid
Amyloid - analysis
Amyloid - chemistry
Amyloid - metabolism
amyloid fibrils
computational approaches
Computer applications
Cutting tools
Databases, Protein
Forming
In vivo methods and tests
Models, Statistical
NMR
Nuclear magnetic resonance
Nuclear Magnetic Resonance, Biomolecular
Predictions
Proteins
Sequences
Solubility
soluble
Title Usage of a dataset of NMR resolved protein structures to test aggregation versus solubility prediction algorithms
URI https://onlinelibrary.wiley.com/doi/abs/10.1002%2Fpro.3225
https://www.ncbi.nlm.nih.gov/pubmed/28685932
https://www.proquest.com/docview/1930111699
https://www.proquest.com/docview/1917362238
https://pubmed.ncbi.nlm.nih.gov/PMC5563137
https://onlinelibrary.wiley.com/doi/pdfdirect/10.1002/pro.3225
UnpaywallVersion publishedVersion
Volume 26