Do “Newly Born” orphan proteins resemble “Never Born” proteins? A study using three deep learning algorithms

Bibliographic Details
Published in Proteins, structure, function, and bioinformatics Vol. 91; no. 8; pp. 1097-1115
Main Authors Liu, Jing; Yuan, Rongqing; Shao, Wei; Wang, Jitong; Silman, Israel; Sussman, Joel L.
Format Journal Article
Language English
Published Hoboken, USA John Wiley & Sons, Inc 01.08.2023
Wiley Subscription Services, Inc
ISSN 0887-3585
1097-0134
DOI 10.1002/prot.26496

Abstract “Newly Born” proteins, devoid of detectable homology to any other proteins, known as orphan proteins, occur in a single species or within a taxonomically restricted gene family. They are generated by the expression of novel open reading frames, and appear throughout evolution. We were curious if three recently developed programs for predicting protein structures, namely, AlphaFold2, RoseTTAFold, and ESMFold, might be of value for comparison of such “Newly Born” proteins to random polypeptides with amino acid content similar to that of native proteins, which have been called “Never Born” proteins. The programs were used to compare the structures of two sets of “Never Born” proteins that had been expressed—Group 1, which had been shown experimentally to possess substantial secondary structure, and Group 3, which had been shown to be intrinsically disordered. Overall, although the models generated were scored as being of low quality, they nevertheless revealed some general principles. Specifically, all four members of Group 1 were predicted to be compact by all three algorithms, in agreement with the experimental data, whereas the members of Group 3 were predicted to be very extended, as would be expected for intrinsically disordered proteins, again consistent with the experimental data. These predicted differences were shown to be statistically significant by comparing their accessible surface areas. The three programs were then used to predict the structures of three orphan proteins whose crystal structures had been solved, two of which display novel folds. Surprisingly, only for the protein which did not have a novel fold, and was taxonomically restricted, rather than being a true orphan, did all three algorithms predict very similar, high‐quality structures, closely resembling the crystal structure. Finally, they were used to predict the structures of seven orphan proteins with well‐identified biological functions, whose 3D structures are not known. 
Two proteins, which were predicted to be disordered based on their sequences, are predicted by all three structure algorithms to be extended structures. The other five were predicted to be compact structures with only two exceptions in the case of AlphaFold2. All three prediction algorithms make remarkably similar and high‐quality predictions for one large protein, HCO_11565, from a nematode. It is conjectured that this is due to many homologs in the taxonomically restricted family of which it is a member, and to the fact that the Dali server revealed several nonrelated proteins with similar folds. An animated Interactive 3D Complement (I3DC) is available in Proteopedia at http://proteopedia.org/w/Journal:Proteins:3
Author Details
Liu, Jing: Faculty of Biotechnology and Food Engineering, Technion-Israel Institute of Technology
Yuan, Rongqing: Tsinghua University
Shao, Wei: Shanghai Jiao Tong University
Wang, Jitong: Tsinghua University
Silman, Israel: The Weizmann Institute of Science; ORCID 0000-0003-1923-0829; israel.silman@weizmann.ac.il
Sussman, Joel L.: The Weizmann Institute of Science; ORCID 0000-0003-0306-3878; joel.sussman@weizmann.ac.il
Copyright 2023 The Authors. Proteins: Structure, Function, and Bioinformatics published by Wiley Periodicals LLC.
2023. This article is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Discipline Anatomy & Physiology
Chemistry
Biology
EISSN 1097-0134
Funding Center for Scientific Excellence at the Weizmann Institute of Science
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 8
Keywords orphan protein
molten globule
taxonomically restricted
intrinsically disordered protein
protein structure prediction
License Attribution (cc-by)
Notes Jing Liu and Rongqing Yuan contributed equally to this work.
OpenAccessLink https://onlinelibrary.wiley.com/doi/abs/10.1002%2Fprot.26496
PMID 37092778
  publication-title: Nat Commun
– volume: 18
  start-page: 756
  issue: 6
  year: 2008
  end-page: 764
  article-title: Function and structure of inherently disordered proteins
  publication-title: Curr Opin Struct Biol
– volume: 2
  issue: 1
  year: 2022
  article-title: Folding the unfoldable: using AlphaFold to explore spurious proteins
  publication-title: Bioinform Adv
– year: 2022
– volume: 29
  start-page: 128
  issue: 1
  year: 2020
  end-page: 140
  article-title: DALI and the persistence of protein shape
  publication-title: Protein Sci
– volume: 7
  issue: 1
  year: 2017
  article-title: Random protein sequences can form defined secondary structures and are well‐tolerated in vivo
  publication-title: Sci Rep
– volume: 196
  start-page: 1161
  issue: 4295
  year: 1977
  end-page: 1166
  article-title: Evolution and tinkering
  publication-title: Science
– volume: 373
  start-page: 871
  issue: 6557
  year: 2021
  end-page: 876
  article-title: Accurate prediction of protein structures and interactions using a three‐track neural network
  publication-title: Science
– volume: 54
  start-page: 20
  issue: 1
  year: 2004
  end-page: 40
  article-title: Proteomic signatures: amino acid and oligopeptide compositions differentiate among phyla
  publication-title: Proteins
– volume: 13
  start-page: 1711
  issue: 7
  year: 2004
  end-page: 1723
  article-title: De novo proteins from designed combinatorial libraries
  publication-title: Protein Sci
– volume: 9
  start-page: e53500
  year: 2020
  article-title: Synteny‐based analyses indicate that sequence divergence is not the main source of orphan genes
  publication-title: Elife
– volume: 410
  start-page: 715
  issue: 6829
  year: 2001
  end-page: 718
  article-title: Functional proteins from a random‐sequence library
  publication-title: Nature
– volume: 7
  issue: 2
  year: 2012
  article-title: Structural view of a non Pfam singleton and crystal packing analysis
  publication-title: PLoS One
– volume: 10
  issue: 1
  year: 2014
  article-title: NCYM, a cis‐antisense gene of MYCN, encodes a de novo evolved protein that inhibits GSK3beta resulting in the stabilization of MYCN in human neuroblastomas
  publication-title: PLoS Genet
– volume: 487
  start-page: 370
  issue: 7407
  year: 2012
  end-page: 374
  article-title: Proto‐genes and de novo gene birth
  publication-title: Nature
– volume: 19
  start-page: 1752
  issue: 10
  year: 2009
  end-page: 1759
  article-title: Recent de novo origin of human protein‐coding genes
  publication-title: Genome Res
– volume: 41
  start-page: 415
  issue: 3
  year: 2000
  end-page: 427
  article-title: Why are “natively unfolded” proteins unstructured under physiologic conditions?
  publication-title: Proteins
– volume: 19
  start-page: 679
  issue: 6
  year: 2022
  end-page: 682
  article-title: ColabFold: making protein folding accessible to all
  publication-title: Nat Methods
– volume: 44
  start-page: 1989
  issue: 6
  year: 2005
  end-page: 2000
  article-title: Comparing and combining predictors of mostly disordered proteins
  publication-title: Biochemistry
– volume: 26
  start-page: 603
  issue: 3
  year: 2009
  end-page: 612
  article-title: Origin of primate orphan genes: a comparative genomics approach
  publication-title: Mol Biol Evol
– volume: 600
  start-page: 547
  issue: 7889
  year: 2021
  end-page: 552
  article-title: De novo protein design by deep network hallucination
  publication-title: Nature
– volume: 52
  start-page: 5167
  issue: 31
  year: 2013
  end-page: 5175
  article-title: Cooperative unfolding of compact conformations of the intrinsically disordered protein osteopontin
  publication-title: Biochemistry
– volume: 40
  start-page: 1617
  issue: 11
  year: 2022
  end-page: 1623
  article-title: Single‐sequence protein structure prediction using a language model and deep learning
  publication-title: Nat Biotechnol
– volume: 29
  start-page: 2722
  issue: 21
  year: 2013
  end-page: 2728
  article-title: lDDT: a local superposition‐free score for comparing protein structures and models using distance difference tests
  publication-title: Bioinformatics
– volume: 26
  start-page: 316
  issue: 1
  year: 1998
  end-page: 319
  article-title: Touring protein fold space with Dali/FSSP
  publication-title: Nucleic Acids Res
– volume: 12
  start-page: 692
  issue: 10
  year: 2011
  end-page: 702
  article-title: The evolutionary origin of orphan genes
  publication-title: Nat Rev Genet
– volume: 8
  year: 2019
  article-title: A de novo evolved gene in the house mouse regulates female pregnancy cycles
  publication-title: Elife
– volume: 53
  start-page: 758
  issue: 3
  year: 2003
  end-page: 767
  article-title: The intracellular domain of the cholinesterase‐like neural adhesion protein, gliotactin, is natively unfolded
  publication-title: Proteins
– volume: 21
  start-page: 3435
  issue: 16
  year: 2005
  end-page: 3438
  article-title: FoldIndex: a simple tool to predict whether a given protein sequence is intrinsically unfolded
  publication-title: Bioinformatics
– volume: 370
  issue: 1678
  year: 2015
  article-title: New genes from non‐coding sequence: the role of de novo protein‐coding genes in eukaryotic evolutionary innovation
  publication-title: Philos Trans R Soc Lond B Biol Sci
– volume: 50
  start-page: W210
  issue: W1
  year: 2022
  end-page: W215
  article-title: Dali server: structural unification of protein families
  publication-title: Nucleic Acids Res
– volume: 3
  year: 2014
  article-title: Long non‐coding RNAs as a source of new peptides
  publication-title: Elife
– volume: 30
  start-page: 1072
  issue: 11
  year: 2012
  end-page: 1080
  article-title: Protein structure prediction from sequence variation
  publication-title: Nat Biotechnol
– volume: 19
  start-page: 1693
  issue: 10
  year: 2009
  end-page: 1695
  article-title: Darwinian alchemy: human genes from noncoding DNA
  publication-title: Genome Res
– volume: 7
  start-page: 287
  issue: Pt 2
  year: 2020
  end-page: 293
  article-title: Structure and mechanism of copper‐carbonic anhydrase II: a nitrite reductase
  publication-title: IUCrJ
– volume: 13
  year: 2022
  article-title: Thermal proteome profiling reveals orphan protein HCO_011565 as a target of the nematocidal small molecule UMW‐868
  publication-title: Front Pharmacol
– volume: 53
  start-page: 209
  year: 2000
  end-page: 282
  article-title: Role of the molten globule state in protein folding
  publication-title: Adv Protein Chem
– volume: 41
  issue: 12
  year: 2022
  article-title: De novo birth of functional microproteins in the human lineage
  publication-title: Cell Rep
– volume: 3
  start-page: 840
  issue: 8
  year: 2006
  end-page: 859
  article-title: Investigation of de novo totally random biosequences, part II: on the folding frequency in a totally random library of de novo proteins obtained by phage display
  publication-title: Chem Biodivers
– volume: 8
  issue: 2
  year: 2013
  article-title: PBOV1 is a human de novo gene with tumor‐specific expression that is associated with a positive clinical outcome of cancer
  publication-title: PLoS One
– volume: 31
  start-page: 12248
  issue: 48
  year: 1992
  end-page: 12254
  article-title: Chemical modification of acetylcholinesterase by disulfides: appearance of a "molten globule" state
  publication-title: Biochemistry
– volume: 169
  start-page: 2895
  issue: 4
  year: 2015
  end-page: 2906
  article-title: encodes a orphan protein that interacts with SnRK1 and enhances resistance to the Mycotoxigenic fungus
  publication-title: Plant Physiol
– volume: 2012
  year: 2012
  article-title: AntiFam: a tool to help identify spurious ORFs in protein annotation
  publication-title: Database
– volume: 54
  start-page: 1078
  issue: Pt 6 Pt 1
  year: 1998
  end-page: 1084
  article-title: Protein data bank (PDB): a database of 3D structural information of biological macromolecules
  publication-title: Acta Crystallogr D Biol Crystallogr
– volume: 50
  issue: 7
  year: 2022
  article-title: Foster thy young: enhanced prediction of orphan genes in assembled genomes
  publication-title: Nucleic Acids Res
– volume: 6
  issue: 3
  year: 2010
  article-title: A human‐specific de novo protein‐coding gene associated with human brain functions
  publication-title: PLoS Comput Biol
– volume: 3
  start-page: 488
  issue: 6
  year: 1996
  end-page: 490
  article-title: How molten is the molten globule?
  publication-title: Nat Struct Biol
– volume: 38
  start-page: ii95
  issue: Supplement_2
  year: 2022
  end-page: ii98
  article-title: DistilProtBert: a distilled protein language model used to distinguish between real proteins and their randomly shuffled counterparts
  publication-title: Bioinformatics
– volume: 18
  start-page: 472
  issue: 5
  year: 2021
  end-page: 481
  article-title: Critical assessment of protein intrinsic disorder prediction
  publication-title: Nat Methods
– volume: 12
  start-page: 409
  issue: 3
  year: 2002
  end-page: 416
  article-title: Did evolution leap to create the protein universe?
  publication-title: Curr Opin Struct Biol
– volume: 46
  start-page: 1
  issue: 1
  year: 2002
  end-page: 7
  article-title: Intrinsic structural disorder and sequence features of the cell cycle inhibitor p57Kip2
  publication-title: Proteins
– volume: 596
  start-page: 583
  issue: 7873
  year: 2021
  end-page: 589
  article-title: Highly accurate protein structure prediction with AlphaFold
  publication-title: Nature
– year: 1999
– volume: 12
  issue: 1
  year: 2021
  article-title: flDPnn: accurate intrinsic disorder prediction with putative propensities of disorder functions
  publication-title: Nat Commun
StartPage 1097
SubjectTerms Algorithms
Amino acids
Crystal structure
Deep learning
Experimental data
Homology
intrinsically disordered protein
Machine learning
molten globule
Open reading frames
orphan protein
Polypeptides
Protein folding
Protein structure
protein structure prediction
Proteins
Secondary structure
Statistical analysis
taxonomically restricted
Title Do “Newly Born” orphan proteins resemble “Never Born” proteins? A study using three deep learning algorithms
URI https://onlinelibrary.wiley.com/doi/abs/10.1002%2Fprot.26496
https://www.ncbi.nlm.nih.gov/pubmed/37092778
https://www.proquest.com/docview/2835517588
https://www.proquest.com/docview/2805514419
https://onlinelibrary.wiley.com/doi/pdfdirect/10.1002/prot.26496
Volume 91