Classification Modeling and Recognition of Protein Fold Type

Published in Wuli huaxue xuebao Vol. 25, no. 12, pp. 2558-2564
Main Authors 刘岳, 李晓琴, 徐海松, 乔辉
Format Journal Article
Language Chinese
Published 2009
ISSN 1000-6818


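Abstract How a protein's amino acid sequence determines its three-dimensional structure is one of the core questions in the life sciences today. The fold type reflects the topological pattern of a protein's core structure, and fold recognition is an important part of protein sequence-structure research. We studied 36 fold types that cannot each be described by a single model and that together account for 41.8% of the α, β and α/β proteins in the Astral 1.65 sequence database. For each fold type, samples with less than 25% sequence identity were taken as the training set and grouped by hierarchical clustering with root mean square deviation (RMSD) as the metric, yielding several fold subgroups. A profile hidden Markov model (profile-HMM) was then built for each subgroup from a structural alignment produced by the multiple structural alignment algorithm MUSTANG. On a test set of 9505 samples from Astral 1.65 with less than 95% sequence identity, the 36 fold types gave an average recognition sensitivity of 90%, a specificity of 99% and a Matthews correlation coefficient (MCC) of 0.95. The results show that, for fold types with too many members to build a unified model, RMSD-based hierarchical classification modeling achieves high-accuracy recognition, offering a new method and approach for protein fold recognition and laying a foundation for further research.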
Abstract_FL How the amino acid sequence of a protein determines its structure is a core issue in biology. The protein fold type reflects the topological pattern of the structure's core, and fold recognition is an important part of protein sequence-structure research. This article focuses on the 36 fold types that cannot be captured by a single unified hidden Markov model (HMM) but that account for 41.8% of the α, β, and α/β proteins in the Astral 1.65 sequence database. The training set contains samples that have less than 25% sequence identity with each other. We applied hierarchical clustering according to root mean square deviation (RMSD) to generate fold subgroups. A profile-HMM based on a structure alignment from the multiple structural alignment algorithm MUSTANG was then built for each subgroup. After testing 9505 proteins with less than 95% sequence identity from the Astral 1.65 database, the average sensitivity, specificity and Matthews correlation coefficient (MCC) of the 36 fold types were found to be 90%, 99% and 0.95, respectively. These results show that classification modeling according to RMSD achieves accurate fold recognition for fold types that have too many members for a unified HMM to be built. This provides a new method and approach for profile-HMM protein fold recognition and lays the foundation for further research.
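The three reported measures (sensitivity, specificity and MCC) follow from the standard confusion-count definitions. The sketch below shows how they could be computed for a single fold type; the function name `fold_metrics` and the example counts are hypothetical illustrations and are not taken from the article, which reports only averages over the 36 fold types.

```python
import math


def fold_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Sensitivity, specificity and MCC from one fold type's confusion counts.

    tp/fn: members of the fold type that were / were not recognized.
    tn/fp: non-members that were correctly rejected / wrongly accepted.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Standard Matthews correlation coefficient; defined as 0 when any
    # marginal total is zero and the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "mcc": mcc}


# Hypothetical counts for one fold type (assumed values, not from the paper).
example = fold_metrics(tp=90, fp=10, tn=890, fn=10)
```

Averaging these per-fold values over all fold types would give figures comparable in kind to those quoted in the abstract, assuming the authors averaged per fold type.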
Author 刘岳, 李晓琴, 徐海松, 乔辉
AuthorAffiliation College of Life Science and Bioengineering, Beijing University of Technology, Beijing 100124
ClassificationCodes O641
ContentType Journal Article
Copyright Copyright © Wanfang Data Co. Ltd. All Rights Reserved.
Discipline Chemistry
DocumentTitleAlternate Classification Modeling and Recognition of Protein Fold Type
EndPage 2564
GrantInformation_xml – fundername: National Natural Science Foundation of China; Beijing Natural Science Foundation
  funderid: (30570427); (4092008)
ISSN 1000-6818
IsPeerReviewed true
IsScholarly true
Issue 12
Keywords Hidden Markov model
Fold recognition
Root mean square deviation (RMSD)
Protein fold type
Hierarchical clustering
Profile-HMM
Language Chinese
Notes 11-1892/06
O641
Protein fold type RMSD Hierarchical clustering Profile-HMM Fold recognition
PageCount 7
PublicationCentury 2000
PublicationDate 2009
PublicationDateYYYYMMDD 2009-01-01
PublicationDate_xml – year: 2009
  text: 2009
PublicationDecade 2000
PublicationTitle Wuli huaxue xuebao
PublicationTitleAlternate Acta Physico-Chimica Sinica
PublicationTitle_FL ACTA PHYSICO-CHIMICA SINICA
PublicationYear 2009
StartPage 2558
SubjectTerms Root mean square deviation
Fold recognition
Hierarchical clustering
Protein fold type
Hidden Markov model
Title Classification Modeling and Recognition of Protein Fold Type
URI http://lib.cqvip.com/qk/92644X/200912/32468488.html
https://d.wanfangdata.com.cn/periodical/wlhxxb200912026
Volume 25