Classification Modeling and Recognition of Protein Fold Type
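How a protein's amino acid sequence determines its spatial structure is one of the core questions in life science research today. The fold type reflects the topological pattern of a protein's core structure, and fold recognition is an important part of protein sequence-structure research. We studied 36 fold types that cannot be modeled independently and that account for 41.8% of all α, β, and α/β proteins in the Astral 1.65 sequence database. Samples with less than 25% sequence identity were selected as the training set, hierarchical clustering with root mean square deviation (RMSD) as the metric was applied to each fold type to generate fold subclasses, and for each subclass a profile hidden Markov model (profile-HMM) was built from a structural alignment produced by the multiple structural alignment algorithm MUSTANG. With 9505 samples from Astral 1.65 having less than 95% sequence identity as the test set, the average recognition sensitivity over the 36 fold types was 9...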
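The clustering step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the record does not state the linkage rule or the RMSD cutoff, so average linkage, the `cutoff` parameter, the function name `cluster_by_rmsd`, and the toy matrix are all assumptions made here for the example.

```python
from itertools import combinations


def cluster_by_rmsd(rmsd, cutoff):
    """Average-linkage agglomerative clustering on a symmetric RMSD matrix.

    rmsd   -- square list of lists; rmsd[i][j] is the pairwise RMSD in angstroms
              (hypothetical, precomputed input)
    cutoff -- assumed threshold: merging stops once the closest pair of
              clusters is farther apart than this value
    Returns a sorted list of clusters, each a sorted list of sample indices.
    """
    clusters = [[i] for i in range(len(rmsd))]

    def linkage(a, b):
        # Average pairwise RMSD between the members of clusters a and b.
        return sum(rmsd[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Pick the pair of current clusters with the smallest average RMSD.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[x], clusters[y]) > cutoff:
            break  # remaining clusters are too dissimilar to merge
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(clusters)


# Toy 4-structure example (made-up RMSD values, not data from the article).
toy_rmsd = [
    [0.0, 1.0, 6.0, 7.0],
    [1.0, 0.0, 6.5, 7.5],
    [6.0, 6.5, 0.0, 1.5],
    [7.0, 7.5, 1.5, 0.0],
]
subclasses = cluster_by_rmsd(toy_rmsd, cutoff=3.0)
print(subclasses)
```

In the workflow the abstract outlines, each resulting subclass would then receive its own structure-based alignment and profile-HMM; that step depends on external tools and is not sketched here.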
| Published in | Wuli huaxue xuebao Vol. 25; no. 12; pp. 2558 - 2564 |
|---|---|
| Main Authors | 刘岳, 李晓琴, 徐海松, 乔辉 |
| Format | Journal Article |
| Language | Chinese |
| Published | 2009 |
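| Subjects | Root mean square deviation; fold recognition; hierarchical clustering; protein fold type; hidden Markov model |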
| ISSN | 1000-6818 |
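| Abstract | How a protein's amino acid sequence determines its spatial structure is one of the core questions in life science research today. The fold type reflects the topological pattern of a protein's core structure, and fold recognition is an important part of protein sequence-structure research. We studied 36 fold types that cannot be modeled independently and that account for 41.8% of all α, β, and α/β proteins in the Astral 1.65 sequence database. Samples with less than 25% sequence identity were selected as the training set, hierarchical clustering with root mean square deviation (RMSD) as the metric was applied to each fold type to generate fold subclasses, and for each subclass a profile hidden Markov model (profile-HMM) was built from a structural alignment produced by the multiple structural alignment algorithm MUSTANG. With 9505 samples from Astral 1.65 having less than 95% sequence identity as the test set, the 36 fold types reached an average recognition sensitivity of 90%, a specificity of 99%, and a Matthews correlation coefficient (MCC) of 0.95. The results show that, for fold types with many members for which no unified model can be built, RMSD-based hierarchical classification modeling achieves high-accuracy recognition in every case, offering a new method and approach for protein fold recognition and laying a foundation for further research. |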
|---|---|
| Abstract_FL | How a protein's amino acid sequence determines its structure is a core question in biology. The fold type reflects the topological pattern of a protein's structural core, and fold recognition is an important part of protein sequence-structure research. This article focuses on 36 fold types for which a single unified hidden Markov model (HMM) cannot be built and which account for 41.8% of the α, β, and α/β proteins in the Astral 1.65 sequence database. The training set contains samples with less than 25% sequence identity to each other. We applied hierarchical clustering based on root mean square deviation (RMSD) to generate fold subgroups, and then built a profile-HMM for each subgroup from a structural alignment produced by the multiple structural alignment algorithm MUSTANG. Tested on 9505 proteins with less than 95% sequence identity from the Astral 1.65 database, the 36 fold types gave an average sensitivity, specificity, and Matthews correlation coefficient (MCC) of 90%, 99%, and 0.95, respectively. These results show that RMSD-based classification modeling achieves accurate fold recognition for fold types that have too many members for a unified HMM to be built. The approach offers a new method for profile-HMM protein fold recognition and lays a foundation for further research. |
| Author | 刘岳 李晓琴 徐海松 乔辉 |
| AuthorAffiliation | College of Life Science and Bioengineering, Beijing University of Technology, Beijing 100124 |
| Author_xml | – sequence: 1 fullname: 刘岳 李晓琴 徐海松 乔辉 |
| ClassificationCodes | O641 |
| ContentType | Journal Article |
| Copyright | Copyright © Wanfang Data Co. Ltd. All Rights Reserved. |
| Copyright_xml | – notice: Copyright © Wanfang Data Co. Ltd. All Rights Reserved. |
| DBID | 2RA 92L CQIGP W94 ~WA 2B. 4A8 92I 93N PSX TCJ |
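| DatabaseName | VIP Journal Resource Integration Service Platform; Chinese Science and Technology Journal Database - CALIS site; Chinese Science and Technology Journal Database - 7.0 platform; Chinese Science and Technology Journal Database - Natural Sciences; Chinese Science and Technology Journal Database - mirror site; Wanfang Data Journals - Hong Kong; WANFANG Data Centre; Wanfang Data Journals; Wanfang Data Journals - Hong Kong edition; China Online Journals (COJ); China Online Journals (COJ) |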
| DatabaseTitleList | |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Chemistry |
| DocumentTitleAlternate | Classification Modeling and Recognition of Protein Fold Type |
| DocumentTitle_FL | Classification Modeling and Recognition of Protein Fold Type |
| EndPage | 2564 |
| ExternalDocumentID | wlhxxb200912026 32468488 |
| GrantInformation_xml | – fundername: National Natural Science Foundation of China; Beijing Natural Science Foundation funderid: (30570427); (4092008) |
| GroupedDBID | -02 2B. 2C. 2RA 5XA 5XC 92E 92I 92L ACGFS AENEX ALMA_UNASSIGNED_HOLDINGS CCEZO CDRFL CDYEO CLXHM CQIGP CW9 EBS EJD FIJ OK1 P2P RIG TCJ TGP U1G U5L W94 ~WA 4A8 93N AAXUO AAYWO ADMLS FDB M41 PSX ROL UY8 |
| ID | FETCH-LOGICAL-c596-92004256ee5e91b41aafdb42f318b88f2c35dd5ba081b3d427f76a5a416ab9ba3 |
| ISSN | 1000-6818 |
| IngestDate | Thu May 29 03:54:34 EDT 2025 Fri Nov 25 17:02:01 EST 2022 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 12 |
| Keywords | Protein fold type; RMSD (root mean square deviation); Hierarchical clustering; Profile-HMM (hidden Markov model); Fold recognition |
| Language | Chinese |
| LinkModel | OpenURL |
| MergedId | FETCHMERGED-LOGICAL-c596-92004256ee5e91b41aafdb42f318b88f2c35dd5ba081b3d427f76a5a416ab9ba3 |
| Notes | 11-1892/06 O641 Protein fold type RMSD Hierarchical clustering Profile-HMM Fold recognition |
| PageCount | 7 |
| ParticipantIDs | wanfang_journals_wlhxxb200912026 chongqing_backfile_32468488 |
| PublicationCentury | 2000 |
| PublicationDate | 2009 |
| PublicationDateYYYYMMDD | 2009-01-01 |
| PublicationDate_xml | – year: 2009 text: 2009 |
| PublicationDecade | 2000 |
| PublicationTitle | Wuli huaxue xuebao |
| PublicationTitleAlternate | Acta Physico-Chimica Sinica |
| PublicationTitle_FL | ACTA PHYSICO-CHIMICA SINICA |
| PublicationYear | 2009 |
| SSID | ssib001105268 ssj0030168 ssib024507715 ssib002258135 ssib057925156 ssib051374152 |
| Score | 1.8397354 |
| SourceID | wanfang chongqing |
| SourceType | Aggregation Database Publisher |
| StartPage | 2558 |
| SubjectTerms | Root mean square deviation; fold recognition; hierarchical clustering; protein fold type; hidden Markov model |
| Title | Classification Modeling and Recognition of Protein Fold Type (蛋白质折叠类型的分类建模与识别) |
| URI | http://lib.cqvip.com/qk/92644X/200912/32468488.html https://d.wanfangdata.com.cn/periodical/wlhxxb200912026 |
| Volume | 25 |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| journalDatabaseRights | – providerCode: PRVALS databaseName: IngentaConnect Open Access Journals issn: 1000-6818 databaseCode: FIJ dateStart: 20080115 customDbUrl: isFulltext: true dateEnd: 20150615 titleUrlDefault: http://www.ingentaconnect.com/content/title?j_type=online&j_startat=Aa&j_endat=Af&j_pagesize=200&j_page=1 omitProxy: true ssIdentifier: ssj0030168 providerName: Ingenta |
| linkProvider | Ingenta |
| thumbnail_s | http://utb.summon.serialssolutions.com/2.0.0/image/custom?url=http%3A%2F%2Fimage.cqvip.com%2Fvip1000%2Fqk%2F92644X%2F92644X.jpg http://utb.summon.serialssolutions.com/2.0.0/image/custom?url=http%3A%2F%2Fwww.wanfangdata.com.cn%2Fimages%2FPeriodicalImages%2Fwlhxxb%2Fwlhxxb.jpg |
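
The abstract reports sensitivity, specificity, and the Matthews correlation coefficient (MCC) per fold type. As a hedged illustration of how these three figures follow from a confusion matrix, the sketch below uses the standard textbook definitions; the function name `fold_metrics` and the counts in the example are hypothetical and are not taken from the article.

```python
import math


def fold_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, MCC) for one fold type.

    tp/fn -- test-set members of the fold that were / were not recognized
    tn/fp -- non-members that were correctly rejected / wrongly accepted
    Ratios with a zero denominator are reported as 0.0 (a choice made here).
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


# Hypothetical counts for a single fold type (not data from the article).
sens, spec, mcc = fold_metrics(tp=90, fp=10, tn=890, fn=10)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} MCC={mcc:.3f}")
```

An average over fold types, as reported in the abstract, would be obtained by calling the function once per fold type and taking the mean of each returned value.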