T-REKS: identification of Tandem REpeats in sequences with a K-meanS based algorithm


Bibliographic Details
Published in	Bioinformatics Vol. 25; no. 20; pp. 2632 - 2638
Main Authors	Jorda, Julien, Kajava, Andrey V.
Format	Journal Article
Language	English
Published	Oxford: Oxford University Press (OUP), 15.10.2009
Subjects	Algorithms; Amino Acid Sequence; Base Sequence; Biochemistry, Molecular Biology; Biological and medical sciences; Computational Biology - methods; Databases, Genetic; Databases, Protein; Fundamental and applied biological sciences. Psychology; General aspects; Life Sciences; Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects); Molecular Sequence Data; Proteins - chemistry; Repetitive Sequences, Amino Acid; Sequence Analysis, Protein - methods; Tandem Repeat Sequences; K means algorithm; Identification; Tandemly repeated sequence; Repeated sequence
ISSN	1367-4803 1367-4811 1460-2059
DOI	10.1093/bioinformatics/btp482

Abstract	Motivation: Over the last years, evidence has accumulated for a high incidence of tandem repeats in proteins that carry out fundamental biological functions and are related to a number of human diseases. At the same time, protein repeats are frequently strongly degenerated during evolution and therefore cannot be easily identified. To solve this problem, several computer programs based on different algorithms have been developed. Nevertheless, our tests showed that there is still room for improvement in methods for accurate and rapid detection of tandem repeats in proteins. Results: We developed a new program called T-REKS for ab initio identification of tandem repeats. It is based on clustering the lengths between identical short strings using a K-means algorithm. A benchmark of existing programs and T-REKS on several sequence datasets is presented. Our program, linked to the Protein Repeat DataBase, opens the way for large-scale analysis of protein tandem repeats. T-REKS can also be applied to nucleotide sequences. Availability: The algorithm has been implemented in Java; the program is available upon request at http://bioinfo.montp.cnrs.fr/?r=t-reks. The Protein Repeat DataBase generated using T-REKS is accessible at http://bioinfo.montp.cnrs.fr/?r=repeatDB. Contact: julien.jorda@crbm.cnrs.fr; andrey.kajava@crbm.cnrs.fr Supplementary information: Supplementary data are available at Bioinformatics online.
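The abstract describes the core idea only in one sentence: cluster the lengths between identical short strings with K-means. The following Python sketch illustrates that idea under stated assumptions and is not the authors' Java implementation. The function names, the word length `k`, the number of clusters, and the seeding rule are all hypothetical choices made for illustration.

```python
def gap_lengths(seq, k=2):
    """Distances between consecutive occurrences of each identical k-mer."""
    last_seen = {}
    gaps = []
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in last_seen:
            gaps.append(i - last_seen[word])
        last_seen[word] = i
    return gaps


def kmeans_1d(values, seeds, iterations=20):
    """Plain Lloyd iteration on scalars; returns (centers, clusters)."""
    centers = list(seeds)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda c: abs(v - centers[c]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


def candidate_repeat_length(seq, k=2, n_clusters=3):
    """Center of the most populated gap-length cluster, or None if no k-mer recurs."""
    gaps = gap_lengths(seq, k)
    if not gaps:
        return None
    # Assumed seeding rule: spread the seeds evenly over the distinct gap values.
    distinct = sorted(set(gaps))
    n = min(n_clusters, len(distinct))
    seeds = [distinct[(j * (len(distinct) - 1)) // max(n - 1, 1)] for j in range(n)]
    centers, clusters = kmeans_1d(gaps, seeds)
    best = max(range(len(centers)), key=lambda j: len(clusters[j]))
    return round(centers[best])
```

For a perfect repeat such as `"ACDEFG" * 5`, every 2-mer recurs at distance 6, so `candidate_repeat_length` returns 6. A real detector would still need to align the candidate units and score them, which this sketch leaves out.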
Author	Kajava, Andrey V. Jorda, Julien
ContentType	Journal Article
Copyright	The Author 2009. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org 2009; 2009 INIST-CNRS; Distributed under a Creative Commons Attribution 4.0 International License
Discipline	Biology
EISSN	1460-2059 1367-4811
EndPage	2638
Genre	Research Support, Non-U.S. Gov't Journal Article
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
Issue	20
Keywords	K means algorithm; Identification; Tandemly repeated sequence; Repeated sequence
Language	English
License	CC BY 4.0 Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0
Notes	ArticleID: btp482. Associate Editor: Ivo Hofacker
ORCID	0000-0002-2342-6886
OpenAccessLink	https://proxy.k.utb.cz/login?url=https://academic.oup.com/bioinformatics/article-pdf/25/20/2632/48993465/bioinformatics_25_20_2632.pdf
PMID	19671691
PageCount	7
PublicationCentury	2000
PublicationDate	2009-10-15
PublicationDateYYYYMMDD	2009-10-15
PublicationDate_xml	– month: 10 year: 2009 text: 2009-10-15 day: 15
PublicationDecade	2000
PublicationPlace	Oxford
PublicationPlace_xml	– name: Oxford – name: England
PublicationTitle	Bioinformatics
PublicationTitleAlternate	Bioinformatics
PublicationYear	2009
Publisher	Oxford University Press Oxford University Press (OUP)
Publisher_xml	– name: Oxford University Press – name: Oxford University Press (OUP)
SSID	ssj0051444 ssj0005056
Score	2.379942
Snippet	Motivation: Over the last years a number of evidences have been accumulated about high incidence of tandem repeats in proteins carrying fundamental biological... Over the last years a number of evidences have been accumulated about high incidence of tandem repeats in proteins carrying fundamental biological functions... MOTIVATION: Over the last years a number of evidences has been accumulated about high incidence of tandem repeats in proteins carrying fundamental biological...
StartPage	2632
SubjectTerms	Algorithms Amino Acid Sequence Base Sequence Biochemistry, Molecular Biology Biological and medical sciences Computational Biology - methods Databases, Genetic Databases, Protein Fundamental and applied biological sciences. Psychology General aspects Life Sciences Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects) Molecular Sequence Data Proteins - chemistry Repetitive Sequences, Amino Acid Sequence Analysis, Protein - methods Tandem Repeat Sequences
Title	T-REKS: identification of Tandem REpeats in sequences with a K-meanS based algorithm
URI	https://api.istex.fr/ark:/67375/HXZ-8S4L3SR2-4/fulltext.pdf https://www.ncbi.nlm.nih.gov/pubmed/19671691 https://www.proquest.com/docview/734081430 https://hal.science/hal-00423755 https://academic.oup.com/bioinformatics/article-pdf/25/20/2632/48993465/bioinformatics_25_20_2632.pdf
Volume	25