T-REKS: identification of Tandem REpeats in sequences with a K-meanS based algorithm

Bibliographic Details
Published in Bioinformatics, Vol. 25, no. 20, pp. 2632-2638
Main Authors Jorda, Julien; Kajava, Andrey V.
Format Journal Article
Language English
Published Oxford: Oxford University Press (OUP), 15.10.2009
ISSN 1367-4803, 1367-4811, 1460-2059
DOI 10.1093/bioinformatics/btp482

Abstract
Motivation: In recent years, evidence has accumulated that tandem repeats occur frequently in proteins that carry out fundamental biological functions and are linked to a number of human diseases. At the same time, protein repeats often degenerate strongly during evolution and therefore cannot be easily identified. Several computer programs based on different algorithms have been developed to address this problem. Nevertheless, our tests showed that there is still room for improvement in methods for the accurate and rapid detection of tandem repeats in proteins.
Results: We developed a new program, T-REKS, for ab initio identification of tandem repeats. It clusters the lengths between identical short strings using a K-means algorithm. A benchmark of the existing programs and T-REKS on several sequence datasets is presented. Linked to the Protein Repeat DataBase, our program opens the way for large-scale analysis of protein tandem repeats. T-REKS can also be applied to nucleotide sequences.
Availability: The algorithm has been implemented in Java; the program is available upon request at http://bioinfo.montp.cnrs.fr/?r=t-reks. The Protein Repeat DataBase generated using T-REKS is accessible at http://bioinfo.montp.cnrs.fr/?r=repeatDB.
Contact: julien.jorda@crbm.cnrs.fr; andrey.kajava@crbm.cnrs.fr
Supplementary information: Supplementary data are available at Bioinformatics online.
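The abstract describes the core idea only in one sentence: clustering the lengths between identical short strings with K-means. The following Python sketch illustrates that idea under stated assumptions and is not the authors' Java implementation. The function names (`seed_distances`, `kmeans_1d`, `candidate_period`), the seed length `k`, the number of clusters, and the rule of taking the most populated cluster as the candidate repeat length are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def seed_distances(seq, k=2):
    """Distances between consecutive occurrences of each identical k-mer."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    distances = []
    for hits in positions.values():
        # Pair each occurrence with the next occurrence of the same k-mer.
        distances.extend(b - a for a, b in zip(hits, hits[1:]))
    return distances


def kmeans_1d(values, n_clusters=2, n_iter=50):
    """Plain Lloyd-style K-means on scalars; returns (centers, labels)."""
    lo, hi = min(values), max(values)
    # Deterministic start: centers spread evenly over the observed range.
    step = (hi - lo) / max(n_clusters - 1, 1)
    centers = [lo + step * j for j in range(n_clusters)]
    labels = [0] * len(values)
    for _ in range(n_iter):
        # Assignment step: nearest center for every distance.
        labels = [
            min(range(n_clusters), key=lambda c: abs(v - centers[c]))
            for v in values
        ]
        # Update step: mean of each cluster (empty clusters keep their center).
        new_centers = []
        for c in range(n_clusters):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


def candidate_period(seq, k=2, n_clusters=2):
    """Center of the most populated distance cluster, or None if no seed repeats."""
    distances = seed_distances(seq, k)
    if not distances:
        return None
    centers, labels = kmeans_1d(distances, n_clusters)
    biggest = max(range(n_clusters), key=labels.count)
    return round(centers[biggest])


print(candidate_period("ABCDE" * 6))  # perfect repeat of length 5 -> prints 5
```

On a perfect repeat every seed recurs at the same spacing, so all distances fall into one cluster whose center equals the repeat length. Degenerate repeats would spread the distances, which is where clustering, rather than a simple exact-period search, becomes useful.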
BackLink http://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=22005583 (View record in Pascal Francis)
https://www.ncbi.nlm.nih.gov/pubmed/19671691 (View this record in MEDLINE/PubMed)
https://hal.science/hal-00423755 (View record in HAL)
ContentType Journal Article
Copyright The Author 2009. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org 2009
2009 INIST-CNRS
Distributed under a Creative Commons Attribution 4.0 International License
Discipline Biology
Genre Research Support, Non-U.S. Gov't
Journal Article
Issue 20
Keywords K means algorithm
Identification
Tandemly repeated sequence
Repeated sequence
Language English
License CC BY 4.0
Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0
Notes ArticleID:btp482
To whom correspondence should be addressed.
Associate Editor: Ivo Hofacker
istex:DDA26AF95B10B3F3C4DD135C9B4729948195B4D6
ark:/67375/HXZ-8S4L3SR2-4
ORCID 0000-0002-2342-6886
OpenAccessLink https://proxy.k.utb.cz/login?url=https://academic.oup.com/bioinformatics/article-pdf/25/20/2632/48993465/bioinformatics_25_20_2632.pdf
PMID 19671691
SSID ssj0051444
ssj0005056
Score 2.379942
Snippet Motivation: Over the last years a number of evidences have been accumulated about high incidence of tandem repeats in proteins carrying fundamental biological...
Over the last years a number of evidences have been accumulated about high incidence of tandem repeats in proteins carrying fundamental biological functions...
MOTIVATION: Over the last years a number of evidences has been accumulated about high incidence of tandem repeats in proteins carrying fundamental biological...
StartPage 2632
SubjectTerms Algorithms
Amino Acid Sequence
Base Sequence
Biochemistry, Molecular Biology
Biological and medical sciences
Computational Biology - methods
Databases, Genetic
Databases, Protein
Fundamental and applied biological sciences. Psychology
General aspects
Life Sciences
Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects)
Molecular Sequence Data
Proteins - chemistry
Repetitive Sequences, Amino Acid
Sequence Analysis, Protein - methods
Tandem Repeat Sequences
Title T-REKS: identification of Tandem REpeats in sequences with a K-meanS based algorithm
URI https://api.istex.fr/ark:/67375/HXZ-8S4L3SR2-4/fulltext.pdf
https://www.ncbi.nlm.nih.gov/pubmed/19671691
https://www.proquest.com/docview/734081430
https://hal.science/hal-00423755
https://academic.oup.com/bioinformatics/article-pdf/25/20/2632/48993465/bioinformatics_25_20_2632.pdf
Volume 25