Indexing compressed text

We design two compressed data structures for the full-text indexing problem that support efficient substring searches using roughly the space required for storing the text in compressed form. Our first compressed data structure retrieves the occ occurrences of a pattern P[1, p] within a text T[1,...

Bibliographic Details
Published in Journal of the ACM Vol. 52; no. 4; pp. 552-581
Main Authors Ferragina, Paolo; Manzini, Giovanni
Format Journal Article
Language English
Published New York, NY Association for Computing Machinery 01.07.2005
ISSN 0004-5411
EISSN 1557-735X
DOI 10.1145/1082036.1082039

Abstract We design two compressed data structures for the full-text indexing problem that support efficient substring searches using roughly the space required for storing the text in compressed form. Our first compressed data structure retrieves the occ occurrences of a pattern $P[1, p]$ within a text $T[1, n]$ in $O(p + \mathrm{occ} \log^{1+\varepsilon} n)$ time for any chosen $\varepsilon$, $0 < \varepsilon < 1$. This data structure uses at most $5 n H_k(T) + o(n)$ bits of storage, where $H_k(T)$ is the $k$th order empirical entropy of $T$. The space usage is $\Theta(n)$ bits in the worst case and $o(n)$ bits for compressible texts. This data structure exploits the relationship between suffix arrays and the Burrows-Wheeler Transform, and can be regarded as a compressed suffix array. Our second compressed data structure achieves $O(p + \mathrm{occ})$ query time using $O(n H_k(T) \log^{\varepsilon} n) + o(n)$ bits of storage for any chosen $\varepsilon$, $0 < \varepsilon < 1$. Therefore, it provides optimal output-sensitive query time using $o(n \log n)$ bits in the worst case. This second data structure builds upon the first one and exploits the interplay between two compressors: the Burrows-Wheeler Transform and the LZ78 algorithm.
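The relationship between suffix arrays and the Burrows-Wheeler Transform mentioned in the abstract can be illustrated with a small, uncompressed sketch. It is not the authors' data structure: the suffix array and the occurrence tables are stored explicitly here, so none of the stated space bounds apply, and the function names (suffix_array, build_index, backward_search, locate) and the NUL sentinel are assumptions made for illustration only. The sketch shows how a pattern can be matched right to left against the transformed text, taking one step per pattern character.

```python
from collections import Counter


def suffix_array(text):
    # Naive construction by sorting suffixes; adequate for a short illustration.
    return sorted(range(len(text)), key=lambda i: text[i:])


def build_index(text, sentinel="\0"):
    # Assumption: the sentinel does not occur in text and sorts before every other character.
    t = text + sentinel
    sa = suffix_array(t)
    # Burrows-Wheeler Transform: the character preceding each sorted suffix (cyclically).
    bwt = "".join(t[i - 1] for i in sa)
    # C[c] = number of characters in t that are strictly smaller than c.
    counts = Counter(t)
    C, total = {}, 0
    for c in sorted(counts):
        C[c] = total
        total += counts[c]
    # occ[c][i] = number of occurrences of c in bwt[:i] (stored uncompressed here).
    occ = {c: [0] * (len(bwt) + 1) for c in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return sa, C, occ


def backward_search(pattern, C, occ, n):
    # Maintain the half-open range [lo, hi) of suffix-array rows whose suffixes
    # start with the part of the pattern processed so far.
    lo, hi = 0, n
    for ch in reversed(pattern):
        if ch not in C:
            return 0, 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0, 0
    return lo, hi


def locate(text, pattern):
    # Report the starting positions (0-based) of all occurrences of pattern in text.
    sa, C, occ = build_index(text)
    lo, hi = backward_search(pattern, C, occ, len(sa))
    return sorted(sa[lo:hi])


print(locate("mississippi", "ssi"))  # [2, 5]
```

In this sketch the number of occurrences is hi - lo after the search loop, and reporting their positions reads the explicit suffix array; a compressed index would have to replace both the explicit suffix array and the occurrence tables with succinct representations.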
Author Manzini, Giovanni
Ferragina, Paolo
Author_xml – sequence: 1
  givenname: Paolo
  surname: Ferragina
  fullname: Ferragina, Paolo
  organization: Università di Pisa, Pisa, Italy
– sequence: 2
  givenname: Giovanni
  surname: Manzini
  fullname: Manzini, Giovanni
  organization: Università del Piemonte Orientale, Alessandria, Italy
BackLink http://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=17110824
CODEN JACOAH
ContentType Journal Article
Copyright 2005 INIST-CNRS
Copyright Association for Computing Machinery Jul 2005
Copyright_xml – notice: 2005 INIST-CNRS
– notice: Copyright Association for Computing Machinery Jul 2005
DOI 10.1145/1082036.1082039
DatabaseName CrossRef
Pascal-Francis
Computer and Information Systems Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
Discipline Computer Science
Applied Sciences
EISSN 1557-735X
EndPage 581
ExternalDocumentID 910484081
17110824
10_1145_1082036_1082039
Genre Feature
ISSN 0004-5411
IsPeerReviewed true
IsScholarly true
Issue 4
Keywords Lempel-Ziv compressor
suffix array
Database query
Data compression
Information retrieval
full-text indexing
Design
Suffix
Algorithms
pattern searching
Theory
Burrows-Wheeler transform
Information storage
Full text
suffix tree
Data structure
indexing data structure
text compression
Indexing
Information theory
Language English
License CC BY 4.0
PageCount 30
PublicationCentury 2000
PublicationDate 2005-07-01
PublicationDateYYYYMMDD 2005-07-01
PublicationDate_xml – month: 07
  year: 2005
  text: 2005-07-01
  day: 01
PublicationDecade 2000
PublicationPlace New York, NY
PublicationPlace_xml – name: New York, NY
– name: New York
PublicationTitle Journal of the ACM
PublicationYear 2005
Publisher Association for Computing Machinery
Publisher_xml – name: Association for Computing Machinery
StartPage 552
SubjectTerms Algorithmics. Computability. Computer arithmetics
Algorithms
Applied sciences
Arrays
Compressed
Computer science; control theory; systems
Construction
Content analysis
Data compression
Data structures
Dictionaries
Entropy
Exact sciences and technology
Indexing
Indexing. Classification. Abstracting. Syntheses
Information and communication sciences
Information and document structure and analysis
Information processing and retrieval
Information retrieval
Information science. Documentation
Information storage
Internet
Queries
Query processing
Sciences and techniques of general use
Studies
Texts
Theoretical computing
Transforms
Title Indexing compressed text
URI https://www.proquest.com/docview/194227242
https://www.proquest.com/docview/1808068778
https://www.proquest.com/docview/28828380
https://www.proquest.com/docview/28950358
Volume 52