Indexing compressed text
| Published in | Journal of the ACM, vol. 52, no. 4, pp. 552–581 |
|---|---|
| Main Authors | Ferragina, Paolo; Manzini, Giovanni |
| Format | Journal Article |
| Language | English |
| Published | New York, NY: Association for Computing Machinery, 2005-07-01 |
| ISSN | 0004-5411 (print); 1557-735X (electronic) |
| DOI | 10.1145/1082036.1082039 |
| Abstract | We design two compressed data structures for the full-text indexing problem that support efficient substring searches using roughly the space required for storing the text in compressed form. Our first compressed data structure retrieves the $\mathit{occ}$ occurrences of a pattern $P[1, p]$ within a text $T[1, n]$ in $O(p + \mathit{occ} \log^{1+\varepsilon} n)$ time for any chosen $\varepsilon$, $0 < \varepsilon < 1$. This data structure uses at most $5 n H_k(T) + o(n)$ bits of storage, where $H_k(T)$ is the $k$th order empirical entropy of $T$. The space usage is $\Theta(n)$ bits in the worst case and $o(n)$ bits for compressible texts. This data structure exploits the relationship between suffix arrays and the Burrows–Wheeler Transform, and can be regarded as a compressed suffix array. Our second compressed data structure achieves $O(p + \mathit{occ})$ query time using $O(n H_k(T) \log^{\varepsilon} n) + o(n)$ bits of storage for any chosen $\varepsilon$, $0 < \varepsilon < 1$. Therefore, it provides optimal output-sensitive query time using $o(n \log n)$ bits in the worst case. This second data structure builds upon the first one and exploits the interplay between two compressors: the Burrows–Wheeler Transform and the LZ78 algorithm. |
|---|---|
| Author | Ferragina, Paolo (Università di Pisa, Pisa, Italy); Manzini, Giovanni (Università del Piemonte Orientale, Alessandria, Italy) |
| CODEN | JACOAH |
| Copyright | 2005 INIST-CNRS; Copyright Association for Computing Machinery, Jul 2005 |
| Discipline | Computer Science, Applied Sciences |
| Genre | Feature |
| IsPeerReviewed | true |
| IsScholarly | true |
| Keywords | Lempel-Ziv compressor, suffix array, Database query, Data compression, Information retrieval, full-text indexing, Design, Suffix, Algorithms, pattern searching, Theory, Burrows-Wheeler transform, Information storage, Full text, suffix tree, Data structure, indexing data structure, text compression, Indexing, Information theory |
| License | CC BY 4.0 |
| PageCount | 30 |
| SubjectTerms | Algorithmics. Computability. Computer arithmetics, Algorithms, Applied sciences, Arrays, Compressed, Computer science; control theory; systems, Construction, Content analysis, Data compression, Data structures, Dictionaries, Entropy, Exact sciences and technology, Indexing, Indexing. Classification. Abstracting. Syntheses, Information and communication sciences, Information and document structure and analysis, Information processing and retrieval, Information retrieval, Information science. Documentation, Information storage, Internet, Queries, Query processing, Sciences and techniques of general use, Studies, Texts, Theoretical computing, Transforms |
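
The abstract above says the first structure exploits the relationship between suffix arrays and the Burrows–Wheeler Transform. As a rough illustration of that relationship only, the Python sketch below derives the BWT from a naively built suffix array, counts a pattern by backward search (one range update per pattern character, which is why counting costs O(p) steps when rank queries take constant time), and locates occurrences by walking LF-mapping steps back to sampled suffix-array entries. It is a toy under stated assumptions: the class name `FMIndexSketch`, the `sample_rate` parameter and the `$` terminator are illustrative choices, the rank table `occ` is stored uncompressed (so it uses far more than $5nH_k(T)$ bits), and the sampling scheme is a textbook simplification rather than the paper's exact construction.

```python
from collections import Counter


def suffix_array(text: str) -> list[int]:
    # Naive construction by sorting all suffixes; fine for toy inputs only.
    return sorted(range(len(text)), key=lambda i: text[i:])


class FMIndexSketch:
    """Toy BWT-based index (hypothetical name, not the authors' implementation)."""

    def __init__(self, text: str, sample_rate: int = 4):
        # Assumes one '$' terminator that sorts before every other character.
        assert text.endswith("$") and text.count("$") == 1
        self.n = len(text)
        sa = suffix_array(text)
        # BWT: character preceding each sorted suffix (text[-1] is the '$').
        self.bwt = "".join(text[i - 1] for i in sa)
        # C[c]: number of text characters strictly smaller than c.
        counts = Counter(text)
        self.C, total = {}, 0
        for c in sorted(counts):
            self.C[c] = total
            total += counts[c]
        # occ[c][i]: occurrences of c in bwt[:i], stored uncompressed here.
        self.occ = {c: [0] * (self.n + 1) for c in counts}
        for i, ch in enumerate(self.bwt):
            for c in counts:
                self.occ[c][i + 1] = self.occ[c][i] + (ch == c)
        # Keep SA entries only for text positions divisible by sample_rate.
        self.samples = {row: pos for row, pos in enumerate(sa) if pos % sample_rate == 0}

    def _lf(self, row: int) -> int:
        # LF-mapping: row of the suffix starting one text position earlier.
        c = self.bwt[row]
        return self.C[c] + self.occ[c][row]

    def count_range(self, pattern: str) -> tuple[int, int]:
        # Backward search: one range update per pattern character, right to left.
        lo, hi = 0, self.n  # half-open range [lo, hi) of suffix-array rows
        for c in reversed(pattern):
            if c not in self.C:
                return 0, 0
            lo = self.C[c] + self.occ[c][lo]
            hi = self.C[c] + self.occ[c][hi]
            if lo >= hi:
                return 0, 0
        return lo, hi

    def count(self, pattern: str) -> int:
        lo, hi = self.count_range(pattern)
        return hi - lo

    def locate(self, pattern: str) -> list[int]:
        # Walk each matching row back via LF until a sampled row, counting steps.
        lo, hi = self.count_range(pattern)
        positions = []
        for row in range(lo, hi):
            cur, steps = row, 0
            while cur not in self.samples:
                cur = self._lf(cur)
                steps += 1
            positions.append(self.samples[cur] + steps)
        return sorted(positions)


fm = FMIndexSketch("mississippi$")
print(fm.bwt)            # ipssm$pissii
print(fm.count("ssi"))   # 2
print(fm.locate("ssi"))  # [2, 5]
```

In this sketch a larger `sample_rate` stores fewer suffix-array entries but makes each `locate` walk longer, which is the general kind of space/time trade-off behind a per-occurrence locating cost.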
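
The space bounds in the abstract are stated in terms of $H_k(T)$, the $k$th order empirical entropy of $T$. The Python sketch below computes one common formulation: $H_0$ is the entropy of the character frequencies, and $H_k$ averages $H_0$ over the characters that follow each length-$k$ context, weighted by how many there are. The function names `h0` and `hk` are hypothetical, and conventions (following versus preceding contexts, treatment of the first $k$ characters) differ slightly between papers, so treat this as an approximation of the definition the authors use.

```python
import math
from collections import Counter, defaultdict


def h0(s: str) -> float:
    # Zeroth-order empirical entropy in bits per character.
    n = len(s)
    if n == 0:
        return 0.0
    return sum((m / n) * math.log2(n / m) for m in Counter(s).values())


def hk(text: str, k: int) -> float:
    # k-th order empirical entropy: weighted average of h0 over the
    # characters that follow each length-k context in the text.
    if k == 0:
        return h0(text)
    if not text:
        return 0.0
    followers = defaultdict(list)
    for i in range(k, len(text)):
        followers[text[i - k:i]].append(text[i])
    return sum(len(f) * h0("".join(f)) for f in followers.values()) / len(text)


text = "abababab"
print(h0(text))     # 1.0
print(hk(text, 1))  # 0.0
```

The periodic example shows why a bound in terms of $H_k$ can be much smaller than one in terms of $H_0$: each character is fully determined by the one before it, so $H_1$ is zero even though $H_0$ is one bit per character.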
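
The abstract notes that the second structure exploits the interplay between the Burrows–Wheeler Transform and the LZ78 algorithm. The Python sketch below shows only the LZ78 parsing step, not how the paper couples it with the compressed suffix array: the text is split greedily into phrases, each being the longest previously seen phrase extended by one character, and each phrase is emitted as a (phrase index, next character) pair. The function names `lz78_parse` and `lz78_decode` are hypothetical, and a Python dictionary stands in for the phrase trie a practical implementation would typically use.

```python
def lz78_parse(text: str) -> list[tuple[int, str]]:
    # Phrase 0 is the empty string; new phrases get consecutive indices.
    dictionary = {"": 0}
    phrases = []
    current = ""
    for ch in text:
        if current + ch in dictionary:
            # Keep extending the longest known phrase.
            current += ch
        else:
            # Emit (index of longest known prefix, next char) and record the new phrase.
            phrases.append((dictionary[current], ch))
            dictionary[current + ch] = len(dictionary)
            current = ""
    if current:
        # Text ended inside a known phrase: emit it with no extra character.
        phrases.append((dictionary[current], ""))
    return phrases


def lz78_decode(phrases: list[tuple[int, str]]) -> str:
    # Rebuild each phrase from its referenced earlier phrase plus one character.
    table = [""]
    out = []
    for idx, ch in phrases:
        phrase = table[idx] + ch
        table.append(phrase)
        out.append(phrase)
    return "".join(out)


print(lz78_parse("aabaab"))  # [(0, 'a'), (1, 'b'), (1, 'a'), (0, 'b')]
```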