Efficient and Compact Representations of Prefix Codes
Most of the attention in statistical compression is given to the space used by the compressed sequence, a problem completely solved with optimal prefix codes. However, in many applications, the storage space used to represent the prefix code itself can be an issue. In this paper, we introduce and co...
| Published in | IEEE transactions on information theory Vol. 61; no. 9; pp. 4999 - 5011 |
|---|---|
| Main Authors | Gagie, Travis; Navarro, Gonzalo; Nekrich, Yakov; Ordonez, Alberto |
| Format | Journal Article |
| Language | English |
| Published | New York, IEEE, 01.09.2015. The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Subjects | |
| ISSN | 0018-9448 1557-9654 |
| DOI | 10.1109/TIT.2015.2452252 |
| Abstract | Most of the attention in statistical compression is given to the space used by the compressed sequence, a problem completely solved with optimal prefix codes. However, in many applications, the storage space used to represent the prefix code itself can be an issue. In this paper, we introduce and compare several techniques to store prefix codes. Let N be the sequence length and n be the alphabet size. Then, a naive storage of an optimal prefix code uses O(n log n) bits. Our first technique shows how to use O(n log log(N/n)) bits to store the optimal prefix code. Then, we introduce an approximate technique that, for any 0 < ε < 1/2, takes O(n log log(1/ε)) bits to store a prefix code with an average codeword length within an additive ε of the minimum. Finally, a second approximation takes, for any constant c > 1, O(n^{1/c} log n) bits to store a prefix code with an average codeword length at most c times the minimum. In all cases, our data structures allow encoding and decoding of any symbol in O(1) time. We experimentally compare our new techniques with the state of the art, showing that we achieve sixfold-to-eightfold space reductions, at the price of a slower encoding (2.5-8 times slower) and decoding (12-24 times slower). The approximations further reduce this space and improve the time significantly, up to recovering the speed of classical implementations, for a moderate penalty in the average code length. As a byproduct, we compare various heuristic, approximate, and optimal algorithms to generate length-restricted codes, showing that the optimal ones are clearly superior and practical enough to be implemented. |
|---|---|
| Author | Nekrich, Yakov Gagie, Travis Ordonez, Alberto Navarro, Gonzalo |
| Author_xml | 1. Gagie, Travis (travis.gagie@gmail.com), Dept. of Comput. Sci., Univ. of Helsinki, Helsinki, Finland; 2. Navarro, Gonzalo (gnavarro@dcc.uchile.cl), Dept. of Comput. Sci., Univ. of Chile, Santiago, Chile; 3. Nekrich, Yakov (yakov.nekrich@googlemail.com), David R. Cheriton Sch. of Comput. Sci., Univ. of Waterloo, Waterloo, ON, Canada; 4. Ordonez, Alberto (alberto.ordonez@udc.es), Database Lab., Univ. da Coruna, Coruña, Spain |
| CODEN | IETTAW |
| CitedBy_id | crossref_primary_10_1145_3342555 crossref_primary_10_1145_3397175 crossref_primary_10_1016_j_tcs_2022_01_010 crossref_primary_10_3390_s21186236 crossref_primary_10_1016_j_ipl_2022_106274 |
| Cites_doi | 10.1137/S0097539799364092 10.1145/1082036.1082043 10.1002/j.1538-7305.1959.tb01583.x 10.1109/18.86980 10.1109/JRPROC.1952.273898 10.1145/253495.342777 10.1016/j.ipl.2005.10.006 10.1006/jcss.2001.1779 10.1016/j.tcs.2013.10.019 10.1016/j.ipl.2006.04.008 10.1007/BF01683268 10.1007/978-3-642-34109-0_39 10.1145/1412228.1412230 10.1007/s00453-001-0060-4 10.1002/spe.4380190207 10.1016/j.ipl.2008.07.004 10.1145/348751.348754 10.1145/5684.5688 10.1109/26.634683 10.1007/978-3-642-24583-1_18 10.1016/0022-0000(93)90040-4 10.1145/79147.79150 10.1109/TIT.1976.1055554 10.1016/0020-0190(93)90207-P 10.1109/26.469442 10.1109/26.843129 10.1016/0020-0190(88)90146-9 10.1145/363958.363991 10.1137/1.9781611972900.9 10.1007/978-3-642-32241-9_34 10.1007/s10791-006-9001-9 10.1109/DCC.2013.46 10.1007/978-3-642-03367-4_28 10.1145/828.1884 10.1007/978-3-642-11266-9_35 10.1016/j.jda.2013.07.004 10.1002/spe.741 10.1109/DCC.2003.1194057 10.1145/1841909.1841913 10.1109/DCC.1992.227470 10.1145/1240233.1240243 10.1007/s00453-007-9140-4 10.1137/0121057 10.1007/978-3-642-33090-2_17 10.1007/s10791-012-9184-1 |
| ContentType | Journal Article |
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Sep 2015 |
| Copyright_xml | – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Sep 2015 |
| DBID | 97E RIA RIE AAYXX CITATION 7SC 7SP 8FD JQ2 L7M L~C L~D |
| DOI | 10.1109/TIT.2015.2452252 |
| DatabaseName | IEEE All-Society Periodicals Package (ASPP) 2005–Present IEEE All-Society Periodicals Package (ASPP) 1998–Present IEEE Electronic Library (IEL) CrossRef Computer and Information Systems Abstracts Electronics & Communications Abstracts Technology Research Database ProQuest Computer Science Collection Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Academic Computer and Information Systems Abstracts Professional |
| DatabaseTitle | CrossRef Technology Research Database Computer and Information Systems Abstracts – Academic Electronics & Communications Abstracts ProQuest Computer Science Collection Computer and Information Systems Abstracts Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Professional |
| DatabaseTitleList | Technology Research Database |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Xplore url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Engineering Computer Science |
| EISSN | 1557-9654 |
| EndPage | 5011 |
| ExternalDocumentID | 3788278731 10_1109_TIT_2015_2452252 7154462 |
| Genre | orig-research Feature |
| GrantInformation_xml | – fundername: ICT COST Action grantid: IC1302 – fundername: Millennium Nucleus Information and Coordination in Networks grantid: ICM/FIC RC130003 – fundername: MINECO grantid: CDTI-00064563/ITC-20133062 – fundername: FPU Program – fundername: Xunta de Galicia co-funded with FEDER grantid: GRC2013/053; Grant AP2010-6038 – fundername: CDTI – fundername: AGI – fundername: MINECO through PGE and FEDER grantid: TIN2013-46238-C4-3-R; TIN2013-47090-C3-3-P |
| GroupedDBID | -~X .DC 0R~ 29I 3EH 4.4 5GY 5VS 6IK 97E AAJGR AARMG AASAJ AAWTH ABAZT ABFSI ABQJQ ABVLG ACGFO ACGFS ACGOD ACIWK AENEX AETEA AETIX AGQYO AGSQL AHBIQ AI. AIBXA AKJIK AKQYR ALLEH ALMA_UNASSIGNED_HOLDINGS ASUFR ATWAV BEFXN BFFAM BGNUA BKEBE BPEOZ CS3 DU5 E.L EBS EJD F5P HZ~ H~9 IAAWW IBMZZ ICLAB IDIHD IFIPE IFJZH IPLJI JAVBF LAI M43 MS~ O9- OCL P2P PQQKQ RIA RIE RNS RXW TAE TN5 VH1 VJK AAYXX CITATION 7SC 7SP 8FD JQ2 L7M L~C L~D |
| ID | FETCH-LOGICAL-c291t-24b1eaead56ffd07c36ad74b9ee21c8cc0b159c0c0b0defd0510f311bda5b6b73 |
| IEDL.DBID | RIE |
| ISSN | 0018-9448 |
| IngestDate | Sun Oct 05 00:24:15 EDT 2025 Thu Apr 24 22:56:25 EDT 2025 Wed Oct 01 02:55:10 EDT 2025 Tue Aug 26 16:50:03 EDT 2025 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 9 |
| Keywords | data compression Computers and information processing huffman coding data systems |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-c291t-24b1eaead56ffd07c36ad74b9ee21c8cc0b159c0c0b0defd0510f311bda5b6b73 |
| Notes | SourceType-Scholarly Journals-1 ObjectType-Feature-1 content type line 14 |
| PQID | 1707085193 |
| PQPubID | 36024 |
| PageCount | 13 |
| ParticipantIDs | proquest_journals_1707085193 crossref_citationtrail_10_1109_TIT_2015_2452252 crossref_primary_10_1109_TIT_2015_2452252 ieee_primary_7154462 |
| ProviderPackageCode | CITATION AAYXX |
| PublicationCentury | 2000 |
| PublicationDate | 2015-Sept. 2015-9-00 20150901 |
| PublicationDateYYYYMMDD | 2015-09-01 |
| PublicationDate_xml | – month: 09 year: 2015 text: 2015-Sept. |
| PublicationDecade | 2010 |
| PublicationPlace | New York |
| PublicationPlace_xml | – name: New York |
| PublicationTitle | IEEE transactions on information theory |
| PublicationTitleAbbrev | TIT |
| PublicationYear | 2015 |
| Publisher | IEEE The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Publisher_xml | – name: IEEE – name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| SSID | ssj0014512 |
| Score | 2.2185256 |
| SourceID | proquest crossref ieee |
| SourceType | Aggregation Database Enrichment Source Index Database Publisher |
| StartPage | 4999 |
| SubjectTerms | Additives Algorithms Approximation Approximation methods Arrays Codes Cryptography Decoding Efficiency Encoding Heuristic Random access memory Vegetation |
| Title | Efficient and Compact Representations of Prefix Codes |
| URI | https://ieeexplore.ieee.org/document/7154462 https://www.proquest.com/docview/1707085193 |
| Volume | 61 |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| journalDatabaseRights | – providerCode: PRVIEE databaseName: IEEE Xplore customDbUrl: eissn: 1557-9654 dateEnd: 99991231 omitProxy: false ssIdentifier: ssj0014512 issn: 0018-9448 databaseCode: RIE dateStart: 19630101 isFulltext: true titleUrlDefault: https://ieeexplore.ieee.org/ providerName: IEEE |
| thumbnail_l | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/lc.gif&issn=0018-9448&client=summon |
| thumbnail_m | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/mc.gif&issn=0018-9448&client=summon |
| thumbnail_s | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/sc.gif&issn=0018-9448&client=summon |
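To make the abstract's starting point concrete, the following is a minimal sketch of a classical canonical prefix code, in which the whole code is recoverable from the codeword lengths alone. It is an illustration of the kind of baseline representation the abstract contrasts against, not the compact data structures proposed in the paper. The function names (`code_lengths`, `canonical_code`, `encode`, `decode`) and the sample text are assumptions introduced here, and decoding is done bit by bit for clarity rather than in O(1) time per symbol.

```python
from collections import Counter
import heapq


def code_lengths(freqs):
    """Huffman codeword lengths for a dict symbol -> frequency (needs >= 2 symbols)."""
    # Each heap entry is (weight, tie_breaker, symbols_in_subtree).
    heap = [(f, i, [s]) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, s1 = heapq.heappop(heap)
        f2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        for s in s1 + s2:
            lengths[s] += 1
        heapq.heappush(heap, (f1 + f2, tie, s1 + s2))
        tie += 1
    return lengths


def canonical_code(lengths):
    """Assign canonical codewords from lengths only (symbols ordered by length, then symbol)."""
    code, prev_len, table = 0, 0, {}
    for sym in sorted(lengths, key=lambda s: (lengths[s], s)):
        code <<= lengths[sym] - prev_len  # pad with zeros when the length grows
        table[sym] = format(code, "0{}b".format(lengths[sym]))
        prev_len = lengths[sym]
        code += 1
    return table


def encode(text, table):
    """Concatenate the codewords of the symbols in text."""
    return "".join(table[ch] for ch in text)


def decode(bits, table):
    """Decode a bit string by extending the current prefix until it matches a codeword."""
    inverse = {cw: s for s, cw in table.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in inverse:
            out.append(inverse[cur])
            cur = ""
    return "".join(out)
```

Because `canonical_code` depends only on the lengths, a decoder that receives the list of symbols ordered by codeword length can rebuild the same table. Storing that ordering explicitly is what costs on the order of n log n bits for an alphabet of size n, which is the overhead the paper sets out to reduce.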