Efficient and Compact Representations of Prefix Codes

Most of the attention in statistical compression is given to the space used by the compressed sequence, a problem completely solved with optimal prefix codes. However, in many applications, the storage space used to represent the prefix code itself can be an issue. In this paper, we introduce and co...

Bibliographic Details
Published in IEEE Transactions on Information Theory, Vol. 61, no. 9, pp. 4999-5011
Main Authors Gagie, Travis; Navarro, Gonzalo; Nekrich, Yakov; Ordonez, Alberto
Format Journal Article
Language English
Published New York, IEEE, 01.09.2015
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN 0018-9448
EISSN 1557-9654
DOI 10.1109/TIT.2015.2452252

Abstract Most of the attention in statistical compression is given to the space used by the compressed sequence, a problem completely solved with optimal prefix codes. However, in many applications, the storage space used to represent the prefix code itself can be an issue. In this paper, we introduce and compare several techniques to store prefix codes. Let N be the sequence length and n be the alphabet size. Then, a naive storage of an optimal prefix code uses O(n log n) bits. Our first technique shows how to use O(n log log(N/n)) bits to store the optimal prefix code. Then, we introduce an approximate technique that, for any 0 < ε < 1/2, takes O(n log log(1/ε)) bits to store a prefix code with an average codeword length within an additive ε of the minimum. Finally, a second approximation takes, for any constant c > 1, O(n^{1/c} log n) bits to store a prefix code with an average codeword length at most c times the minimum. In all cases, our data structures allow encoding and decoding of any symbol in O(1) time. We experimentally compare our new techniques with the state of the art, showing that we achieve sixfold-to-eightfold space reductions, at the price of a slower encoding (2.5-8 times slower) and decoding (12-24 times slower). The approximations further reduce this space and improve the time significantly, up to recovering the speed of classical implementations, for a moderate penalty in the average code length. As a byproduct, we compare various heuristic, approximate, and optimal algorithms to generate length-restricted codes, showing that the optimal ones are clearly superior and practical enough to be implemented.
Authors and Affiliations
Gagie, Travis (travis.gagie@gmail.com): Dept. of Comput. Sci., Univ. of Helsinki, Helsinki, Finland
Navarro, Gonzalo (gnavarro@dcc.uchile.cl): Dept. of Comput. Sci., Univ. of Chile, Santiago, Chile
Nekrich, Yakov (yakov.nekrich@googlemail.com): David R. Cheriton Sch. of Comput. Sci., Univ. of Waterloo, Waterloo, ON, Canada
Ordonez, Alberto (alberto.ordonez@udc.es): Database Lab., Univ. da Coruña, Coruña, Spain
CODEN IETTAW
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Sep 2015
Discipline Engineering
Computer Science
Funding
ICT COST Action (grant IC1302)
Millennium Nucleus Information and Coordination in Networks (grant ICM/FIC RC130003)
MINECO (grant CDTI-00064563/ITC-20133062)
FPU Program
Xunta de Galicia co-funded with FEDER (grant GRC2013/053; Grant AP2010-6038)
CDTI
AGI
MINECO through PGE and FEDER (grants TIN2013-46238-C4-3-R; TIN2013-47090-C3-3-P)
IsPeerReviewed true
IsScholarly true
Keywords data compression
computers and information processing
Huffman coding
data systems
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
SubjectTerms Additives
Algorithms
Approximation
Approximation methods
Arrays
Codes
Cryptography
Decoding
Efficiency
Encoding
Heuristic
Random access memory
Vegetation
URI https://ieeexplore.ieee.org/document/7154462
https://www.proquest.com/docview/1707085193