Data compression using word encoding with Huffman code

Bibliographic Details
Published in: Journal of the American Society for Information Science, Vol. 42, no. 9, pp. 685-698
Main Authors: Liu, Chengwen; Yu, Clement
Format: Journal Article
Language: English
Published: Washington, D.C.: Wiley Subscription Services, Inc., A Wiley Company, 01.10.1991
John Wiley & Sons
American Documentation Institute
Wiley Periodicals Inc
ISSN: 0002-8231
1097-4571
DOI: 10.1002/(SICI)1097-4571(199110)42:9<685::AID-ASI7>3.0.CO;2-1

Summary: A technique for compressing large databases is presented. The method replaces frequent variable‐length byte strings (words or word fragments) in the database by minimum‐redundancy codes (Huffman codes). An essential part of the technique is the construction of the dictionary to yield high compression ratios. A heuristic is used to count frequencies of word fragments. A detailed analysis is provided of our implementation in support of high compression ratios and efficient encoding and decoding under the constraint of a fixed amount of main memory. In each phase of our implementation, we explain why certain data structures or techniques are employed. Experimental results show that our compression scheme is very effective for compressing large databases of library records. © 1991 John Wiley & Sons, Inc.
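
The general idea of replacing frequent strings with Huffman codes can be illustrated with a minimal Python sketch. It is not the authors' implementation: it treats whole words and separator runs as the dictionary entries, whereas the article builds its dictionary with a fragment-counting heuristic and a fixed memory budget that this record does not describe. The function names (`tokenize`, `build_codes`, `encode`, `decode`) are hypothetical, and the output is a string of '0'/'1' characters rather than packed bits.

```python
import heapq
import re
from collections import Counter
from itertools import count


def tokenize(text):
    # Assumption: dictionary entries are runs of word characters and runs of
    # non-word characters, so joining the tokens reproduces the text exactly.
    return re.findall(r"\w+|\W+", text)


def build_codes(freqs):
    # Textbook Huffman construction: repeatedly merge the two least
    # frequent subtrees. Leaves are strings, internal nodes are 2-tuples.
    tiebreak = count()  # avoids comparing nodes when frequencies are equal
    heap = [(f, next(tiebreak), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    codes = {}
    stack = [(heap[0][2], "")]
    while stack:
        node, prefix = stack.pop()
        if isinstance(node, tuple):
            stack.append((node[0], prefix + "0"))
            stack.append((node[1], prefix + "1"))
        else:
            codes[node] = prefix
    return codes


def encode(tokens, codes):
    # Concatenate the code of each token into one bit string.
    return "".join(codes[t] for t in tokens)


def decode(bits, codes):
    # Huffman codes are prefix-free, so a greedy left-to-right match works.
    inverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


if __name__ == "__main__":
    text = "the cat and the dog and the bird"
    tokens = tokenize(text)
    codes = build_codes(Counter(tokens))
    bits = encode(tokens, codes)
    assert decode(bits, codes) == text
    print(len(bits), "bits versus", 8 * len(text), "bits uncompressed")
```

In this toy setting the frequent tokens (the space and "the") receive the shortest codes. The bit count printed at the end excludes the cost of storing the dictionary itself, which a real scheme has to account for.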
Bibliography:istex:B13659E52282E6EC4B86A3004BE990BFCDC8CD86
ark:/67375/WNG-R675QK46-N
ArticleID:ASI7