A Low-Complexity and High-Throughput Hardware Design for Lempel-Ziv 4 Compression Algorithm

Bibliographic Details
Published in: IEEE transactions on circuits and systems. I, Regular papers, Vol. 72, no. 9, pp. 4901-4911
Main Authors: Chen, Tao; Song, Suwen; Wang, Zhongfeng
Format: Journal Article
Language: English
Published: New York, IEEE, 01.09.2025
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 1549-8328, 1558-0806
DOI: 10.1109/TCSI.2025.3530167

Summary: The Lempel-Ziv (LZ) 4 compression algorithm, widely used in data transmission and storage, faces the challenge of high-speed implementation and increased complexity in the era of big data. Therefore, this paper proposes a single-core parallel architecture for LZ4 algorithm with high throughput and low complexity. Firstly, to enhance throughput, two innovative approaches are introduced from the perspective of parallelism and frequency with an acceptable compression ratio loss: each parallelization window is restricted to performing a single match, bridging the gap between actual and theoretical parallelism; the feedback loop in the circuit is broken by utilizing the spatial correlation between adjacent matches for higher frequency. Secondly, two optimization schemes are employed on resource-consuming modules to achieve low complexity. Multi-port hash tables using Live Value Table (LVT) are improved based on inherent data characteristics, significantly reducing the hardware resource consumption while ensuring excellent scalability on hash table depth and frequency. The match comparison operation is moved ahead, further reducing the logic resources by 64.36%. Finally, our design is implemented on FPGA and ASIC platforms. Experimental results on FPGA demonstrate that the proposed architecture achieves a throughput of 17.39 Gb/s, exhibiting a 2.86× improvement over the state-of-the-art, along with a 6.46× enhancement in area efficiency. Further optimizations including Canonic Signed Digit (CSD) coding and computational reuse on the ASIC platform result in a 45× improvement in area efficiency.
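
The summary names Canonic Signed Digit (CSD) coding as one of the ASIC optimizations but does not say where it is applied. As a minimal sketch of the general technique only, and not the authors' circuit, the Python below recodes a constant into digits from {-1, 0, +1} with no two adjacent nonzero digits, then multiplies by that constant using shifts, adds, and subtracts. The function names are illustrative, and the use of 2654435761 (the multiplicative constant of the reference LZ4 software hash) as the example constant is an assumption about where such recoding might help.

```python
def to_csd(n: int) -> list[int]:
    """Return the CSD digits of a non-negative integer, least significant first.

    Each digit is -1, 0, or +1, and no two adjacent digits are both nonzero.
    """
    if n < 0:
        raise ValueError("this sketch handles non-negative constants only")
    digits = []
    while n:
        if n & 1:
            # n % 4 == 1 -> digit +1; n % 4 == 3 -> digit -1 (carry upward).
            d = 2 - (n & 3)
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def from_csd(digits: list[int]) -> int:
    """Rebuild the integer from its CSD digits (least significant first)."""
    return sum(d << i for i, d in enumerate(digits))


def nonzero_count(digits: list[int]) -> int:
    """Number of nonzero digits, i.e. shifted terms a constant multiplier needs."""
    return sum(1 for d in digits if d)


def csd_multiply(x: int, digits: list[int]) -> int:
    """Multiply x by the encoded constant using only shifts, adds, and subtracts."""
    acc = 0
    for i, d in enumerate(digits):
        if d == 1:
            acc += x << i
        elif d == -1:
            acc -= x << i
    return acc


if __name__ == "__main__":
    # Assumed example constant: the reference LZ4 software hash multiplier.
    k = 2654435761
    csd = to_csd(k)
    print("binary nonzero bits:", bin(k).count("1"))
    print("CSD nonzero digits: ", nonzero_count(csd))
    assert csd_multiply(13, csd) == 13 * k
```

For instance, 7 is 111 in binary (three nonzero bits), while its CSD form is +1, 0, 0, -1 (that is, 8 - 1), which needs one subtraction instead of two additions. Fewer nonzero digits generally means fewer adders in a constant multiplier, which is the usual motivation for CSD in area-constrained designs.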