A Low-Complexity and High-Throughput Hardware Design for Lempel-Ziv 4 Compression Algorithm

Bibliographic Details
Published in	IEEE transactions on circuits and systems. I, Regular papers Vol. 72; no. 9; pp. 4901 - 4911
Main Authors	Chen, Tao; Song, Suwen; Wang, Zhongfeng
Format	Journal Article
Language	English
Published	New York IEEE 01.09.2025 The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Algorithms; Application specific integrated circuits; ASIC; Big Data; Complexity; Complexity theory; Compression algorithms; Compression ratio; Computer architecture; Data transmission; Encoding; Feedback loops; Field programmable gate arrays; FPGA; Hardware; hardware accelerator; high throughput; Logic; Lossless data compression; low complexity; LZ4 algorithm; Parallel architectures; Parallel processing; Signal processing algorithms; Throughput
ISSN	1549-8328 1558-0806
DOI	10.1109/TCSI.2025.3530167

Summary:	The Lempel-Ziv (LZ) 4 compression algorithm, widely used in data transmission and storage, faces the challenge of high-speed implementation and increased complexity in the era of big data. Therefore, this paper proposes a single-core parallel architecture for LZ4 algorithm with high throughput and low complexity. Firstly, to enhance throughput, two innovative approaches are introduced from the perspective of parallelism and frequency with an acceptable compression ratio loss: each parallelization window is restricted to performing a single match, bridging the gap between actual and theoretical parallelism; the feedback loop in the circuit is broken by utilizing the spatial correlation between adjacent matches for higher frequency. Secondly, two optimization schemes are employed on resource-consuming modules to achieve low complexity. Multi-port hash tables using Live Value Table (LVT) are improved based on inherent data characteristics, significantly reducing the hardware resource consumption while ensuring excellent scalability on hash table depth and frequency. The match comparison operation is moved ahead, further reducing the logic resources by 64.36%. Finally, our design is implemented on FPGA and ASIC platforms. Experimental results on FPGA demonstrate that the proposed architecture achieves a throughput of 17.39 Gb/s, exhibiting a 2.86× improvement over the state-of-the-art, along with a 6.46× enhancement in area efficiency. Further optimizations including Canonic Signed Digit (CSD) coding and computational reuse on the ASIC platform result in a 45× improvement in area efficiency.
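Note:	The summary mentions multi-port hash tables built with a Live Value Table (LVT). As a rough illustration only, the sketch below models the generic, textbook LVT scheme in Python: one memory bank per write port, plus a small table recording which bank last wrote each address. It is not the improved variant the paper proposes, and the class name, method names, and the conflict rule (highest-numbered port wins) are assumptions made for this example.

```python
class LVTMemory:
    """Behavioral model of a multi-write-port memory using a Live Value Table.

    Hypothetical illustration of the generic LVT idea; not the paper's design.
    """

    def __init__(self, depth, num_write_ports):
        self.depth = depth
        self.num_write_ports = num_write_ports
        # One single-write-port bank per write port.
        self.banks = [[0] * depth for _ in range(num_write_ports)]
        # lvt[addr] holds the index of the bank with the newest value for addr.
        self.lvt = [0] * depth

    def write_cycle(self, writes):
        """Apply all writes of one clock cycle.

        writes maps a port index to an (addr, value) pair. If two ports hit
        the same address, the highest port index wins (an assumption here).
        """
        for port in sorted(writes):
            addr, value = writes[port]
            self.banks[port][addr] = value  # each port writes only its own bank
            self.lvt[addr] = port           # record where the live value is

    def read(self, addr):
        """Return the most recently written value at addr."""
        return self.banks[self.lvt[addr]][addr]


mem = LVTMemory(depth=16, num_write_ports=4)
mem.write_cycle({0: (3, 100), 2: (7, 200)})  # two writes in the same cycle
mem.write_cycle({1: (3, 111)})               # later write to address 3
print(mem.read(3), mem.read(7))
```

In this model the stale value 100 remains in bank 0 after the second cycle, and the LVT entry alone steers the read to bank 1. The cost of the generic scheme is the replicated banks, which is presumably the resource overhead the summary says the improved design reduces.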