Optimizing Lattice Basis Reduction Algorithm on ARM V8 Processors

Bibliographic Details
Published in	Applied sciences Vol. 15; no. 4; p. 2021
Main Authors	Cao, Ronghui, Wang, Julong, Zheng, Liming, Zhou, Jincheng, Wang, Haodong, Xiao, Tiaojie, Gong, Chunye
Format	Journal Article
Language	English
Published	Basel: MDPI AG, 01.02.2025
Subjects	Accuracy; Algorithms; ARM V8; Computational mathematics; Cryptography; High performance computing; Integer programming; lattice reduction; Libraries; Optimization techniques; parallel optimization; Supercomputers; Tianhe; China
ISSN	2076-3417
DOI	10.3390/app15042021

Summary:	The LLL (Lenstra–Lenstra–Lovász) algorithm is an important method for lattice basis reduction and has broad applications in computer algebra, cryptography, number theory, and combinatorial optimization. However, current LLL implementations are poorly adapted to China's domestic supercomputers and run with low efficiency. To make the LLL algorithm more efficient in practical applications, this research focuses on the parallel optimization of the LLL_FP algorithm (the double-precision floating-point variant of LLL) from the NTL library on the domestic Tianhe supercomputer, which uses the Phytium ARM V8 processor. First, the Gram–Schmidt coefficient calculation and the row transformation are vectorized with the SIMD instruction set of the Phytium chip, which significantly improves computational efficiency. Assembly-level optimization then makes full use of the Phytium processor's low-level instructions to increase execution speed. For memory access, data prefetching loads the required data before it is used, which reduces cache misses and accelerates data processing. Finally, loop unrolling is applied to the core loop so that each iteration performs more operations. Experimental results show that, in single-core efficiency, the optimized LLL_FP algorithm outperforms the serial LLL_FP algorithm by 34% to 42%, with an average improvement of 38%. This study provides a more efficient solution for large-scale lattice basis reduction and demonstrates the potential of the LLL algorithm in ARM V8 high-performance computing environments.
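The vectorized Gram–Schmidt coefficient calculation mentioned in the summary can be illustrated with a minimal sketch. It assumes the standard floating-point Gram–Schmidt recurrence used by floating-point LLL variants, r(k,j) = <b_k, b_j> - sum_{i<j} mu(j,i) r(k,i), mu(k,j) = r(k,j) / c_j and c_k = <b_k, b_k> - sum_{i<k} mu(k,i) r(k,i), whose hot inner loop is a dot product. The function names dot_f64 and gso_row, the row-major array layout, and the unrolling factor are hypothetical choices for illustration; this is not the authors' code or NTL's source. The NEON branch uses AArch64 intrinsics from arm_neon.h, and a portable scalar branch is compiled on other targets.

```c
#include <stddef.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical kernel: inner product of two double vectors of length n.
 * Several independent accumulators break the dependency chain on a single
 * running sum (the effect loop unrolling is meant to exploit). */
static double dot_f64(const double *x, const double *y, size_t n)
{
    size_t i = 0;
    double s;
#if defined(__ARM_NEON)
    /* Two 128-bit accumulators, each holding 2 doubles: 4 doubles per iteration. */
    float64x2_t acc0 = vdupq_n_f64(0.0), acc1 = vdupq_n_f64(0.0);
    for (; i + 4 <= n; i += 4) {
        /* vfmaq_f64(a, b, c) computes a + b * c (fused multiply-add). */
        acc0 = vfmaq_f64(acc0, vld1q_f64(x + i), vld1q_f64(y + i));
        acc1 = vfmaq_f64(acc1, vld1q_f64(x + i + 2), vld1q_f64(y + i + 2));
    }
    s = vaddvq_f64(vaddq_f64(acc0, acc1)); /* horizontal sum of both lanes */
#else
    /* Portable fallback: 4-way unrolled with four scalar accumulators. */
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i] * y[i];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }
    s = (s0 + s1) + (s2 + s3);
#endif
    for (; i < n; i++) /* remaining 0..3 elements */
        s += x[i] * y[i];
    return s;
}

/* Hypothetical helper: Gram–Schmidt data for row k, assuming rows 0..k-1
 * are already processed.
 *   b  : m x n basis, row-major (row k starts at b + k*n)
 *   mu : m x m coefficients, row-major (mu[k*m + j] = mu(k,j))
 *   c  : squared norms of the Gram–Schmidt vectors b*_j
 *   r  : scratch of length m, receives r(k,j) = mu(k,j) * c_j          */
static void gso_row(const double *b, double *mu, double *c, double *r,
                    size_t k, size_t n, size_t m)
{
    for (size_t j = 0; j < k; j++) {
        double rkj = dot_f64(b + k * n, b + j * n, n);
        for (size_t i = 0; i < j; i++)
            rkj -= mu[j * m + i] * r[i];
        r[j] = rkj;
        mu[k * m + j] = rkj / c[j];
    }
    double ck = dot_f64(b + k * n, b + k * n, n);
    for (size_t i = 0; i < k; i++)
        ck -= mu[k * m + i] * r[i];
    c[k] = ck;
}
```

Calling gso_row for k = 0, 1, ..., m-1 fills mu and c for the whole basis. Because the unrolled versions sum in a different order than a plain sequential loop, the last bits of each dot product can differ from a naive implementation; floating-point LLL tolerates this, but results are not bit-identical across the two branches.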
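The row transformation, prefetching and loop unrolling described in the summary all meet in one small kernel: the size-reduction update b_k <- b_k - q * b_j on the double-precision copy of the basis, where q is typically the rounded Gram–Schmidt coefficient. The sketch below is only an illustration under stated assumptions: the name row_sub_scaled is hypothetical, the prefetch distance of 16 doubles (128 bytes) is an untuned guess, prefetching relies on the GCC/Clang builtin __builtin_prefetch, and the SIMD path uses AArch64 NEON intrinsics rather than the hand-written assembly the authors report.

```c
#include <stddef.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Prefetch hints (read / write); no-ops on compilers without the builtin. */
#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH_R(p) __builtin_prefetch((p), 0)
#define PREFETCH_W(p) __builtin_prefetch((p), 1)
#else
#define PREFETCH_R(p) ((void)(p))
#define PREFETCH_W(p) ((void)(p))
#endif

/* Assumed prefetch distance in doubles (16 * 8 = 128 bytes); tune per chip. */
#define PF_DIST 16

/* Hypothetical helper (not NTL's API): bk[i] -= q * bj[i] for i in [0, n). */
static void row_sub_scaled(double *bk, const double *bj, double q, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    const float64x2_t vq = vdupq_n_f64(q);
    /* Unrolled: two 128-bit vectors (4 doubles) per iteration. */
    for (; i + 4 <= n; i += 4) {
        if (i + PF_DIST < n) { /* keep hinted addresses inside the arrays */
            PREFETCH_W(bk + i + PF_DIST);
            PREFETCH_R(bj + i + PF_DIST);
        }
        float64x2_t a0 = vld1q_f64(bk + i);
        float64x2_t a1 = vld1q_f64(bk + i + 2);
        /* vfmsq_f64(a, b, c) computes a - b * c (fused multiply-subtract). */
        a0 = vfmsq_f64(a0, vld1q_f64(bj + i), vq);
        a1 = vfmsq_f64(a1, vld1q_f64(bj + i + 2), vq);
        vst1q_f64(bk + i, a0);
        vst1q_f64(bk + i + 2, a1);
    }
#else
    /* Portable fallback with the same 4-way unrolling and prefetching. */
    for (; i + 4 <= n; i += 4) {
        if (i + PF_DIST < n) {
            PREFETCH_W(bk + i + PF_DIST);
            PREFETCH_R(bj + i + PF_DIST);
        }
        bk[i]     -= q * bj[i];
        bk[i + 1] -= q * bj[i + 1];
        bk[i + 2] -= q * bj[i + 2];
        bk[i + 3] -= q * bj[i + 3];
    }
#endif
    for (; i < n; i++) /* remaining 0..3 elements */
        bk[i] -= q * bj[i];
}
```

Whether prefetching helps, and at what distance, depends on the cache hierarchy and row length, so the constant should be measured rather than trusted. Because vfmsq_f64 fuses the multiply and subtract, the NEON path can differ in the last bit from the scalar path unless the compiler also contracts the scalar code into FMA instructions.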