Efficient 3-D Processor Array Reconfiguration Algorithms Based on Bucket Effect

Bibliographic Details
Published in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 43, no. 4, pp. 1023-1036
Main Authors: Ding, Hao; He, Yanlong; Zhai, Zhongyi; Li, Zhi; Qian, Junyan; Zhao, Lingzhong
Format: Journal Article
Language: English
Published: New York: IEEE, 01.04.2024
Publisher: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 0278-0070, 1937-4151
DOI: 10.1109/TCAD.2023.3337196

Summary: With the increasing density of 3-D processor arrays, some processor elements (PEs) often fail due to overload or overheating during massively parallel computing operations. Effective fault-tolerance techniques are therefore necessary to ensure the reliability of the system. This article investigates an efficient reconfiguration method that constructs a 3-D fault-free logical subarray with more fault-free PEs and a shorter interconnection length (interlength). First, we propose a novel method based on the barrel effect to find the bottleneck plane of a 3-D processor array. Second, an efficient compensation strategy is proposed to replace faulty PEs on adjacent physical planes with fault-free PEs on the bottleneck planes, which makes more fault-free PEs available for constructing the subarray. Then, we propose a heuristic to construct the subarray and reduce iteration redundancy to accelerate reconstruction. Finally, a heuristic optimization algorithm is proposed to reduce the interlength between PEs, which lowers dynamic power consumption and communication costs. In addition, we propose a more accurate method to calculate the lower bound of the interlength to better evaluate the performance of the algorithm. Simulation experiments show that, compared with state-of-the-art methods, on a 128 × 128 × 128 host array the utilization rate of fault-free PEs can be improved by up to 15.6% and the interlength redundancy can be reduced by 78.2% under random faults. On a 64 × 64 × 64 host array, the average improvements in the two indicators under clustered faults reach 93.2% and 69.3%, respectively. Moreover, across all cases considered, the proposed new lower bound and the reconstruction time are reduced by an average of 18.47% and 76.13%, respectively.
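The summary names the quantities the reconfiguration works with (per-plane fault-free PEs, the bottleneck plane, interlength, utilization rate) but gives no formulas. The Python sketch below illustrates them under stated assumptions and is not the authors' algorithm. It assumes a host array stored as plane × row × column booleans (True = faulty), takes the bottleneck plane to be the physical plane with the fewest fault-free PEs (the shortest stave of the barrel), measures interlength as the summed Manhattan distance between the physical positions of logically adjacent PEs, and defines utilization as the subarray size divided by the host's fault-free PE count. All names (`fault_free_per_plane`, `bottleneck_plane`, `interlength`, `utilization`) are hypothetical.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]          # (plane, row, column) in the host array
FaultMap = List[List[List[bool]]]     # fault_map[p][r][c] is True if that PE is faulty
Subarray = List[List[List[Coord]]]    # logical [p][r][c] -> physical coordinate


def fault_free_per_plane(fault_map: FaultMap) -> List[int]:
    """Count the fault-free PEs on every physical plane."""
    return [sum(not faulty for row in plane for faulty in row) for plane in fault_map]


def bottleneck_plane(fault_map: FaultMap) -> int:
    """Index of the plane with the fewest fault-free PEs (the barrel's shortest stave).

    Assumption: the bottleneck is judged by fault-free PE count alone;
    ties go to the lowest plane index.
    """
    counts = fault_free_per_plane(fault_map)
    return min(range(len(counts)), key=lambda p: counts[p])


def interlength(sub: Subarray) -> int:
    """Assumed metric: sum of Manhattan distances between logically adjacent PEs."""
    planes, rows, cols = len(sub), len(sub[0]), len(sub[0][0])
    total = 0
    for p in range(planes):
        for r in range(rows):
            for c in range(cols):
                # Visit each logical link once: +1 along plane, row, and column axes.
                for dp, dr, dc in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
                    q, s, t = p + dp, r + dr, c + dc
                    if q < planes and s < rows and t < cols:
                        a, b = sub[p][r][c], sub[q][s][t]
                        total += sum(abs(x - y) for x, y in zip(a, b))
    return total


def utilization(sub: Subarray, fault_map: FaultMap) -> float:
    """Assumed metric: fraction of the host's fault-free PEs used by the subarray."""
    used = len(sub) * len(sub[0]) * len(sub[0][0])
    return used / sum(fault_free_per_plane(fault_map))


if __name__ == "__main__":
    # 2 x 2 x 2 host array with one faulty PE on plane 1.
    host = [[[False, False], [False, False]],
            [[True, False], [False, False]]]
    print(fault_free_per_plane(host))         # [4, 3]
    print(bottleneck_plane(host))             # 1
    # A 1 x 2 x 2 logical subarray mapped onto plane 0 with no detours.
    sub = [[[(0, 0, 0), (0, 0, 1)], [(0, 1, 0), (0, 1, 1)]]]
    print(interlength(sub))                   # 4
    print(round(utilization(sub, host), 3))   # 0.571
```

A detour raises the interlength: mapping two logically adjacent PEs to physical positions (0, 0, 0) and (0, 1, 1) costs 2 instead of 1, which is the kind of redundancy an interlength-reduction heuristic tries to remove.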