Adaptive System-Level Fault Diagnosis of Hierarchical Cubic Networks
| Published in | The Journal of supercomputing Vol. 81; no. 15; p. 1409 |
|---|---|
| Main Authors | , , |
| Format | Journal Article |
| Language | English |
| Published | New York, Springer US, 05.10.2025, Springer Nature B.V |
| ISSN | 1573-0484 0920-8542 |
| DOI | 10.1007/s11227-025-07897-1 |
| Summary | As the demand for computational power continues to rise, large-scale hierarchical networks play an increasingly pivotal role in high-performance computing (HPC) systems. Fault diagnosis is essential for ensuring the stability and reliability of these complex networks. However, traditional fault diagnosis methods face significant challenges in scalability and real-time performance as system sizes and complexities grow. This paper introduces a parallel adaptive system-level fault diagnosis algorithm (PAD-HCN) that leverages the Hamiltonian structure to efficiently and accurately diagnose faults in large-scale hierarchical cubic networks (HCNs). Specifically, we first prove that HCNs are Hamiltonian, a property that forms the theoretical foundation for designing an efficient fault diagnosis mechanism. Building upon this property, we propose a parallel and adaptive fault diagnosis algorithm that significantly enhances diagnostic performance. Simulation results demonstrate that the proposed algorithm achieves exceptional diagnostic accuracy, maintaining nearly 100% accuracy when the number of faulty vertices remains within the fault-tolerance threshold. Furthermore, the parallelization of the algorithm leads to a substantial reduction in diagnosis time, with the parallel approach achieving up to a 99.97% reduction in time compared to traditional non-parallel methods. These results underscore the scalability of PAD-HCN, demonstrating its ability to efficiently handle the increasing size and complexity of modern HPC systems while preserving high fault diagnosis accuracy and real-time responsiveness. |
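
This record does not reproduce the definition of a hierarchical cubic network. As background only, the following is a minimal sketch that builds HCN(n) under the definition commonly used in the literature, which is an assumption here and not taken from this record: vertices are pairs (a, b) of n-bit addresses, each cluster a is an n-dimensional hypercube over b, and every vertex has exactly one external edge. The function names `hcn_neighbors` and `build_hcn` are hypothetical, and the sketch covers only the topology, not the PAD-HCN algorithm or the Hamiltonian cycle construction.

```python
from itertools import product


def hcn_neighbors(a: int, b: int, n: int):
    """Neighbors of vertex (a, b) in HCN(n); a and b are n-bit integers.

    Assumed definition (not stated in the record):
      - local edges: flip one bit of b inside cluster a (an n-cube)
      - external edge: (a, b) -- (b, a) when a != b,
        and (a, a) -- (~a, ~a) when a == b (bitwise complement on n bits)
    """
    mask = (1 << n) - 1
    # n local neighbors inside the same cluster
    neighbors = [(a, b ^ (1 << i)) for i in range(n)]
    # exactly one external neighbor in another cluster
    if a != b:
        neighbors.append((b, a))
    else:
        neighbors.append((a ^ mask, a ^ mask))
    return neighbors


def build_hcn(n: int):
    """Adjacency lists for all 2^(2n) vertices of HCN(n)."""
    size = 1 << n
    return {
        (a, b): hcn_neighbors(a, b, n)
        for a, b in product(range(size), repeat=2)
    }


graph = build_hcn(3)
vertex_count = len(graph)
degrees = {len(adj) for adj in graph.values()}
```

Under this definition HCN(n) has 2^(2n) vertices and is (n + 1)-regular, so `build_hcn(3)` yields 64 vertices of degree 4. A diagnosis routine would run on top of such an adjacency structure.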