Adaptive System-Level Fault Diagnosis of Hierarchical Cubic Networks

Bibliographic Details
Published in The Journal of Supercomputing Vol. 81; no. 15; p. 1409
Main Authors Lv, Mengjie; Di, Sixiao; Fan, Weibei
Format Journal Article
Language English
Published New York Springer US 05.10.2025
Springer Nature B.V
ISSN 1573-0484
0920-8542
DOI 10.1007/s11227-025-07897-1

Summary: As the demand for computational power continues to rise, large-scale hierarchical networks play an increasingly pivotal role in high-performance computing (HPC) systems. Fault diagnosis is essential for ensuring the stability and reliability of these complex networks. However, traditional fault diagnosis methods face significant challenges in scalability and real-time performance as system sizes and complexities grow. This paper introduces a parallel adaptive system-level fault diagnosis algorithm (PAD-HCN) that leverages the Hamiltonian structure to efficiently and accurately diagnose faults in large-scale hierarchical cubic networks (HCNs). Specifically, we first prove that HCNs are Hamiltonian, a property that forms the theoretical foundation for designing an efficient fault diagnosis mechanism. Building upon this property, we propose a parallel and adaptive fault diagnosis algorithm that significantly enhances diagnostic performance. Simulation results demonstrate that the proposed algorithm achieves exceptional diagnostic accuracy, maintaining nearly 100% accuracy when the number of faulty vertices remains within the fault-tolerance threshold. Furthermore, the parallelization of the algorithm leads to a substantial reduction in diagnosis time, with the parallel approach achieving up to a 99.97% reduction in time compared to traditional non-parallel methods. These results underscore the scalability of PAD-HCN, demonstrating its ability to efficiently handle the increasing size and complexity of modern HPC systems while preserving high fault diagnosis accuracy and real-time responsiveness.
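
Note: The reported reduction in diagnosis time can be read as a speedup factor. The short sketch below does that arithmetic only. It assumes the percentage is measured relative to the non-parallel running time, and the function name speedup_from_reduction is hypothetical; nothing here is taken from the paper's implementation.

```python
def speedup_from_reduction(reduction_percent: float) -> float:
    """Convert a percentage reduction in run time into a speedup factor.

    Assumption: the reduction is relative to the baseline (non-parallel)
    time, so reduction = 100 * (1 - T_parallel / T_baseline).
    """
    if not 0.0 <= reduction_percent < 100.0:
        raise ValueError("reduction_percent must be in [0, 100)")
    # Fraction of the baseline time that the parallel run still needs.
    remaining_fraction = 1.0 - reduction_percent / 100.0
    # Speedup is baseline time divided by parallel time.
    return 1.0 / remaining_fraction


if __name__ == "__main__":
    # The "up to 99.97%" figure quoted in the summary.
    print(f"{speedup_from_reduction(99.97):.0f}x")
```

Under that assumption, a 99.97% reduction means the parallel run takes about 0.03% of the baseline time, roughly a 3,333-fold speedup in the best reported case.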