DHC: Distributed Homomorphic Compression for Gradient Aggregation in Allreduce

Bibliographic Details
Published in: IEEE International Conference on Communications (2025), pp. 1-6
Main Authors: Liao, Lida; Lin, Zhengli; Chen, Haodong; Zhu, Longlong; Liu, Hongyan; Yu, Jiashuo; Zhang, Dong; Wu, Chunming
Format: Conference Proceeding
Language: English
Published: IEEE, 08.06.2025
ISSN: 1938-1883
DOI: 10.1109/ICC52391.2025.11161970


More Information
Summary: Distributed training is critical for efficiently developing deep neural networks (DNNs) on tasks like image classification and natural language processing. However, as model and dataset sizes continue to grow, high communication overhead during gradient exchanges has become a major bottleneck in distributed training. Although existing homomorphic compression frameworks effectively reduce communication overhead, their reliance on centralized architectures makes them unsuitable for the mainstream decentralized AllReduce architecture. To address this, we propose DHC, a framework for homomorphic gradient compression in AllReduce architectures. Its key idea is HG-Sketch, which leverages multi-level index tables for direct in-network aggregation of compressed gradients, thereby eliminating additional computational overhead. Additionally, DHC introduces an index-sharing method to optimize memory usage on programmable switches. Furthermore, we establish an Integer Linear Programming (ILP) model to optimize the deployment strategy of programmable switches, further enhancing in-network aggregation capabilities. Experimental results demonstrate that DHC achieves a 3.8× increase in aggregation speed and a 4.2× improvement in aggregation throughput.
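The record does not reproduce the HG-Sketch design itself, but the homomorphic property the abstract relies on can be illustrated with a generic linear (Count-Sketch-style) compressor: because encoding is linear, summing compressed gradients yields exactly the compression of the summed gradient, which is what allows switches to aggregate compressed traffic without first decompressing it. Below is a minimal, hypothetical Python sketch of that property; the class name, parameters, and single-row design are assumptions for illustration, not the paper's HG-Sketch or its multi-level index tables.

# Illustrative only: a linear (Count-Sketch-style) gradient compressor.
# Linearity means sketches can be summed in-network without decompression.
import numpy as np

class LinearGradientSketch:
    def __init__(self, dim, width, seed=0):
        rng = np.random.default_rng(seed)
        # Shared hash: each coordinate maps to one bucket with a random sign.
        self.bucket = rng.integers(0, width, size=dim)
        self.sign = rng.choice([-1.0, 1.0], size=dim)
        self.width = width

    def encode(self, grad):
        sketch = np.zeros(self.width)
        np.add.at(sketch, self.bucket, self.sign * grad)  # linear in grad
        return sketch

    def decode(self, sketch):
        # Unbiased but noisy estimate of each original coordinate.
        return self.sign * sketch[self.bucket]

dim, workers = 1000, 4
sk = LinearGradientSketch(dim, width=256)
grads = [np.random.randn(dim) for _ in range(workers)]

# Aggregating compressed gradients equals compressing the aggregated gradient.
agg_of_sketches = sum(sk.encode(g) for g in grads)
sketch_of_agg = sk.encode(sum(grads))
assert np.allclose(agg_of_sketches, sketch_of_agg)

estimate = sk.decode(agg_of_sketches)  # approximate aggregated gradient

This toy example only demonstrates the linearity that makes in-network aggregation of compressed gradients possible; the paper's HG-Sketch additionally uses multi-level index tables and index sharing to fit the memory constraints of programmable switches, which are not modeled here.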
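The abstract also mentions an ILP model for optimizing where aggregation is deployed, without giving its formulation. A generic placement ILP of this flavor (purely illustrative notation, not the paper's model) might assign gradient flows to programmable switches subject to per-switch memory and an overall deployment budget:

\begin{align*}
\max_{x,\,y}\;\; & \sum_{f \in F} \sum_{s \in S} y_{f,s} && \text{(flows aggregated in-network)} \\
\text{s.t.}\;\;  & \sum_{s \in S} y_{f,s} \le 1 && \forall f \in F \\
                 & \sum_{f \in F} m_f \, y_{f,s} \le M_s \, x_s && \forall s \in S \\
                 & \sum_{s \in S} x_s \le B, \qquad x_s,\, y_{f,s} \in \{0,1\},
\end{align*}

where x_s marks switch s as an aggregation point, y_{f,s} assigns flow f to switch s, m_f is the sketch memory flow f requires, M_s is the memory available on switch s, and B is the number of programmable switches that may be deployed. The paper's actual objective, variables, and constraints may differ.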