DHC: Distributed Homomorphic Compression for Gradient Aggregation in Allreduce

Bibliographic Details
Published in: IEEE International Conference on Communications (2025), pp. 1-6
Main Authors: Liao, Lida; Lin, Zhengli; Chen, Haodong; Zhu, Longlong; Liu, Hongyan; Yu, Jiashuo; Zhang, Dong; Wu, Chunming
Format: Conference Proceeding
Language: English
Published: IEEE, 08.06.2025
ISSN: 1938-1883
DOI: 10.1109/ICC52391.2025.11161970


More Information
Summary: Distributed training is critical for efficiently developing deep neural networks (DNNs) on tasks like image classification and natural language processing. However, as model and dataset sizes continue to grow, high communication overhead during gradient exchanges has become a major bottleneck in distributed training. Although existing homomorphic compression frameworks effectively reduce communication overhead, their reliance on centralized architectures makes them unsuitable for the mainstream decentralized AllReduce architecture. To address this, we propose DHC, a framework for homomorphic gradient compression in AllReduce architectures. Its key idea is HG-Sketch, which leverages multi-level index tables for direct in-network aggregation of compressed gradients, thereby eliminating additional computational overhead. Additionally, DHC introduces an index-sharing method to optimize memory usage on programmable switches. Furthermore, we establish an Integer Linear Programming (ILP) model to optimize the deployment strategy of programmable switches, further enhancing in-network aggregation capabilities. Experimental results demonstrate that DHC achieves a 3.8× increase in aggregation speed and a 4.2× improvement in aggregation throughput.
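The record does not reproduce the HG-Sketch design itself, but the homomorphic property the abstract relies on can be illustrated with a generic linear (Count-Sketch-style) compressor: because encoding is linear, summing compressed gradients yields exactly the compression of the summed gradient, which is what allows switches to aggregate compressed traffic without first decompressing it. Below is a minimal, hypothetical Python sketch of that property; the class name, parameters, and single-row design are assumptions for illustration, not the paper's HG-Sketch or its multi-level index tables.

# Illustrative only: a linear (Count-Sketch-style) gradient compressor.
# Linearity means sketches can be summed in-network without decompression.
import numpy as np

class LinearGradientSketch:
    def __init__(self, dim, width, seed=0):
        rng = np.random.default_rng(seed)
        # Shared hash: each coordinate maps to one bucket with a random sign.
        self.bucket = rng.integers(0, width, size=dim)
        self.sign = rng.choice([-1.0, 1.0], size=dim)
        self.width = width

    def encode(self, grad):
        sketch = np.zeros(self.width)
        np.add.at(sketch, self.bucket, self.sign * grad)  # linear in grad
        return sketch

    def decode(self, sketch):
        # Unbiased but noisy estimate of each original coordinate.
        return self.sign * sketch[self.bucket]

dim, workers = 1000, 4
sk = LinearGradientSketch(dim, width=256)
grads = [np.random.randn(dim) for _ in range(workers)]

# Aggregating compressed gradients equals compressing the aggregated gradient.
agg_of_sketches = sum(sk.encode(g) for g in grads)
sketch_of_agg = sk.encode(sum(grads))
assert np.allclose(agg_of_sketches, sketch_of_agg)

estimate = sk.decode(agg_of_sketches)  # approximate aggregated gradient

This toy example only demonstrates the linearity that makes in-network aggregation of compressed gradients possible; the paper's HG-Sketch additionally uses multi-level index tables and index sharing to fit the memory constraints of programmable switches, which are not modeled here.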
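The abstract also mentions an ILP model for optimizing where aggregation is deployed, without giving its formulation. A generic placement ILP of this flavor (purely illustrative notation, not the paper's model) might assign gradient flows to programmable switches subject to per-switch memory and an overall deployment budget:

\begin{align*}
\max_{x,\,y}\;\; & \sum_{f \in F} \sum_{s \in S} y_{f,s} && \text{(flows aggregated in-network)} \\
\text{s.t.}\;\;  & \sum_{s \in S} y_{f,s} \le 1 && \forall f \in F \\
                 & \sum_{f \in F} m_f \, y_{f,s} \le M_s \, x_s && \forall s \in S \\
                 & \sum_{s \in S} x_s \le B, \qquad x_s,\, y_{f,s} \in \{0,1\},
\end{align*}

where x_s marks switch s as an aggregation point, y_{f,s} assigns flow f to switch s, m_f is the sketch memory flow f requires, M_s is the memory available on switch s, and B is the number of programmable switches that may be deployed. The paper's actual objective, variables, and constraints may differ.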