DHC: Distributed Homomorphic Compression for Gradient Aggregation in Allreduce
| Published in | IEEE International Conference on Communications (2025), pp. 1-6 |
|---|---|
| Main Authors | , , , , , , , |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 08.06.2025 |
| ISSN | 1938-1883 |
| DOI | 10.1109/ICC52391.2025.11161970 |
| Summary: | Distributed training is critical for efficiently developing deep neural networks (DNNs) on tasks like image classification and natural language processing. However, as model and dataset sizes continue to grow, high communication overhead during gradient exchanges has become a major bottleneck in distributed training. Although existing homomorphic compression frameworks effectively reduce communication overhead, their reliance on centralized architectures makes them unsuitable for the mainstream decentralized AllReduce architecture. To address this, we propose DHC, a framework for homomorphic gradient compression in AllReduce architectures. Its key idea is HG-Sketch, which leverages multi-level index tables for direct in-network aggregation of compressed gradients, thereby eliminating additional computational overhead. Additionally, DHC introduces an index-sharing method to optimize memory usage on programmable switches. Furthermore, we establish an Integer Linear Programming (ILP) model to optimize the deployment strategy of programmable switches, further enhancing in-network aggregation capabilities. Experimental results demonstrate that DHC achieves a 3.8× increase in aggregation speed and a 4.2× improvement in aggregation throughput. |
|---|---|
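The property that makes in-network aggregation of compressed gradients possible is linearity: summing workers' sketches yields the sketch of the summed gradient, so a switch can aggregate without decompressing. The snippet below is a minimal Count-Sketch illustration of that homomorphism, not the paper's HG-Sketch (the multi-level index tables, index-sharing scheme, and switch data plane are not reproduced here); the table sizes, seeds, and median-based decoding are assumptions chosen for clarity.

```python
import numpy as np

class CountSketch:
    """Linear (homomorphic) sketch: compress(g1) + compress(g2) == compress(g1 + g2).
    A simplified stand-in for a homomorphic gradient compressor; real in-network
    designs such as HG-Sketch use switch-resident index tables instead of NumPy."""

    def __init__(self, dim, rows=4, cols=1024, seed=0):
        rng = np.random.default_rng(seed)  # a shared seed plays the role of shared index tables
        self.buckets = rng.integers(0, cols, size=(rows, dim))   # bucket hash h_r(i)
        self.signs = rng.choice([-1.0, 1.0], size=(rows, dim))   # sign hash s_r(i)
        self.rows, self.cols = rows, cols

    def compress(self, grad):
        """Map a dense gradient of length dim into a rows x cols table (linear map)."""
        table = np.zeros((self.rows, self.cols))
        for r in range(self.rows):
            np.add.at(table[r], self.buckets[r], self.signs[r] * grad)
        return table

    def decompress(self, table):
        """Estimate each coordinate as the median of its sign-corrected buckets."""
        est = self.signs * table[np.arange(self.rows)[:, None], self.buckets]
        return np.median(est, axis=0)

# Two workers compress locally; the network sums the compressed tables
# element-wise (what a programmable switch would do), never decompressing.
d = 10_000
cs = CountSketch(dim=d, seed=42)          # all workers must share the same seed
g1, g2 = np.random.randn(d), np.random.randn(d)
agg_sketch = cs.compress(g1) + cs.compress(g2)        # in-network aggregation
assert np.allclose(agg_sketch, cs.compress(g1 + g2))  # the homomorphic property
approx = cs.decompress(agg_sketch)                    # approximate g1 + g2
err = np.linalg.norm(approx - (g1 + g2)) / np.linalg.norm(g1 + g2)
print(f"relative recovery error: {err:.3f}")
```

Sharing the same seed across workers stands in for the shared index tables described in the abstract: every worker hashes coordinate i to the same buckets with the same signs, which is exactly what allows a switch to aggregate the compressed tables by plain element-wise addition.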