CoCoA: Algorithm-Hardware Co-Design for Large-Scale GNN Training Using Compressed Graph

Bibliographic Details
Published in: Digest of Technical Papers - IEEE/ACM International Conference on Computer-Aided Design, pp. 1 - 9
Main Authors: Han, Yunki; Shin, Jaekang; Park, Gunhee; Kim, Lee-Sup
Format: Conference Proceeding
Language: English
Published: ACM, 27.10.2024
ISSN: 1558-2434
DOI: 10.1145/3676536.3676835

Summary: Scaling Graph Neural Network (GNN) training to large-scale graph data poses a critical challenge for implementing GNN applications on real-world giant graphs. The size of real-world graphs often exceeds the memory capacity of accelerator devices, necessitating the use of multiple devices or host memory for training. While expanding memory space alleviates the out-of-memory problem, this approach introduces another bottleneck: heavy communication over low-bandwidth interconnects. Efficient, scalable GNN training on large-scale graphs therefore requires addressing both the capacity and communication issues. To overcome the limitations of previous methods, this work proposes an alternative approach: satisfying memory requirements by reducing the size of the data used in training rather than expanding memory capacity. We introduce CoCoA, an end-to-end GNN training solution for large-scale graphs, achieved through a Co-designed algorithm and hardware optimized for Compressed graph inputs. We first compress the input graph, considering both edges and node embeddings; the compression method is based on node merging and vector quantization, and exploits the homophily characteristic of GNN datasets. A dedicated accelerator is then designed and optimized for computing over the compressed graph, minimizing decompression overhead. The proposed architecture supports early aggregation to avoid redundant data loading and computation. As a result, CoCoA enables data compression for full-batch GNN training at a 10× compression ratio with acceptable accuracy degradation. Using a single device with 16 GB of memory, the proposed CoCoA system achieves up to a 351.2× speedup over the baseline system.
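
The abstract names the two compression techniques (homophily-driven node merging and vector quantization) but gives no pseudocode. As a rough, hypothetical illustration of those two ideas only, the following minimal NumPy sketch greedily merges edge endpoints whose features are near-identical, then quantizes the surviving embeddings against a small k-means codebook. The function names, the cosine-similarity threshold, and the union-find merging strategy are assumptions made for this sketch, not CoCoA's actual algorithm.

import numpy as np

def merge_homophilous_nodes(edges, feats, sim_threshold=0.95):
    # Hypothetical greedy sketch of homophily-based node merging (not
    # CoCoA's actual method): collapse an edge when its two endpoints
    # have nearly identical feature vectors. Uses union-find with path
    # halving; the representative keeps its original feature vector.
    parent = list(range(len(feats)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    norms = np.linalg.norm(feats, axis=1) + 1e-12
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv and feats[ru] @ feats[rv] / (norms[ru] * norms[rv]) >= sim_threshold:
            parent[rv] = ru  # merge v's group into u's group
    # Remap surviving nodes to dense ids; drop self-loops and duplicate edges.
    roots = sorted({find(i) for i in range(len(feats))})
    new_id = {r: i for i, r in enumerate(roots)}
    merged_edges = sorted({(new_id[find(u)], new_id[find(v)])
                           for u, v in edges if find(u) != find(v)})
    return merged_edges, feats[roots]

def vector_quantize(feats, n_codes=256, n_iters=10, seed=0):
    # Plain k-means vector quantization: store one small shared codebook
    # plus a short code index per node instead of a full float embedding.
    rng = np.random.default_rng(seed)
    codebook = feats[rng.choice(len(feats), size=n_codes, replace=False)].copy()
    for _ in range(n_iters):
        # Assign every embedding to its nearest codeword, then recenter.
        dists = ((feats[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        codes = dists.argmin(axis=1)
        for k in range(n_codes):
            members = feats[codes == k]
            if len(members):
                codebook[k] = members.mean(axis=0)
    return codes.astype(np.uint16), codebook

# Toy usage: 1,000 nodes with 16-dim features on a ring graph.
feats = np.random.default_rng(1).standard_normal((1000, 16)).astype(np.float32)
edges = [(i, (i + 1) % 1000) for i in range(1000)]
small_edges, small_feats = merge_homophilous_nodes(edges, feats)
codes, codebook = vector_quantize(small_feats)

The sketch reflects why the abstract pairs the two steps: merging shrinks the edge list and node count, while quantization shrinks the per-node embedding storage, so together they attack both components of the graph's memory footprint.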