AdaGL: Adaptive Learning for Agile Distributed Training of Gigantic GNNs

Bibliographic Details
Published in 2023 60th ACM/IEEE Design Automation Conference (DAC), pp. 1 - 6
Main Authors Zhang, Ruisi; Javaheripi, Mojan; Ghodsi, Zahra; Bleiweiss, Amit; Koushanfar, Farinaz
Format Conference Proceeding
Language English
Published IEEE 09.07.2023
Subjects Adaptive learning; Design automation; Explosions; Linear programming; Servers; Training
Online Access https://ieeexplore.ieee.org/document/10248003
DOI 10.1109/DAC56929.2023.10248003

Abstract Distributed GNN training on contemporary massive and densely connected graphs requires information aggregation from all neighboring nodes, which leads to an explosion of inter-server communications. This paper proposes AdaGL, a highly scalable end-to-end framework for rapid distributed GNN training. AdaGL's novelty lies in our adaptive-learning-based graph-allocation engine as well as our use of multi-resolution coarse representations of dense graphs. As a result, AdaGL achieves an unprecedented level of balanced server computation while minimizing the communication overhead. Extensive proof-of-concept evaluations on billion-scale graphs show AdaGL attains ∼30-40% faster convergence compared with prior art.
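For context on the allocation problem the abstract describes (spread the graph's nodes across servers so that per-server computation stays balanced while cross-server neighbor aggregation stays cheap), the minimal sketch below may help. It is purely illustrative: a plain greedy heuristic, not AdaGL's adaptive-learning allocation engine or its multi-resolution coarsening; the function name greedy_partition, the balance_penalty knob, and the toy graph are hypothetical choices made for this example.

from collections import defaultdict

def greedy_partition(edges, num_nodes, num_servers, balance_penalty=1.0):
    """Greedily assign nodes to servers, preferring servers that already
    hold neighbors (less inter-server aggregation traffic) while penalizing
    servers that are already heavily loaded (computation imbalance)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    assignment = {}            # node -> server id
    load = [0] * num_servers   # nodes placed on each server so far

    # Place high-degree nodes first so dense regions anchor the partition.
    for node in sorted(range(num_nodes), key=lambda n: -len(adj[n])):
        best = max(
            range(num_servers),
            key=lambda s: sum(1 for nb in adj[node] if assignment.get(nb) == s)
            - balance_penalty * load[s],
        )
        assignment[node] = best
        load[best] += 1
    return assignment

if __name__ == "__main__":
    # Two triangles joined by a single edge; prints a node -> server mapping.
    toy_edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]
    print(greedy_partition(toy_edges, num_nodes=6, num_servers=2))

In this sketch the trade-off is controlled by balance_penalty: larger values favor even server loads, smaller values favor co-locating neighbors on one server.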
Authors
– Zhang, Ruisi (ruz032@ucsd.edu), University of California, San Diego
– Javaheripi, Mojan (mojan@ucsd.edu), University of California, San Diego
– Ghodsi, Zahra (zahra@purdue.edu), Purdue University
– Bleiweiss, Amit (amit.bleiweiss@intel.com), Intel Lab
– Koushanfar, Farinaz (farinaz@ucsd.edu), University of California, San Diego
EISBN 9798350323481
Genre orig-research