Input Feature Pruning for Accelerating GNN Inference on Heterogeneous Platforms

Bibliographic Details
Published in: Proceedings - International Conference on High Performance Computing, pp. 282-291
Main Authors: Yik, Jason; Kuppannagari, Sanmukh R.; Zeng, Hanqing; Prasanna, Viktor K.
Format: Conference Proceeding
Language: English
Published: IEEE, 01.12.2022
ISSN: 2640-0316
DOI: 10.1109/HiPC56025.2022.00045


Abstract Graph Neural Networks (GNNs) are an emerging class of machine learning models which utilize structured graph information and node features to reduce high-dimensional input data to low-dimensional embeddings, from which predictions can be made. Due to the compounding effect of aggregating neighbor information, GNN inferences require raw data from many times more nodes than are targeted for prediction. Thus, on heterogeneous compute platforms, inference latency can be largely subject to the inter-device communication cost of transferring input feature data to the GPU/accelerator before computation has even begun. In this paper, we analyze the trade-off effect of pruning input features from GNN models, reducing the volume of raw data that the model works with to lower communication latency at the expense of an expected decrease in the overall model accuracy. We develop greedy and regression-based algorithms to determine which features to retain for optimal prediction accuracy. We evaluate pruned model variants and find that they can reduce inference latency by up to 80% with an accuracy loss of less than 5% compared to non-pruned models. Furthermore, we show that the latency reductions from input feature pruning can be extended under different system variables such as batch size and floating point precision.
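The greedy retention algorithm mentioned in the abstract can be illustrated with a minimal sketch of greedy forward feature selection. This is not the authors' implementation: the nearest-centroid scorer below stands in for the GNN, and all names and data (`centroid_score`, `greedy_select`, the toy matrices) are hypothetical.

```python
# Minimal sketch of greedy input-feature selection: repeatedly keep the
# feature column that most improves a validation score, until a pruning
# budget of k features is reached. The nearest-centroid classifier is a
# stand-in proxy scorer, NOT the GNN from the paper; all names and data
# here are illustrative.

def centroid_score(X_tr, y_tr, X_va, y_va, cols):
    """Validation accuracy using only the feature columns in `cols`."""
    classes = sorted(set(y_tr))
    centroids = {
        c: [sum(x[j] for x, y in zip(X_tr, y_tr) if y == c) /
            sum(1 for y in y_tr if y == c) for j in cols]
        for c in classes
    }
    correct = 0
    for x, y in zip(X_va, y_va):
        # Predict the class whose centroid is nearest in the kept columns.
        pred = min(classes, key=lambda c: sum(
            (x[j] - centroids[c][i]) ** 2 for i, j in enumerate(cols)))
        correct += pred == y
    return correct / len(y_va)

def greedy_select(X_tr, y_tr, X_va, y_va, k):
    """Greedily pick k feature indices that maximize validation accuracy."""
    selected = []
    remaining = list(range(len(X_tr[0])))
    for _ in range(k):
        best = max(remaining, key=lambda j: centroid_score(
            X_tr, y_tr, X_va, y_va, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data: column 0 carries the class signal, columns 1-2 are noise,
# so the greedy pass retains feature 0 first.
X_tr = [[0, 5, 1], [0, 2, 9], [1, 7, 3], [1, 1, 4]]
y_tr = [0, 0, 1, 1]
X_va = [[0, 9, 4], [1, 0, 5]]
y_va = [0, 1]
print(greedy_select(X_tr, y_tr, X_va, y_va, k=1))  # -> [0]
```

The abstract also describes a regression-based alternative for choosing which features to retain; a common form of that idea fits a linear model and ranks features by coefficient magnitude, but the paper's exact formulation should be consulted for details.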
Author_xml – sequence: 1
  givenname: Jason
  surname: Yik
  fullname: Yik, Jason
  email: jyik@g.harvard.edu
  organization: Harvard University
– sequence: 2
  givenname: Sanmukh R.
  surname: Kuppannagari
  fullname: Kuppannagari, Sanmukh R.
  email: sanmukh.kuppannagari@case.edu
  organization: Case Western Reserve University
– sequence: 3
  givenname: Hanqing
  surname: Zeng
  fullname: Zeng, Hanqing
  email: zengh@meta.com
  organization: Meta AI
– sequence: 4
  givenname: Viktor K.
  surname: Prasanna
  fullname: Prasanna, Viktor K.
  email: prasanna@usc.edu
  organization: University of Southern California
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1109/HiPC56025.2022.00045
Discipline Computer Science
EISBN 9781665494236
1665494239
EISSN 2640-0316
EndPage 291
ExternalDocumentID 10106342
Genre orig-research
GrantInformation_xml – fundername: National Science Foundation
  funderid: 10.13039/100000001
PageCount 10
PublicationDate 2022-Dec.
PublicationTitle Proceedings - International Conference on High Performance Computing
PublicationTitleAbbrev HIPC
PublicationYear 2022
Publisher IEEE
StartPage 282
SubjectTerms accuracy/performance trade-off
Analytical models
Computational modeling
data science algorithms
graph neural network
High performance computing
input feature pruning
Machine learning
Prediction algorithms
Predictive models
Solid modeling
Title Input Feature Pruning for Accelerating GNN Inference on Heterogeneous Platforms
URI https://ieeexplore.ieee.org/document/10106342