Input Feature Pruning for Accelerating GNN Inference on Heterogeneous Platforms
| Published in | Proceedings - International Conference on High Performance Computing, pp. 282-291 |
|---|---|
| Main Authors | Yik, Jason; Kuppannagari, Sanmukh R.; Zeng, Hanqing; Prasanna, Viktor K. |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 01.12.2022 |
| Subjects | Accuracy/performance trade-off; Analytical models; Computational modeling; Data science algorithms; Graph neural network; High performance computing; Input feature pruning; Machine learning; Prediction algorithms; Predictive models; Solid modeling |
| ISSN | 2640-0316 |
| DOI | 10.1109/HiPC56025.2022.00045 |
| Abstract | Graph Neural Networks (GNNs) are an emerging class of machine learning models which utilize structured graph information and node features to reduce high-dimensional input data to low-dimensional embeddings, from which predictions can be made. Due to the compounding effect of aggregating neighbor information, GNN inferences require raw data from many times more nodes than are targeted for prediction. Thus, on heterogeneous compute platforms, inference latency can be largely subject to the inter-device communication cost of transferring input feature data to the GPU/accelerator before computation has even begun. In this paper, we analyze the trade-off effect of pruning input features from GNN models, reducing the volume of raw data that the model works with to lower communication latency at the expense of an expected decrease in the overall model accuracy. We develop greedy and regression-based algorithms to determine which features to retain for optimal prediction accuracy. We evaluate pruned model variants and find that they can reduce inference latency by up to 80% with an accuracy loss of less than 5% compared to non-pruned models. Furthermore, we show that the latency reductions from input feature pruning can be extended under different system variables such as batch size and floating point precision. |
|---|---|
| Authors | Jason Yik (Harvard University, jyik@g.harvard.edu); Sanmukh R. Kuppannagari (Case Western Reserve University, sanmukh.kuppannagari@case.edu); Hanqing Zeng (Meta AI, zengh@meta.com); Viktor K. Prasanna (University of Southern California, prasanna@usc.edu) |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| Discipline | Computer Science |
| EISBN | 9781665494236; 1665494239 |
| EISSN | 2640-0316 |
| EndPage | 291 |
| Genre | orig-research |
| Funding | National Science Foundation (10.13039/100000001) |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| PageCount | 10 |
| PublicationDate | 2022-Dec. |
| PublicationDateYYYYMMDD | 2022-12-01 |
| PublicationTitle | Proceedings - International Conference on High Performance Computing |
| PublicationTitleAbbrev | HIPC |
| PublicationYear | 2022 |
| Publisher | IEEE |
| StartPage | 282 |
| SubjectTerms | accuracy/performance trade-off; Analytical models; Computational modeling; data science algorithms; graph neural network; High performance computing; input feature pruning; Machine learning; Prediction algorithms; Predictive models; Solid modeling |
| Title | Input Feature Pruning for Accelerating GNN Inference on Heterogeneous Platforms |
| URI | https://ieeexplore.ieee.org/document/10106342 |
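The abstract describes two families of feature-selection algorithms, greedy and regression-based, for deciding which input feature columns to retain before raw node data is transferred to the GPU. The paper's own algorithms are not reproduced here; the following is a minimal, hypothetical sketch of both ideas on synthetic data, using scikit-learn proxy models. The data, parameter values, and the choice of Lasso and logistic-regression scorers are all assumptions of this example, not the authors' implementation. The intuition behind the latency savings: with average fan-out d and L aggregation layers, a batch of B target nodes can require raw features for roughly B·d^L nodes, so keeping k of F feature columns scales the transferred volume by about k/F.

```python
# Hypothetical sketch of input-feature selection for pruning; NOT the
# paper's implementation. Data, models, and parameters are invented.
import numpy as np
from sklearn.linear_model import Lasso, LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_nodes, n_feats, k = 1000, 32, 8          # keep k of n_feats columns
X = rng.normal(size=(n_nodes, n_feats))    # stand-in for raw node features
true_w = np.zeros(n_feats)
true_w[:6] = rng.normal(size=6)            # only a few features matter
y = (X @ true_w + 0.1 * rng.normal(size=n_nodes) > 0).astype(int)

X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# Regression-based ranking: fit an L1-regularized linear model and keep
# the k features with the largest-magnitude coefficients.
lasso = Lasso(alpha=0.01).fit(X_tr, y_tr)
keep_regression = np.argsort(-np.abs(lasso.coef_))[:k]

# Greedy forward selection: repeatedly add the single feature that most
# improves validation accuracy of a cheap proxy classifier.
def val_score(cols):
    clf = LogisticRegression(max_iter=200).fit(X_tr[:, cols], y_tr)
    return clf.score(X_val[:, cols], y_val)

keep_greedy, remaining = [], set(range(n_feats))
while len(keep_greedy) < k:
    best = max(remaining, key=lambda f: val_score(keep_greedy + [f]))
    keep_greedy.append(best)
    remaining.remove(best)

print("regression-ranked columns:", sorted(keep_regression.tolist()))
print("greedy-selected columns:  ", sorted(keep_greedy))
```

In the setting the abstract describes, only the selected columns would then be gathered and copied to the GPU/accelerator at inference time, which is where the reported up-to-80% latency reduction (at under 5% accuracy loss) would come from.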