SumPA: Efficient Pattern-Centric Graph Mining with Pattern Abstraction

Bibliographic Details
Published in: 2021 30th International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 318-330
Main Authors: Gui, Chuangyi; Liao, Xiaofei; Zheng, Long; Yao, Pengcheng; Wang, Qinggang; Jin, Hai
Format: Conference Proceeding
Language: English
Published: IEEE, 01.09.2021
DOI: 10.1109/PACT52795.2021.00030

Summary: Graph mining aims to explore interesting structural information of a graph. Pattern-centric systems typically transform a general-purpose graph mining problem into a series of subgraph matching problems for high performance. Existing pattern-centric mining systems reduce the substantial search space for a single pattern by exploring a highly optimized matching order, but such a matching order still carries inherent computational redundancies, leading to significant performance degradation. The key innovation of this work lies in a general redundancy criterion that characterizes the computational redundancies arising not only in handling a single pattern but also in matching multiple patterns simultaneously. In this paper, we present SumPA, a high-performance pattern-centric graph mining system that can sufficiently remove redundant computations for any complex graph mining problem. SumPA features three key designs: (1) a pattern abstraction technique that simplifies numerous complex patterns into a few simple abstract patterns based on pattern similarity, (2) abstraction-guided pattern matching that completely eliminates both totally and partially redundant computations during subgraph enumeration, and (3) a suite of system optimizations to maximize storage and computation efficiency. Our evaluation on a wide variety of real-world graphs shows that SumPA outperforms the two state-of-the-art systems Peregrine and GraphPi by up to 61.89× and 8.94×, respectively. For many mining problems on large graphs, Peregrine takes hours or even days while SumPA finishes in only a few minutes.
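
The kind of redundancy that arises when several patterns are matched at once can be illustrated with a toy example. The sketch below is not SumPA's pattern abstraction or its matching algorithm; it only shows, under simple assumptions, how two patterns that share a sub-pattern (triangle and 4-clique) can reuse one intermediate result, the common-neighbor set of an edge, instead of recomputing it per pattern. The function names `build_adjacency` and `count_triangles_and_4cliques` are hypothetical, and vertices are assumed to be comparable values such as integers.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map {vertex: set(neighbors)}, ignoring self-loops."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def count_triangles_and_4cliques(edges):
    """Count triangles and 4-cliques in one enumeration pass.

    Illustrative assumption: both patterns start from the same partial match,
    an edge (u, v) with u < v, so the common-neighbor set is computed once
    and shared by both patterns.
    """
    adj = build_adjacency(edges)
    triangles = 0
    four_cliques = 0
    for u in adj:
        for v in adj[u]:
            if v <= u:
                continue  # visit each edge once, with u < v
            # Shared intermediate result: common neighbors of u and v above v.
            common = {w for w in adj[u] & adj[v] if w > v}
            # Each w completes a triangle (u, v, w) with u < v < w.
            triangles += len(common)
            # Reuse `common` to extend each triangle to a 4-clique (u, v, w, x).
            for w in common:
                four_cliques += sum(1 for x in common if x > w and x in adj[w])
    return triangles, four_cliques


if __name__ == "__main__":
    # Complete graph on 4 vertices.
    k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    print(count_triangles_and_4cliques(k4))
```

Matching the two patterns separately would compute the same intersection `adj[u] & adj[v]` twice for every edge; sharing it is a small-scale instance of removing redundancy across multiple patterns.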