Scalable Frequent Sequence Mining with Flexible Subsequence Constraints

Bibliographic Details
Published in	Data engineering, pp. 1490-1501
Main Authors	Renz-Wieland, Alexander; Bertsch, Matthias; Gemulla, Rainer
Format	Conference Proceeding
Language	English
Published	IEEE 01.04.2019
Subjects	Computational modeling; Data mining; frequent sequence mining; large scale sequence mining; Partitioning algorithms; Scalability; scalable data analysis; sequential pattern mining; Sparks; Task analysis
ISSN	2375-026X
DOI	10.1109/ICDE.2019.00134

Summary:	We study scalable algorithms for frequent sequence mining under flexible subsequence constraints. Such constraints enable applications to specify concisely which patterns are of interest and which are not. We focus on the bulk synchronous parallel model with one round of communication; this model is suitable for platforms such as MapReduce or Spark. We derive a general framework for frequent sequence mining under this model and propose the D-SEQ and D-CAND algorithms within this framework. The algorithms differ in what data are communicated and how computation is split up among workers. To the best of our knowledge, D-SEQ and D-CAND are the first scalable algorithms for frequent sequence mining with flexible constraints. We conducted an experimental study on multiple real-world datasets that suggests that our algorithms scale nearly linearly, outperform common baselines, and offer acceptable generalization overhead over existing, less general mining algorithms.
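
The summary describes mining under a bulk synchronous parallel model with a single round of communication. As a rough illustration of that setting only, the following Python sketch simulates one map, shuffle, and reduce round on a single machine. It is a naive candidate-counting baseline, not D-SEQ or D-CAND, whose details are not given in this record. The function names (`candidates`, `mine`), the `max_len` and `max_gap` parameters, and the hash-based routing to `num_workers` partitions are all assumptions made for the example; the gap and length limits stand in for the far more flexible subsequence constraints the paper targets.

```python
from collections import defaultdict
from itertools import combinations


def candidates(seq, max_len=3, max_gap=1):
    """Map step (hypothetical helper): the distinct subsequences of `seq`
    with at most `max_len` items and at most `max_gap` skipped items
    between consecutive picks. Exponential in general; for illustration only."""
    found = set()
    for n in range(1, min(max_len, len(seq)) + 1):
        for idx in combinations(range(len(seq)), n):
            # Enforce the gap constraint between neighbouring positions.
            if all(b - a - 1 <= max_gap for a, b in zip(idx, idx[1:])):
                found.add(tuple(seq[i] for i in idx))
    return found


def mine(db, min_support, num_workers=2, **constraint):
    """Simulate one communication round: map, shuffle by hash, reduce."""
    # One dictionary per simulated worker (assumed hash partitioning).
    partitions = [defaultdict(int) for _ in range(num_workers)]

    # Map + shuffle: every input sequence sends each of its candidates
    # once, so the count is the number of supporting sequences.
    for seq in db:
        for cand in candidates(seq, **constraint):
            partitions[hash(cand) % num_workers][cand] += 1

    # Reduce: each worker filters its own candidates independently.
    frequent = {}
    for part in partitions:
        for cand, support in part.items():
            if support >= min_support:
                frequent[cand] = support
    return frequent


if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "c"], ["a", "b", "d", "c"]]
    for pattern, support in sorted(mine(db, 2, max_len=2, max_gap=1).items()):
        print(pattern, support)
```

Because every candidate is routed to exactly one partition, no second round is needed to combine counts. The cost is that the map step may emit many candidates per sequence; the summary notes that D-SEQ and D-CAND differ precisely in what data are communicated and how computation is split among workers.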