Tree Structure Based Parallel Frequent Pattern Mining on PC Cluster
Published in | Database and Expert Systems Applications Vol. 2736; pp. 537 - 547 |
---|---|
Main Authors | , |
Format | Book Chapter Conference Proceeding |
Language | English Japanese |
Published | Germany: Springer Berlin / Heidelberg, 2003 |
Series | Lecture Notes in Computer Science |
ISBN | 9783540408062 3540408061 |
ISSN | 0302-9743 1611-3349 |
DOI | 10.1007/978-3-540-45227-0_53 |
Summary: | Frequent pattern mining has become a fundamental technique for many data mining tasks. Many modern frequent pattern mining algorithms, such as FP-growth, adopt a tree structure to compress the database into a compact in-memory data structure. Recent studies show that the tree structure can be mined efficiently using the frequent pattern growth methodology. Further performance improvement can be expected from parallel execution. In particular, the PC cluster is gaining popularity as a cost-effective parallel platform for data-intensive tasks such as data mining. However, many issues, such as space distribution on each node and skew handling, must be addressed to mine frequent patterns efficiently from a tree structure in a shared-nothing environment. We develop a framework to address those issues using a novel granularity control mechanism and tree remerging. The common framework can be extended with temporal constraints to mine web access patterns. We devise an improved support counting procedure to reduce the additional communication overhead. A real implementation using up to 32 nodes confirms that a good speedup ratio can be achieved even in a skewed environment. |
---|---|
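
The summary notes that FP-growth-style algorithms compress the database into a compact in-memory tree. As a rough illustration of that idea only, the following is a minimal single-machine sketch of FP-tree construction. It is not the chapter's parallel framework; the names `FPNode`, `build_fp_tree`, and `count_nodes`, and the sample transactions, are assumptions made for this example.

```python
from collections import Counter


class FPNode:
    """One node of a prefix tree: an item, its count, and its children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build a prefix tree over the frequent items of each transaction."""
    # First pass: count in how many transactions each item appears.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {item for item, c in support.items() if c >= min_support}

    root = FPNode(None)
    # Second pass: insert each transaction's frequent items, ordered by
    # descending support (ties broken by item) so common prefixes are shared.
    for t in transactions:
        items = sorted(
            (i for i in set(t) if i in frequent),
            key=lambda i: (-support[i], i),
        )
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, support


def count_nodes(node):
    """Number of item nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(c) for c in node.children.values())


# Hypothetical sample data, chosen only to show prefix sharing.
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "d"]]
root, support = build_fp_tree(transactions, min_support=2)
```

In this sample, the eight frequent-item occurrences are stored in five tree nodes because transactions that share a prefix share nodes. The infrequent item `d` is dropped before insertion.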