Tree Structure Based Parallel Frequent Pattern Mining on PC Cluster

Bibliographic Details
Published in Database and Expert Systems Applications Vol. 2736; pp. 537-547
Main Authors Pramudiono, Iko; Kitsuregawa, Masaru
Format Book Chapter; Conference Proceeding
Language English
Japanese
Published Germany Springer Berlin / Heidelberg 2003
Springer Berlin Heidelberg
Springer
Series Lecture Notes in Computer Science
ISBN 9783540408062
3540408061
ISSN 0302-9743
1611-3349
DOI 10.1007/978-3-540-45227-0_53

More Information
Summary: Frequent pattern mining has become a fundamental technique for many data mining tasks. Many modern frequent pattern mining algorithms, such as FP-growth, adopt a tree structure to compress the database into a compact in-memory data structure. Recent studies show that the tree structure can be mined efficiently using the frequent pattern growth methodology. A higher level of performance improvement can be expected from parallel execution. In particular, the PC cluster is gaining popularity as a parallel platform with a high cost-performance ratio for data-intensive tasks like data mining. However, we have to address many issues, such as space distribution on each node and skew handling, to efficiently mine frequent patterns from the tree structure in a shared-nothing environment. We develop a framework that addresses those issues using a novel granularity control mechanism and tree remerging. The common framework can be enhanced with temporal constraints to mine web access patterns. We devise an improved support counting procedure to reduce the additional communication overhead. A real implementation using up to 32 nodes confirms that a good speedup ratio can be achieved even in a skewed environment.