PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets

Bibliographic Details
Published in	2011 Fourth International Symposium on Parallel Architectures, Algorithms and Programming, pp. 252–257
Main Authors	Tao Xiao, Chunfeng Yuan, Yihua Huang
Format	Conference Proceeding
Language	English
Published	IEEE, December 2011
Subjects	Algorithm design and analysis; Data mining; Distributed databases; frequent sets mining; Hadoop; Itemsets; MapReduce; parallelized SON algorithm; Partitioning algorithms
ISBN	1457718081 (ISBN-10); 9781457718083 (ISBN-13)
ISSN	2168-3034
DOI	10.1109/PAAP.2011.38

More Information
Summary:	Many algorithms have been proposed over the past decades to efficiently mine frequent sets in transaction databases, including the SON algorithm proposed by Savasere, Omiecinski and Navathe. This paper introduces the SON algorithm, explains why SON is well suited to parallelization, and illustrates how to adapt SON to the MapReduce paradigm. We then propose a parallelized SON algorithm, PSON, and implement it in Hadoop. Our study suggests that PSON can mine frequent itemsets from a very large database with good performance. The experimental results show that, for frequent set mining, the running time increases almost linearly with dataset size and decreases approximately linearly with the number of cluster nodes. We therefore conclude that PSON handles frequent set mining on massive datasets well, with good scalability and speed-up.
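The summary describes SON and its MapReduce adaptation only at a high level. As a rough illustration, the following single-process Python sketch simulates the classic two-pass SON scheme as two map/reduce rounds over in-memory chunks. In the first round each chunk is mined with the support fraction scaled to the chunk size. Any itemset frequent in the whole dataset must be locally frequent in at least one chunk, so the union of local results contains no false negatives. The second round counts those candidates over all chunks and drops the false positives. This is a sketch under stated assumptions, not the authors' PSON code or Hadoop's API: it assumes a plain Apriori miner inside each chunk, and the function names (`apriori`, `son_phase1_map`, `son_phase2_reduce`, `son`) and the toy baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def apriori(baskets, min_count):
    """Return every itemset (as a frozenset) contained in >= min_count baskets."""
    baskets = [frozenset(b) for b in baskets]
    item_counts = Counter(item for b in baskets for item in b)
    current = {frozenset([i]) for i, n in item_counts.items() if n >= min_count}
    frequent = set(current)
    k = 2
    while current:
        # Join frequent (k-1)-itemsets into k-itemsets, then prune any candidate
        # with an infrequent (k-1)-subset (the Apriori property).
        joined = {a | b for a in current for b in current if len(a | b) == k}
        candidates = {c for c in joined
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        counts = Counter(c for b in baskets for c in candidates if c <= b)
        current = {c for c, n in counts.items() if n >= min_count}
        frequent |= current
        k += 1
    return frequent


# Round 1 mapper (hypothetical): mine one chunk with the threshold scaled to
# the chunk size and emit (itemset, 1) for every locally frequent itemset.
def son_phase1_map(chunk, min_support):
    for itemset in apriori(chunk, min_support * len(chunk)):
        yield itemset, 1


# Round 1 reducer: the union of all locally frequent itemsets is the candidate set.
def son_phase1_reduce(pairs):
    return {itemset for itemset, _ in pairs}


# Round 2 mapper: count every candidate within this chunk.
def son_phase2_map(chunk, candidates):
    for itemset in candidates:
        yield itemset, sum(1 for b in chunk if itemset <= set(b))


# Round 2 reducer: sum per-chunk counts and keep the globally frequent itemsets.
def son_phase2_reduce(pairs, min_support, total):
    counts = Counter()
    for itemset, n in pairs:
        counts[itemset] += n
    return {s: n for s, n in counts.items() if n >= min_support * total}


def son(chunks, min_support):
    total = sum(len(c) for c in chunks)
    candidates = son_phase1_reduce(
        pair for c in chunks for pair in son_phase1_map(c, min_support))
    return son_phase2_reduce(
        (pair for c in chunks for pair in son_phase2_map(c, candidates)),
        min_support, total)


# Toy data: six baskets split into two chunks (stand-ins for input splits).
chunks = [
    [{"bread", "milk"}, {"bread", "diaper", "beer", "eggs"},
     {"milk", "diaper", "beer", "cola"}],
    [{"bread", "milk", "diaper", "beer"}, {"bread", "milk", "diaper", "cola"},
     {"bread", "milk"}],
]
result = son(chunks, min_support=0.5)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

Here {bread, milk, diaper} is locally frequent in the second chunk (2 of 3 baskets) but appears in only 2 of 6 baskets overall, so round 2 discards it. On a cluster each round would be a separate MapReduce job, and every round-2 mapper needs the full candidate set. One common option in Hadoop is to ship that set through the distributed cache, though the paper's exact mechanism is not stated here.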