PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets

Bibliographic Details
Published in: 2011 Fourth International Symposium on Parallel Architectures, Algorithms and Programming, pp. 252-257
Main Authors: Tao Xiao, Chunfeng Yuan, Yihua Huang
Format: Conference Proceeding
Language: English
Published: IEEE, 01.12.2011
ISBN: 1457718081, 9781457718083
ISSN: 2168-3034
DOI: 10.1109/PAAP.2011.38

Summary: Many algorithms have been proposed over the past decades to mine frequent sets efficiently in transaction databases, including the SON algorithm proposed by Savasere, Omiecinski, and Navathe. This paper introduces the SON algorithm, explains why it is well suited to parallelization, and illustrates how to adapt it to the MapReduce paradigm. We then propose a parallelized SON algorithm, PSON, and implement it in Hadoop. Our study suggests that PSON can mine frequent item sets from a very large database with good performance. The experimental results show that the time cost of frequent set mining increases almost linearly with the size of the dataset and decreases approximately linearly with the number of cluster nodes. Consequently, we conclude that PSON works well for mining frequent sets from massive datasets, with good scalability and speed-up.
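
The two-pass structure that makes SON a natural fit for MapReduce can be sketched in a few lines. The following is a minimal single-process illustration in Python, not the paper's Hadoop implementation; the function names (`local_frequent`, `son`), the brute-force local miner, the round-robin partitioning, and the `max_size` cap are all assumptions made for the example. In the first pass each chunk is mined independently with a proportionally scaled threshold (a map step) and the local results are unioned into a candidate set (a reduce step); in the second pass each chunk counts the candidates (map) and the counts are summed and filtered against the global threshold (reduce).

```python
from collections import Counter
from itertools import combinations
from math import ceil


def local_frequent(chunk, min_count, max_size):
    """Return itemsets (sorted tuples) of size <= max_size that occur
    at least min_count times in this chunk. Brute force, for illustration."""
    counts = Counter()
    for basket in chunk:
        items = sorted(set(basket))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    return {s for s, c in counts.items() if c >= min_count}


def son(transactions, min_support, num_chunks=2, max_size=3):
    """Two-pass SON sketch. min_support is a fraction of all transactions."""
    n = len(transactions)
    # Hypothetical partitioning: round-robin split into num_chunks pieces.
    chunks = [transactions[i::num_chunks] for i in range(num_chunks)]

    # Pass 1 ("map"): mine each chunk with the threshold scaled to its size.
    # Pass 1 ("reduce"): the union of local results is the candidate set.
    candidates = set()
    for chunk in chunks:
        local_min = max(1, ceil(min_support * len(chunk)))
        candidates |= local_frequent(chunk, local_min, max_size)

    # Pass 2 ("map"): count every candidate in every chunk.
    # Pass 2 ("reduce"): sum the counts and keep globally frequent itemsets.
    totals = Counter()
    for chunk in chunks:
        for basket in chunk:
            items = set(basket)
            for cand in candidates:
                if items.issuperset(cand):
                    totals[cand] += 1
    return {c: t for c, t in totals.items() if t >= min_support * n}


transactions = [
    ["a", "b", "c"], ["a", "b"], ["a", "c"],
    ["b", "c"], ["a", "b", "c"], ["d"],
]
result = son(transactions, min_support=0.5, num_chunks=2)
```

An itemset that meets the global threshold must meet the scaled threshold in at least one chunk, so the first pass produces no false negatives; the second pass only removes false positives. Because the chunks are processed independently in both passes, each loop over `chunks` could in principle be distributed across cluster nodes.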