TKFIM: Top-K frequent itemset mining technique based on equivalence classes

Bibliographic Details
Published in	PeerJ. Computer science Vol. 7; p. e385
Main Authors	Iqbal, Saood; Shahid, Abdul; Roman, Muhammad; Khan, Zahid; Al-Otaibi, Shaha; Yu, Lisu
Format	Journal Article
Language	English
Published	United States: PeerJ Ltd; PeerJ, Inc, 08.03.2021
Subjects	Algorithm Analysis; Algorithms; Algorithms and Analysis of Algorithms; Artificial Intelligence; Data mining; Data Mining and Machine Learning; Data Science; Datasets; Equivalence; Frequent Itemsets; Information management; Methods; Mushrooms; Support Threshold; Top-k Frequent Itemsets
ISSN	2376-5992
DOI	10.7717/peerj-cs.385

Summary:	Frequent itemset (FI) mining is a significant subject of data mining research. In the last ten years, owing to innovative development, the quantity of data has grown exponentially, which imposes new challenges on FI mining applications. Recent algorithms, both threshold-based and size-based, can yield misleading information. The support threshold plays a central role in generating frequent itemsets from a given dataset, yet selecting its value is very difficult for users unaware of the dataset’s characteristics. Algorithms that find FIs without a support threshold, however, perform poorly because of heavy computation. We therefore propose a method that discovers FIs without a support threshold, called Top-k frequent itemset mining (TKFIM). It uses equivalence classes and set-theory concepts to mine FIs. The proposed procedure does not miss any FIs; thus, accurate frequent patterns are mined. The results are compared with state-of-the-art techniques such as Top-k miner and Build Once and Mine Once (BOMO). TKFIM outperforms these approaches in terms of execution and performance, achieving gains of 92.70, 35.87, 28.53, and 81.27 percent over Top-k miner on the Chess, Mushroom, Connect, and T10I4D100K datasets, respectively. Similarly, it achieves performance gains of 97.14, 100, 78.10, and 99.70 percent over BOMO on the Chess, Mushroom, Connect, and T10I4D100K datasets, respectively. It is therefore argued that the proposed procedure may be adopted on large datasets for better performance.
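
To make the idea in the summary concrete, the following is a minimal Python sketch of threshold-free top-k mining over a vertical (item to transaction-id set) layout, where itemsets sharing a prefix form an equivalence class and support is obtained by set intersection. It is an illustration under stated assumptions, not the authors' TKFIM procedure: the function name `top_k_itemsets`, the min-heap bookkeeping, and the pruning rule are all hypothetical choices made for this example, and items are assumed to be mutually comparable (for instance, all strings).

```python
import heapq
from collections import defaultdict


def top_k_itemsets(transactions, k):
    """Return up to k (itemset, support) pairs with the highest support.

    Illustrative sketch only (not the published TKFIM algorithm).
    Ties at the k-th position are broken arbitrarily.
    """
    if k <= 0:
        return []

    # Vertical layout: each item maps to the set of transaction ids containing it.
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)

    heap = []  # min-heap of (support, itemset); holds at most k entries

    def bar():
        # Support a candidate must strictly exceed to enter the result.
        # It starts at 0 and rises once k itemsets have been collected,
        # so no user-supplied support threshold is needed.
        return heap[0][0] if len(heap) == k else 0

    def offer(itemset, support):
        if len(heap) < k:
            heapq.heappush(heap, (support, itemset))
        elif support > heap[0][0]:
            heapq.heapreplace(heap, (support, itemset))

    def extend(prefix_class):
        # prefix_class: (itemset, tidset) pairs that share the same prefix.
        for i, (itemset, tids) in enumerate(prefix_class):
            if len(tids) <= bar():
                continue  # supersets cannot have higher support, so skip them too
            offer(itemset, len(tids))
            next_class = []
            for other, other_tids in prefix_class[i + 1:]:
                joined = tids & other_tids  # support via set intersection
                if len(joined) > bar():
                    next_class.append((itemset + (other[-1],), joined))
            if next_class:
                extend(next_class)

    # Start from single items, most frequent first, so the bar rises quickly.
    singles = sorted(
        (((item,), tids) for item, tids in tidsets.items()),
        key=lambda pair: (-len(pair[1]), pair[0]),
    )
    extend(singles)

    return sorted(
        ((itemset, support) for support, itemset in heap),
        key=lambda pair: (-pair[1], pair[0]),
    )


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}, {"b", "c"}]
    for itemset, support in top_k_itemsets(data, 3):
        print(itemset, support)
```

On the five toy transactions above with k = 3, the sketch returns the single items a, b, and c with supports 4, 3, and 3. The only user input is k; the internal bar plays the role that a hand-picked support threshold plays in threshold-based miners.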