TKEH: an efficient algorithm for mining top-k high utility itemsets

Bibliographic Details
Published in	Applied intelligence (Dordrecht, Netherlands), Vol. 49, no. 3, pp. 1078-1097
Main Authors	Singh, Kuldeep; Singh, Shashank Sheshar; Kumar, Ajay; Biswas, Bhaskar
Format	Journal Article
Language	English
Published	New York: Springer US, 01.03.2019 (Springer Nature B.V.)
Subjects	Algorithms; Artificial Intelligence; Computer Science; Data mining; Datasets; Machines; Manufacturing; Mechanical Engineering; Processes; State of the art; Upper bounds; High utility itemsets; Threshold raising strategies; Utility mining; Top-k itemset mining; Itemset mining
ISSN	0924-669X 1573-7497
DOI	10.1007/s10489-018-1316-x

Summary:	High utility itemset mining is a subfield of data mining with wide applications. Existing high utility itemset mining algorithms can discover all itemsets that satisfy a given minimum utility threshold, but users often find it difficult to set that threshold properly: a threshold that is too low may produce a huge number of itemsets, whereas one that is too high may produce very few. Specifying the minimum utility threshold is therefore difficult and time-consuming. To address these issues, top-k high utility itemset mining has been defined, where k is the number of high utility itemsets to be found. In this paper, we present an efficient algorithm, named TKEH, for finding the top-k high utility itemsets. TKEH uses transaction merging and dataset projection to reduce the cost of dataset scans; these techniques shrink the dataset as larger itemsets are explored. TKEH employs three strategies for raising the minimum utility threshold and two strategies for pruning the search space efficiently. To calculate the utilities of items and their upper bounds in linear time, TKEH uses an array-based utility technique. We carried out extensive experiments on real datasets. The results show that TKEH outperforms the state-of-the-art algorithms. Moreover, TKEH always performs better for dense datasets.
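
As a rough illustration of the problem the summary describes, the sketch below finds the top-k itemsets by utility and raises an internal minimum utility threshold as candidates accumulate. It is not TKEH: it enumerates every itemset by brute force and uses none of the paper's transaction merging, dataset projection, pruning, or array-based counting. The toy transactions and the names `itemset_utility` and `top_k_high_utility` are assumptions made for this example only.

```python
import heapq
from itertools import combinations

# Hypothetical toy data: each transaction maps an item to the utility that
# item contributes in that transaction (for example quantity * unit profit).
TRANSACTIONS = [
    {"a": 5, "b": 2, "c": 1},
    {"a": 10, "c": 6},
    {"b": 4, "c": 3, "d": 8},
    {"a": 5, "b": 2, "d": 4},
]


def itemset_utility(itemset, transactions):
    """Sum the itemset's utility over all transactions that contain every item."""
    total = 0
    for transaction in transactions:
        if all(item in transaction for item in itemset):
            total += sum(transaction[item] for item in itemset)
    return total


def top_k_high_utility(transactions, k):
    """Brute-force top-k search that raises min_util as the heap fills."""
    items = sorted({item for t in transactions for item in t})
    heap = []     # min-heap of (utility, itemset); the root is the weakest kept entry
    min_util = 0  # internal threshold: starts at 0, raised once k itemsets are kept
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility = itemset_utility(itemset, transactions)
            if utility <= 0:
                continue  # itemset never occurs together in the data
            if len(heap) < k:
                heapq.heappush(heap, (utility, itemset))
            elif utility > min_util:
                heapq.heapreplace(heap, (utility, itemset))
            if len(heap) == k:
                min_util = heap[0][0]  # threshold raising: k-th best utility so far
    ranked = sorted(heap, key=lambda entry: (-entry[0], entry[1]))
    return ranked, min_util


if __name__ == "__main__":
    ranked, final_threshold = top_k_high_utility(TRANSACTIONS, k=3)
    for utility, itemset in ranked:
        print(itemset, utility)
    print("final minimum utility threshold:", final_threshold)
```

The user supplies only k; the threshold is derived from the k-th best utility seen so far. An efficient algorithm would additionally use that rising threshold together with utility upper bounds to skip candidates, which this sketch does not attempt.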