An efficient algorithm to maintain the discovered frequent sequences with record deletion


Bibliographic Details
Published in Intelligent data analysis Vol. 20; no. 3; pp. 655-677
Main Authors Lin, Jerry Chun-Wei; Gan, Wensheng; Hong, Tzung-Pei; Chen, Hsin-Yi; Li, Sheng-Tun
Format Journal Article
Language English
Published London, England SAGE Publications 20.04.2016
ISSN 1088-467X
1571-4128
DOI 10.3233/IDA-160825

Summary: One of the major concerns in sequential-pattern mining (SPM) is how to discover frequent sequences from transactional databases. Most SPM algorithms can only handle static databases, which is not practical in real-life situations. The Fast UPdated 2 (FUP2) algorithm was proposed to maintain and update the discovered association rules for transaction deletion. This algorithm can also be extended to SPM to maintain the discovered frequent sequences when sequential records are deleted from the original database. The original database must, however, be rescanned if the maintained sequences are small in the deleted sequential records. The pre-large concept was previously proposed to reduce the computational cost of database rescans: no rescan is needed until the number of customers in the deleted sequential records reaches the designed safety bound. In this paper, a PreFUSP-TREE-DEL algorithm is proposed that adopts a pre-large FUSP-tree structure and uses the pre-large concept to maintain and update the discovered sequential patterns under record deletion. The proposed algorithm first partitions the discovered sequential patterns into three parts with nine cases. The discovered sequential patterns of each case are then maintained and updated by the designed procedure. With the proposed PreFUSP-TREE-DEL algorithm, it is unnecessary to rescan the original database until the cumulative number of deleted customers reaches the designed safety bound. The experiments show that the proposed PreFUSP-TREE-DEL algorithm performs well compared with other batch-mode algorithms and other maintenance algorithms.
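
The three-part, nine-case partition and the safety bound mentioned in the summary can be illustrated with a minimal Python sketch. Nothing here is taken from the paper itself: the function names, the two support thresholds `upper` and `lower`, and the rule that both the original and the deleted records are judged against the same ratios are all assumptions made for illustration. The bound is derived from a worst-case argument (a sequence just below the lower threshold loses no occurrences when `t` customers are deleted, so it stays below the upper threshold while `lower * d / (d - t) <= upper`), and may differ from the formula the authors use.

```python
from fractions import Fraction
from math import floor

def classify(count, total, upper, lower):
    """Label a sequence as large, pre-large, or small by its support ratio.

    `upper` and `lower` are hypothetical support thresholds (lower < upper).
    """
    if total == 0:
        return "small"
    ratio = Fraction(count, total)
    if ratio >= upper:
        return "large"
    if ratio >= lower:
        return "pre-large"
    return "small"

def case_of(orig_count, d, del_count, t, upper, lower):
    """Return the (original, deleted) label pair: one of 3 x 3 = 9 cases.

    d is the number of customers in the original database and t the number
    of customers in the deleted records (assumed notation).
    """
    return (classify(orig_count, d, upper, lower),
            classify(del_count, t, upper, lower))

def safety_bound(d, upper, lower):
    """Worst-case number of deletions that cannot turn a small sequence large.

    Derived here from lower * d / (d - t) <= upper; an illustrative
    assumption, not a formula quoted from the paper.
    """
    return floor((upper - lower) * d / upper)

# Hypothetical thresholds of 50% and 30% on a 100-customer database.
UPPER, LOWER = Fraction(1, 2), Fraction(3, 10)
example_case = case_of(60, 100, 1, 10, UPPER, LOWER)
example_bound = safety_bound(100, UPPER, LOWER)
```

Under these assumptions, a maintenance routine would keep a running total of deleted customers and fall back to a full rescan only once that total exceeds the bound.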