Finding sequential patterns with TF-IDF metrics in health-care databases

Bibliographic Details
Published in	Acta Universitatis Sapientiae. Informatica Vol. 6; no. 2; pp. 287 - 310
Main Authors	Kardkovács, Zsolt T., Kovács, Gábor
Format	Journal Article
Language	English
Published	Cluj-Napoca: De Gruyter Open, 01.12.2014; De Gruyter Brill Sp. z o.o., Paradigm Publishing Services
Subjects	Algorithms; Case histories; frequent sequential pattern; health care database; Pattern search; sequence mining; Sequences; TF-IDF
ISSN	2066-7760, 1844-6086
DOI	10.1515/ausi-2015-0008

Summary:	Finding frequent sequential patterns has been defined as finding ordered lists of items that occur more times in a database than a user-defined threshold. For big and dense databases that contain very long sequences and large itemsets, such as medical case histories, algorithms based on this idea of counting occurrences output an enormous number of highly redundant frequent sequences and are therefore impractical. Hence, there is a need for algorithms that perform frequent pattern search and prefiltering simultaneously. In this paper, we propose an algorithm that reinterprets the term support on a text mining basis. Experiments show that our method not only eliminates redundancy among the output sequences but also scales much better with huge input data sizes. We apply our algorithm to mining medical databases: which diagnoses are likely to lead to a certain future health condition.