Decision Tree for Sequences

Bibliographic Details
Published in: IEEE Transactions on Knowledge and Data Engineering, Vol. 35, No. 1, pp. 251-263
Main Authors: He, Zengyou; Wu, Ziyao; Xu, Guangyao; Liu, Yan; Zou, Quan
Format: Journal Article
Language: English
Published: New York: IEEE, 01.01.2023
Publisher: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 1041-4347, 1558-2191
DOI: 10.1109/TKDE.2021.3075023

More Information
Summary: Current decision tree algorithms such as C4.5 and CART are widely used in many fields because of their simplicity, accuracy and intuitive interpretability. Like other popular classifiers, these tree-based classification algorithms are designed for fixed-length vector data and have intrinsic limitations in handling complex data such as sequences. For discrete sequence classification, the dominant strategy is a two-step procedure: first transform the sequential dataset into a vector dataset, then apply an existing tree-based classifier to the resulting vectors. However, such methods depend heavily on the feature generation step, and features that are critical to tree construction may be missed. To alleviate these issues, we present a new tree-based sequence classification method that constructs a concise decision tree from the feature space composed of all subsequences present in the training sequences. Experimental results on fourteen real datasets show that our method achieves better performance than state-of-the-art sequence classification algorithms. The source code of our method is available at https://github.com/ZiyaoWu/SeqDT.
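The summary contrasts a two-step pipeline (generate features, then train a standard tree) with a tree that searches the subsequence feature space itself. The following is a minimal Python sketch of that second idea under stated assumptions; it is not the authors' SeqDT implementation (the source code URL in the summary points to that). Assumptions: "subsequence" is read as an order-preserving pattern with gaps allowed; candidate patterns are enumerated by brute force up to a hypothetical `max_len`; splits are chosen by information gain, a standard criterion swapped in purely for illustration; and all function names (`contains`, `best_split`, `build_tree`, `predict`) are hypothetical.

```python
"""Hypothetical sketch: a decision tree whose tests are subsequence-containment checks.

Not the SeqDT implementation; names and the split criterion are illustrative assumptions.
"""
from collections import Counter
from itertools import combinations
from math import log2


def contains(seq, pattern):
    """True if `pattern` occurs in `seq` in order, gaps allowed (assumed meaning of 'subsequence')."""
    it = iter(seq)
    # `symbol in it` advances the shared iterator, so each later symbol must appear later in `seq`.
    return all(symbol in it for symbol in pattern)


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def candidate_patterns(sequences, max_len):
    """Brute-force enumeration of every subsequence of length 1..max_len seen in the training data."""
    patterns = set()
    for seq in sequences:
        for k in range(1, max_len + 1):
            for idx in combinations(range(len(seq)), k):
                patterns.add(tuple(seq[i] for i in idx))
    return patterns


def best_split(sequences, labels, max_len):
    """Return (pattern, information_gain) for the best containment test, or None if no candidates."""
    n, base = len(labels), entropy(labels)
    best = None
    for pattern in sorted(candidate_patterns(sequences, max_len)):
        hit = [contains(s, pattern) for s in sequences]
        yes = [y for y, h in zip(labels, hit) if h]
        no = [y for y, h in zip(labels, hit) if not h]
        gain = base - len(yes) / n * entropy(yes) - len(no) / n * entropy(no)
        if best is None or gain > best[1]:  # ties keep the first pattern in sorted order
            best = (pattern, gain)
    return best


def build_tree(sequences, labels, depth=0, max_depth=3, max_len=2):
    """Grow a tree greedily; leaves are majority labels, inner nodes are dicts."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    split = best_split(sequences, labels, max_len)
    if split is None or split[1] <= 0:
        return majority
    pattern = split[0]
    node = {"pattern": pattern}
    for branch, want in (("yes", True), ("no", False)):
        idx = [i for i, s in enumerate(sequences) if contains(s, pattern) == want]
        node[branch] = build_tree([sequences[i] for i in idx], [labels[i] for i in idx],
                                  depth + 1, max_depth, max_len)
    return node


def predict(tree, seq):
    """Follow containment tests from the root down to a leaf label."""
    while isinstance(tree, dict):
        tree = tree["yes"] if contains(seq, tree["pattern"]) else tree["no"]
    return tree


# Toy data: class "pos" iff the sequence contains 'a' followed (not necessarily adjacently) by 'c'.
train_seqs = ["abc", "ac", "bac", "ca", "cba", "bb"]
train_labels = ["pos", "pos", "pos", "neg", "neg", "neg"]
tree = build_tree(train_seqs, train_labels)
print(tree)  # {'pattern': ('a', 'c'), 'yes': 'pos', 'no': 'neg'}
print(predict(tree, "xac"), predict(tree, "cab"))  # pos neg
```

On the toy data, the gapped pattern ('a', 'c') separates the two classes perfectly, which no single-symbol test does; this is the kind of feature a fixed, hand-chosen feature set could miss. Exhaustive enumeration grows combinatorially with `max_len` and sequence length, so a practical method would need a more selective search over the subsequence space; this sketch makes no claim about how SeqDT handles that.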