Constrained distance based clustering for time-series: a comparative and experimental study


Bibliographic Details
Published in: Data Mining and Knowledge Discovery, Vol. 32, no. 6, pp. 1663-1707
Main Authors: Lampert, Thomas; Dao, Thi-Bich-Hanh; Lafabregue, Baptiste; Serrette, Nicolas; Forestier, Germain; Crémilleux, Bruno; Vrain, Christel; Gançarski, Pierre
Format: Journal Article
Language: English
Published: New York, Springer US, 01.11.2018
Springer Nature B.V
Springer
ISSN: 1384-5810, 1573-756X
DOI: 10.1007/s10618-018-0573-y

Summary: Constrained clustering is becoming an increasingly popular approach in data mining. It offers a balance between the complexity of producing a formal definition of thematic classes (required by supervised methods) and unsupervised approaches, which ignore expert knowledge and intuition. Nevertheless, the application of constrained clustering to time-series analysis is relatively unknown. This is partly due to the unsuitability of the Euclidean distance metric, which is typically used in data mining, to time-series data. This article addresses this divide by presenting an exhaustive review of constrained clustering algorithms and by modifying publicly available implementations to use a more appropriate distance measure: dynamic time warping. It presents a comparative study in which their performance is evaluated when applied to time-series. It is found that k-means based algorithms become computationally expensive and unstable under these modifications. Spectral approaches are easily applied and offer state-of-the-art performance, whereas declarative approaches are also easily applied and guarantee constraint satisfaction. An analysis of the results identifies several factors that influence an algorithm's performance when constraints are introduced.
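
Illustration (not taken from the article): the summary contrasts the Euclidean distance with dynamic time warping (DTW). The sketch below is a minimal textbook DTW with an absolute-difference local cost and no warping window; the function names and the two toy series are assumptions chosen for illustration and do not reflect the implementations the authors modified.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW: O(len(a) * len(b)) dynamic programme,
    absolute difference as local cost, no warping-window constraint."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] aligned to an already-matched b[j-1]
                cost[i][j - 1],      # b[j-1] aligned to an already-matched a[i-1]
                cost[i - 1][j - 1],  # both series advance together
            )
    return cost[n][m]


def euclidean_distance(a, b):
    """Point-by-point Euclidean distance between equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical toy data: the same peak, shifted by one time step.
x = [0, 0, 1, 2, 1, 0, 0]
y = [0, 1, 2, 1, 0, 0, 0]

print(euclidean_distance(x, y))  # penalises the shift
print(dtw_distance(x, y))        # warps the time axis to absorb the shift
```

On this toy pair the Euclidean distance is nonzero although the two series have the same shape, while DTW aligns the shifted peak. The quadratic cost of each DTW evaluation is also one plausible reason distance-heavy algorithms become more expensive when DTW is substituted.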