Dynamic pattern matching with multiple queries on large scale data streams


Bibliographic Details
Published in Signal Processing, Vol. 171; p. 107402
Main Authors Sukhanov, S., Wu, R., Debes, C., Zoubir, A.M.
Format Journal Article
Language English
Published Elsevier B.V., 01.06.2020
ISSN 0165-1684, 1872-7557
DOI 10.1016/j.sigpro.2019.107402

Summary:
• Similarity search in data streams is challenging due to outliers, noise and potential amplitude and time distortions.
• The majority of methods fail due to limitations in data normalization.
• A dynamic normalization approach brings streaming signal subsequences to the scale of the query template, improving matching performance when sampling variance or time distortions are present.
• The similarity search is extended to the case of multiple queries.

Similarity search in data streams is an important but challenging task in many practical areas where real-time pattern retrieval is required. Dynamic and fast-updating data streams are often subject to outliers, noise and potential distortions in the amplitude and time dimensions. Such conditions typically cause existing pattern matching algorithms to fail, leaving them unable to retrieve the required patterns from the stream. The main reason for such failures is the limitation of the data normalization used in the majority of methods. Another reason is the lack of means to consider multiple examples of the same template to account for possible variations of the query signal. In this paper, we propose a dynamic normalization approach that brings streaming signal subsequences to the scale of the query template. This significantly improves pattern retrieval capabilities, especially when sampling variance or time distortions are present. We further develop a pattern matching approach that uses the proposed normalization mechanism and extend it to the case where multiple examples of a query template are available. Multiple synthetic and real data experiments demonstrate that this approach considerably improves the pattern matching rate for distorted data streams while providing real-time performance.