Outlier Detection over Sliding Windows for Probabilistic Data Streams


Bibliographic Details
Published in Journal of Computer Science and Technology, Vol. 25, no. 3, pp. 389-400
Main Authors 王斌 (Wang Bin), 杨晓春 (Yang Xiaochun), 王国仁 (Wang Guoren), 于戈 (Yu Ge)
Format Journal Article
Language English
Published Boston Springer US 01.05.2010
Springer Nature B.V
Key Laboratory of Medical Image Computing(Northeastern University),Ministry of Education,Shenyang 110004,China
School of Information Science and Engineering,Northeastern University,Shenyang 110004,China
ISSN 1000-9000, 1860-4749
DOI 10.1007/s11390-010-9332-2

Summary: Outlier detection is a useful technique in many applications where data is generally uncertain and can be described using probabilities. Although it has been studied intensively for deterministic data, outlier detection is still novel in the emerging field of uncertain data. In this paper, we study the semantics of outlier detection on probabilistic data streams and present a new definition of distance-based outliers over sliding windows. We then show that the problem of detecting an outlier over a set of possible world instances is equivalent to the problem of finding the k-th element in its neighborhood. Based on this observation, a dynamic programming algorithm (DPA) is proposed to reduce the detection cost from $O(2^{|R(e,d)|})$ to $O(|k \cdot R(e,d)|)$, where $R(e,d)$ is the d-neighborhood of e. Furthermore, we propose a pruning-based approach (PBA) to effectively and efficiently filter non-outliers on a single window and to detect the most recent m elements incrementally. Finally, detailed analysis and thorough experimental results demonstrate the efficiency and scalability of our approach.
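The cost reduction described in the summary can be illustrated with a minimal sketch. It is not taken from the paper: it assumes that each element in the d-neighborhood exists independently with a known probability, and that the quantity of interest is the probability that fewer than k neighbors exist. The function name `prob_fewer_than_k_neighbors` and its parameters are hypothetical. Enumerating all possible worlds would take time exponential in the neighborhood size, while the table below is filled in time proportional to k times the neighborhood size.

```python
from typing import Sequence


def prob_fewer_than_k_neighbors(probs: Sequence[float], k: int) -> float:
    """Probability that fewer than k neighbors exist.

    Assumption (not from the source): neighbors exist independently,
    and probs[i] is the existence probability of the i-th neighbor.
    """
    if k <= 0:
        return 0.0
    # dp[j] = probability that exactly j of the neighbors seen so far exist.
    # Only counts 0..k-1 are tracked; mass that reaches k or more is dropped.
    dp = [0.0] * k
    dp[0] = 1.0
    for p in probs:
        # Iterate downwards so dp[j - 1] still holds the previous round's value.
        for j in range(k - 1, 0, -1):
            dp[j] = dp[j] * (1.0 - p) + dp[j - 1] * p
        dp[0] *= 1.0 - p
    return sum(dp)
```

Under these assumptions, an element could be flagged as an outlier when the returned probability exceeds a user-chosen threshold; that threshold is likewise an assumption of this sketch.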
Bibliography:11-2296/TP
TP393.08
TP393
outlier detection, uncertain data, probabilistic data stream, sliding window