Study on outlier detection algorithm based on tightest neighbors

Bibliographic Details
Published in: Expert systems with applications Vol. 290; p. 128385
Main Authors: Gao, Lei; Tian, Taichang; Wen, Luosheng
Format: Journal Article
Language: English
Published: Elsevier Ltd, 25.09.2025
ISSN: 0957-4174
DOI: 10.1016/j.eswa.2025.128385

Summary:
• A new concept of tightest neighbors, together with some of its important properties.
• Outliers show a significant “plateau” in the sense of tightest neighbors.
• The outlier factor TNOF is simple to compute because it takes the symmetry of the relationship into account.
• A fast, parameter-insensitive algorithm, TNOF, that works efficiently on noisy datasets.

Outlier detection is a popular research topic in data mining. Its goal is to identify points that differ significantly from the other points in a dataset. However, many outlier detection algorithms struggle with complex, noisy datasets. We propose a new concept of tightest neighbors and introduce some of its important properties. We find that outliers show a significant “plateau” in the sense of tightest neighbors and form separate branches in the tightest neighbors graph. Based on this phenomenon, we propose a new outlier factor that not only simplifies the computation of the local outlier factor but also measures the characteristics of outliers better, because it considers both the symmetry of the tightest neighbor relationship and the distance between data points. We then propose TNOF, an outlier detection algorithm based on local density and tightest neighbors, which classifies outliers automatically and so avoids the top-n problem. Extensive experiments show that the proposed algorithm is robust and performs well, that it is suitable for detecting outliers in datasets with large differences in density levels and complex distributions, and that it is also applicable to practical problems in the medical field.
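For context, the local outlier factor and the top-n problem mentioned in the summary can be illustrated with a minimal sketch of the classical LOF baseline. This is not TNOF: the summary does not define tightest neighbors, so nothing here attempts to reproduce the authors' method. The function names `lof_scores` and `top_n_outliers`, the toy data, and the choice `k = 3` are all assumptions made for illustration, and the sketch assumes distinct points and ignores ties in the k-distance.

```python
import math

def lof_scores(points, k):
    """Classical LOF score per point (illustrative baseline, not TNOF)."""
    n = len(points)
    # Pairwise Euclidean distances.
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Indices of the k nearest neighbors of each point, excluding itself.
    knn = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        knn.append(order[:k])
    # k-distance: distance from each point to its k-th nearest neighbor.
    k_dist = [dist[i][knn[i][-1]] for i in range(n)]
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in knn[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean neighbor density divided by the point's own density.
    return [sum(lrd[j] for j in knn[i]) / (k * lrd[i]) for i in range(n)]

def top_n_outliers(scores, n):
    """Indices of the n highest scores; n must be chosen by the user."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n]

# Hypothetical toy data: a small dense cluster plus one distant point (index 5).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (5, 5)]
scores = lof_scores(points, k=3)
print(top_n_outliers(scores, 1))  # [5]
```

With `n = 1` only the distant point is reported, but `top_n_outliers(scores, 2)` also returns a point from the dense cluster, because a top-n rule always reports exactly n points regardless of how many true outliers exist. That is the kind of user-chosen cutoff the summary says TNOF avoids by classifying outliers automatically.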