Study on outlier detection algorithm based on tightest neighbors
| Published in | Expert systems with applications Vol. 290; p. 128385 |
|---|---|
| Main Authors | , , |
| Format | Journal Article |
| Language | English |
| Published | Elsevier Ltd, 25.09.2025 |
| Subjects | |
| ISSN | 0957-4174 |
| DOI | 10.1016/j.eswa.2025.128385 |
| Summary: | • Introduces the concept of tightest neighbors and several of its important properties. • Outliers exhibit a pronounced “plateau” in terms of tightest neighbors. • The outlier factor TNOF is simple to compute because it exploits the symmetry of the tightest-neighbor relationship. • Proposes TNOF, a fast, parameter-insensitive algorithm that detects outliers efficiently in noisy datasets.
Outlier detection is a popular research topic in data mining; its goal is to identify points that differ significantly from the other points in a dataset. However, many outlier detection algorithms struggle on complex, noisy datasets. We propose the concept of tightest neighbors and introduce several of its important properties. We find that outliers exhibit a pronounced “plateau” in terms of tightest neighbors and form separate branches in the tightest-neighbors graph. Based on this phenomenon, we propose a new outlier factor that simplifies the computation of the local outlier factor and also characterizes outliers better, because it considers both the symmetry of the tightest-neighbor relationship and the distance between data points. We then propose TNOF, an outlier detection algorithm based on local density and tightest neighbors, which classifies outliers automatically and thereby avoids the top-n problem. Extensive experiments show that the proposed algorithm is robust and performs well, that it is suitable for detecting outliers in datasets with large differences in density and complex distributions, and that it is applicable to practical problems in the medical field. |
|---|---|