A Data Mining Algorithm via Decentralized Optimization in Parallel or Distributed Systems: Mathematical Aspects
| Published in | Lobachevskii journal of mathematics Vol. 46; no. 5; pp. 2363 - 2372 |
|---|---|
| Main Authors | , |
| Format | Journal Article |
| Language | English |
| Published | Moscow: Pleiades Publishing, 01.05.2025; Springer Nature B.V |
| Subjects | |
| ISSN | 1995-0802 1818-9962 |
| DOI | 10.1134/S1995080225605958 |
| Summary: | We present a new scalable data mining algorithm that accurately classifies a given pattern as an internal, boundary, or external point relative to the intersection of a finite family of data sets. Earlier algorithms (such as point-in-polygon and point-in-polyhedron algorithms) only determine whether a point belongs to a single set, in 2D and 3D space respectively. In point inclusion testing, these algorithms have difficulty with some singular cases, namely when the test point coincides with a vertex of the set or lies on one of its edges. In contrast, our algorithm is designed to solve the problem for data sets of arbitrary dimension and for any topological spatial relation between a point and a set. Moreover, we study the more complicated variant in which the given pattern set is represented as an intersection of overlapping data sets. Because it is based on a suitable decomposition and a linear separability framework, and thanks to the particular structure of the task, the algorithm can be executed not only sequentially but also in a parallel or distributed fashion, in accordance with modern computing technologies. The algorithm requires no exchange or synchronization of information between the nodes of a computing network. The concurrent subtasks are fully autonomous, since they need no interaction with one another. This eliminates any risk of information distortion or loss. Here we concentrate on the important mathematical aspects (design and rigorous theoretical justification) of the novel data mining algorithm, without addressing technical issues such as its integration with concrete machine models in parallel or distributed computing systems. Graphical illustrations and pseudocode of the algorithm demonstrate the essence of the studied problem and highlight our contributions. |
|---|---|
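
The summary describes classifying a point as internal, boundary, or external relative to an intersection of sets, with one independent subtask per set. This record does not include the article's pseudocode, so the following is only a minimal sketch of that general idea, not the authors' algorithm. It assumes that each set is a closed convex polytope given by linear inequalities `a · x <= b`. The function names, the `tol` tolerance, and the use of a thread pool are all hypothetical choices made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

INTERNAL, BOUNDARY, EXTERNAL = "internal", "boundary", "external"


def classify_against_set(point, halfspaces, tol=1e-9):
    """Classify `point` against one set {x : a . x <= b for every (a, b)}.

    Assumption: the set is a closed convex polytope described by half-spaces.
    """
    # Largest constraint residual a . x - b over all half-spaces of this set.
    worst = max(
        sum(ai * xi for ai, xi in zip(a, point)) - b for a, b in halfspaces
    )
    if worst > tol:
        return EXTERNAL   # at least one inequality is violated
    if worst >= -tol:
        return BOUNDARY   # feasible, and some inequality is tight
    return INTERNAL       # every inequality holds strictly


def classify_against_intersection(point, sets, tol=1e-9):
    """Combine per-set labels; each per-set test is independent of the others."""
    with ThreadPoolExecutor() as pool:
        labels = list(
            pool.map(lambda hs: classify_against_set(point, hs, tol), sets)
        )
    if EXTERNAL in labels:
        return EXTERNAL   # outside one set means outside the intersection
    if BOUNDARY in labels:
        return BOUNDARY   # inside every set, on the boundary of at least one
    return INTERNAL       # strictly inside every set


# Two overlapping squares in the plane: [0, 1]^2 and [0.5, 1.5]^2.
unit_square = [((1, 0), 1), ((-1, 0), 0), ((0, 1), 1), ((0, -1), 0)]
shifted_square = [((1, 0), 1.5), ((-1, 0), -0.5), ((0, 1), 1.5), ((0, -1), -0.5)]
sets = [unit_square, shifted_square]

for p in [(0.75, 0.75), (1.0, 0.75), (0.5, 0.5), (0.25, 0.25)]:
    print(p, classify_against_intersection(p, sets))
```

The combination step relies on the fact that, for a finite family of sets, the interior of the intersection equals the intersection of the interiors. The per-set tests share no state, which is the property that lets them run sequentially, in parallel, or on separate nodes. The inequality representation also works in any dimension and treats vertices and edges like any other boundary point.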