A Data Mining Algorithm via Decentralized Optimization in Parallel or Distributed Systems: Mathematical Aspects

Bibliographic Details
Published in	Lobachevskii journal of mathematics Vol. 46; no. 5; pp. 2363 - 2372
Main Authors	Gabidullina, Z. R.; Doostmohammadian, M. R.
Format	Journal Article
Language	English
Published	Moscow: Pleiades Publishing; Springer Nature B.V., 01.05.2025
Subjects	Algebra; Algorithms; Analysis; Computer networks; Data mining; Datasets; Decomposition; Distributed processing; Euclidean space; Geometry; Intersections; Mathematical Logic and Foundations; Mathematical programming; Mathematics; Mathematics and Statistics; Probability Theory and Stochastic Processes; Problem solving; Synchronism; 65K05; Minkowski difference; cone of generalized support vectors; 90C47; spatial topological interrelation; 90C90; linear separability criterion; decomposition; 65Y20; 6207; distributed computing
ISSN	1995-0802 1818-9962
DOI	10.1134/S1995080225605958

Summary:	We present a new scalable data mining algorithm that accurately classifies a given pattern as an internal, boundary, or external point relative to the intersection of a finite family of data sets. Previously known algorithms (such as point-in-polygon and point-in-polyhedron algorithms) apply only to deciding whether a point belongs to a single set in 2D and 3D space, respectively. In point inclusion testing, these algorithms struggle with certain singular cases, namely when the test point coincides with a vertex of the set or lies on one of its edges. In contrast, our algorithm is designed to solve the problem for data sets of arbitrary dimension and for any topological spatial interrelation between a point and a set. Moreover, we study the more complicated variant in which the given pattern set is represented as an intersection of overlapping data sets. The algorithm is based on a suitable decomposition and a linear separability framework; thanks to the particular structure of the task, it can be executed not only sequentially but also in parallel and distributed fashion on modern computing platforms. It requires no exchange or synchronization of information between the nodes of a computing network: the concurrent subtasks are fully autonomous, since they do not need to interact. This eliminates the risk of information distortion or loss. Here we concentrate on the important mathematical aspects (design and rigorous theoretical justification) of the novel data mining algorithm, without addressing technical issues such as its integration with concrete machine models in parallel or distributed computing systems. Graphical illustrations and pseudocode of the algorithm demonstrate the essence of the studied problem and highlight our contributions.
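The decomposition idea in the summary can be illustrated with a minimal sketch. This is not the authors' algorithm (the record mentions Minkowski differences and a cone of generalized support vectors, which are not reproduced here). Instead, the sketch assumes that each set is a closed convex set described by linear inequalities a·x <= b with nonzero a, classifies the point against each set independently, and then merges the labels. The names `classify_against_set`, `classify_against_intersection`, and `TOL` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

TOL = 1e-9  # hypothetical numerical tolerance for "lies exactly on a face"


def classify_against_set(point, halfspaces):
    """Classify a point against one set {x : a.x <= b for every (a, b)}.

    Assumption: the set is given as a list of (a, b) pairs with nonzero a.
    """
    on_boundary = False
    for a, b in halfspaces:
        slack = b - sum(ai * xi for ai, xi in zip(a, point))
        if slack < -TOL:
            return "external"      # one violated inequality is enough
        if abs(slack) <= TOL:
            on_boundary = True     # the point touches this face
    return "boundary" if on_boundary else "internal"


def classify_against_intersection(point, sets):
    """Run one independent subtask per set, then merge the labels.

    The subtasks share no state and exchange no messages; only the
    final merge looks at all results.
    """
    with ThreadPoolExecutor() as pool:
        labels = list(pool.map(lambda s: classify_against_set(point, s), sets))
    if "external" in labels:
        return "external"   # outside at least one set
    if "boundary" in labels:
        return "boundary"   # inside all sets, on the boundary of at least one
    return "internal"       # strictly inside every set


# Two overlapping squares, [0, 2]^2 and [1, 3]^2, intersecting in [1, 2]^2.
square_a = [((-1, 0), 0), ((1, 0), 2), ((0, -1), 0), ((0, 1), 2)]
square_b = [((-1, 0), -1), ((1, 0), 3), ((0, -1), -1), ((0, 1), 3)]

for p in [(1.5, 1.5), (1.0, 1.5), (1.0, 1.0), (0.5, 0.5)]:
    print(p, classify_against_intersection(p, [square_a, square_b]))
```

The merge rule relies on the fact that, for finitely many closed sets, the interior of the intersection equals the intersection of the interiors. Vertex and edge cases need no special handling in this representation, and the code works in any dimension because it only evaluates inner products.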