A Data Processing Algorithm to Approximate Big Stream Data Analysis

Bibliographic Details
Published in: 2025 International Conference on Machine Intelligence and Smart Innovation (ICMISI), pp. 247-250
Main Authors: Emara, Tamer Z.; Ashraf, Rana; Saber, Abeer; Trinh, Thanh; Awad, Wael A.
Format: Conference Proceeding
Language: English
Published: IEEE, 10.05.2025
DOI: 10.1109/ICMISI65108.2025.11115782

Summary: Sampling large datasets for approximate data processing and analysis is a critical challenge in data management. This becomes increasingly complex with large, continuous data streams, especially when data is distributed across multiple sites and exceeds available resources. The key challenge lies in ensuring that the sampled data retains statistical characteristics similar to the entire dataset. In this paper, we propose an efficient algorithm designed to approximate big stream data analysis. The algorithm segments streaming data into randomized blocks that preserve the statistical properties of the original dataset. By dividing data streams into time-based windows and applying randomized sampling techniques, the algorithm generates statistically consistent blocks. Experimental results demonstrate the algorithm's effectiveness, highlighting its ability to reduce computational overhead, improve scalability, and operate effectively in IoT and sensor network environments. By enabling parallel processing of these consistent blocks, the proposed approach addresses real-time data handling challenges and large-scale analytics, paving the way for advancements in adaptive and distributed data stream management.
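
The record does not reproduce the paper's algorithm, only the summary above. As a minimal sketch of the general idea it describes (splitting a stream into time-based windows and drawing a randomized, statistically representative block from each), the following Python snippet combines fixed-width time windows with standard reservoir sampling. The function name windowed_reservoir_sample and all parameters are illustrative assumptions, and reservoir sampling is a stand-in technique, not necessarily the authors' method.

```python
import random


def windowed_reservoir_sample(stream, window_size, block_size, seed=None):
    """Split a stream of (timestamp, value) pairs into fixed-width time
    windows and draw a uniform random sample ("block") of up to block_size
    items from each window via reservoir sampling, so each block mirrors
    the statistical profile of its window.

    NOTE: illustrative sketch only; not the algorithm from the paper.
    """
    rng = random.Random(seed)
    window_start = None
    reservoir, seen = [], 0

    for ts, value in stream:
        if window_start is None:
            window_start = ts
        # Close the current window once its time span is exhausted and
        # emit its sampled block; the current item opens the next window.
        if ts - window_start >= window_size:
            yield window_start, reservoir
            window_start, reservoir, seen = ts, [], 0
        seen += 1
        if len(reservoir) < block_size:
            reservoir.append(value)
        else:
            # Replace a stored element with probability block_size/seen,
            # keeping every item in the window equally likely to appear.
            j = rng.randrange(seen)
            if j < block_size:
                reservoir[j] = value

    if reservoir:  # flush the final, possibly partial, window
        yield window_start, reservoir


# Example: sample 10-item blocks from 100-unit windows of a synthetic stream.
if __name__ == "__main__":
    import itertools

    stream = ((t, random.gauss(50.0, 5.0)) for t in itertools.count())
    sampler = windowed_reservoir_sample(stream, window_size=100,
                                        block_size=10, seed=42)
    for start, block in itertools.islice(sampler, 3):
        print(start, block)
```

Because each block is produced independently per window, downstream consumers could process blocks in parallel, which is consistent with the parallel-processing claim in the summary.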