A Data Processing Algorithm to Approximate Big Stream Data Analysis

Bibliographic Details
Published in: 2025 International Conference on Machine Intelligence and Smart Innovation (ICMISI), pp. 247-250
Main Authors: Emara, Tamer Z.; Ashraf, Rana; Saber, Abeer; Trinh, Thanh; Awad, Wael A.
Format: Conference Proceeding
Language: English
Published: IEEE, 10.05.2025
DOI: 10.1109/ICMISI65108.2025.11115782

Summary: Sampling large datasets for approximate data processing and analysis is a critical challenge in data management. This becomes increasingly complex with large, continuous data streams, especially when data is distributed across multiple sites and exceeds available resources. The key challenge lies in ensuring that the sampled data retains statistical characteristics similar to the entire dataset. In this paper, we propose an efficient algorithm designed to approximate big stream data analysis. The algorithm segments streaming data into randomized blocks that preserve the statistical properties of the original dataset. By dividing data streams into time-based windows and applying randomized sampling techniques, the algorithm generates statistically consistent blocks. Experimental results demonstrate the algorithm's effectiveness, highlighting its ability to reduce computational overhead, improve scalability, and operate effectively in IoT and sensor network environments. By enabling parallel processing of these consistent blocks, the proposed approach addresses real-time data handling challenges and large-scale analytics, paving the way for advancements in adaptive and distributed data stream management.
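
The record does not reproduce the paper's algorithm, only the summary above. As a minimal sketch of the general idea it describes (splitting a stream into time-based windows and drawing a randomized, statistically representative block from each), the following Python snippet combines fixed-width time windows with standard reservoir sampling. The function name windowed_reservoir_sample and all parameters are illustrative assumptions, and reservoir sampling is a stand-in technique, not necessarily the authors' method.

```python
import random


def windowed_reservoir_sample(stream, window_size, block_size, seed=None):
    """Split a stream of (timestamp, value) pairs into fixed-width time
    windows and draw a uniform random sample ("block") of up to block_size
    items from each window via reservoir sampling, so each block mirrors
    the statistical profile of its window.

    NOTE: illustrative sketch only; not the algorithm from the paper.
    """
    rng = random.Random(seed)
    window_start = None
    reservoir, seen = [], 0

    for ts, value in stream:
        if window_start is None:
            window_start = ts
        # Close the current window once its time span is exhausted and
        # emit its sampled block; the current item opens the next window.
        if ts - window_start >= window_size:
            yield window_start, reservoir
            window_start, reservoir, seen = ts, [], 0
        seen += 1
        if len(reservoir) < block_size:
            reservoir.append(value)
        else:
            # Replace a stored element with probability block_size/seen,
            # keeping every item in the window equally likely to appear.
            j = rng.randrange(seen)
            if j < block_size:
                reservoir[j] = value

    if reservoir:  # flush the final, possibly partial, window
        yield window_start, reservoir


# Example: sample 10-item blocks from 100-unit windows of a synthetic stream.
if __name__ == "__main__":
    import itertools

    stream = ((t, random.gauss(50.0, 5.0)) for t in itertools.count())
    sampler = windowed_reservoir_sample(stream, window_size=100,
                                        block_size=10, seed=42)
    for start, block in itertools.islice(sampler, 3):
        print(start, block)
```

Because each block is produced independently per window, downstream consumers could process blocks in parallel, which is consistent with the parallel-processing claim in the summary.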