Flexible Executor Allocation without Latency Increase for Stream Processing in Apache Spark
| Published in | 2020 IEEE International Conference on Big Data (Big Data), pp. 2198 - 2206 |
|---|---|
| Format | Conference Proceeding |
| Language | English |
| Publisher | IEEE |
| Published | 10.12.2020 |
| DOI | 10.1109/BigData50022.2020.9377967 |
| Summary: | Widely used stream processing systems, e.g., Structured Streaming in Apache Spark, traditionally adopt static strategies for resource allocation. Although streaming applications often involve variable load and long run times, static strategies allocate a fixed amount of resources sized for the traffic peak, which leads to severe inefficiency. Dynamic resource provisioning, which adjusts the number of Executors at run time, provides a great opportunity to improve resource utilization and to limit platform cost. However, adapting this mechanism to latency-sensitive applications is difficult, since each newly added Executor incurs initialization overhead. This overhead is mainly caused by class loading in the JVM and is incurred every time a new Executor joins, which may result in service level objective (SLO) violations. In this paper, we propose a new warm-up mechanism that provides a semi-isolated environment inside a cluster to warm up newly added Executors. The proposed component, named the Warm-up Manager, automatically creates and manages warm-up jobs by modifying production jobs so that the new Executor loads the necessary classes. The mechanism aims to reduce initialization overhead and to enable latency-sensitive streaming applications to apply dynamic strategies. We implemented a prototype on Structured Streaming, and the evaluation found that the warm-up overhead imposed on the task was reduced to 4% in the best-case scenario. |
|---|---|
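The dynamic resource provisioning the abstract discusses is exposed in Spark through its dynamic allocation settings; a minimal sketch of enabling it at submit time (the property names come from Spark's standard configuration, while the executor counts, timeout, and application JAR name are illustrative):

```shell
spark-submit \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=2 \
  --conf spark.dynamicAllocation.maxExecutors=20 \
  --conf spark.dynamicAllocation.executorIdleTimeout=60s \
  --conf spark.shuffle.service.enabled=true \
  my-streaming-app.jar
```

With these settings Spark adds Executors under load and reclaims idle ones, which is exactly the behavior whose per-Executor initialization cost the paper's Warm-up Manager targets.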
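The initialization overhead the abstract attributes to JVM class loading can be observed directly: the first reference to a class triggers loading and initialization, while later lookups find it already loaded. A small self-contained Java sketch (the class name is arbitrary and the timings are machine-dependent):

```java
// Demonstrates the JVM class-loading cost that motivates Executor warm-up:
// the first Class.forName call loads and initializes the class; the second
// call finds it already loaded and returns much faster.
public class WarmupDemo {
    // Time one Class.forName lookup, in nanoseconds.
    static long timeLoad(String className) throws ClassNotFoundException {
        long start = System.nanoTime();
        Class.forName(className);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws Exception {
        String name = "java.util.concurrent.ConcurrentSkipListMap";
        long cold = timeLoad(name); // first use: triggers class loading
        long warm = timeLoad(name); // class is already loaded
        System.out.println("cold load: " + cold + " ns, warm lookup: " + warm + " ns");
    }
}
```

A warm-up job performs this "cold" step ahead of production traffic, so that production tasks scheduled on the new Executor only ever pay the "warm" cost.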