Flexible Executor Allocation without Latency Increase for Stream Processing in Apache Spark
| Published in | 2020 IEEE International Conference on Big Data (Big Data), pp. 2198 - 2206 |
|---|---|
| Format | Conference Proceeding |
| Language | English |
| Publisher | IEEE |
| Published | 10.12.2020 |
| DOI | 10.1109/BigData50022.2020.9377967 |
| Summary: | Widely used stream processing systems, e.g., Structured Streaming in Apache Spark, traditionally adopt static strategies for resource allocation. Although streaming applications often involve variable load and long run times, static strategies allocate a fixed amount of resources sized for the traffic peak, which leads to severe inefficiency. Dynamic resource provisioning, which adjusts the number of Executors at run time, provides a great opportunity to improve resource utilization and to limit platform cost. However, adapting this mechanism to latency-sensitive applications is difficult, since each newly added Executor incurs initialization overhead. This overhead is mainly caused by class loading in the JVM and is incurred every time a new Executor joins, which may result in service level objective (SLO) violations. In this paper, we propose a new warm-up mechanism that provides a semi-isolated environment inside a cluster to warm up newly added Executors. The proposed component, named the Warm-up Manager, automatically creates and manages warm-up jobs by modifying production jobs so that the new Executor loads the necessary classes. The mechanism aims to reduce initialization overhead and to enable latency-sensitive streaming applications to apply dynamic strategies. We implemented a prototype on Structured Streaming, and the evaluation found that the warm-up overhead imposed on the task was reduced to 4% in the best-case scenario. |
|---|---|
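The dynamic resource provisioning the abstract discusses is exposed in Spark through its dynamic allocation settings; a minimal sketch of enabling it at submit time (the property names come from Spark's standard configuration, while the executor counts, timeout, and application JAR name are illustrative):

```shell
spark-submit \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=2 \
  --conf spark.dynamicAllocation.maxExecutors=20 \
  --conf spark.dynamicAllocation.executorIdleTimeout=60s \
  --conf spark.shuffle.service.enabled=true \
  my-streaming-app.jar
```

With these settings Spark adds Executors under load and reclaims idle ones, which is exactly the behavior whose per-Executor initialization cost the paper's Warm-up Manager targets.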
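The initialization overhead the abstract attributes to JVM class loading can be observed directly: the first reference to a class triggers loading and initialization, while later lookups find it already loaded. A small self-contained Java sketch (the class name is arbitrary and the timings are machine-dependent):

```java
// Demonstrates the JVM class-loading cost that motivates Executor warm-up:
// the first Class.forName call loads and initializes the class; the second
// call finds it already loaded and returns much faster.
public class WarmupDemo {
    // Time one Class.forName lookup, in nanoseconds.
    static long timeLoad(String className) throws ClassNotFoundException {
        long start = System.nanoTime();
        Class.forName(className);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws Exception {
        String name = "java.util.concurrent.ConcurrentSkipListMap";
        long cold = timeLoad(name); // first use: triggers class loading
        long warm = timeLoad(name); // class is already loaded
        System.out.println("cold load: " + cold + " ns, warm lookup: " + warm + " ns");
    }
}
```

A warm-up job performs this "cold" step ahead of production traffic, so that production tasks scheduled on the new Executor only ever pay the "warm" cost.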