Optimal capacity planning for cloud service providers with periodic, time-varying demand

Bibliographic Details
Published in	European Journal of Operational Research, Vol. 322, no. 1, pp. 133-146
Main Authors	Furman, Eugene; Diamant, Adam
Format	Journal Article
Language	English
Published	Elsevier B.V., 01.04.2025
Subjects	Cloud computing; Fluid dynamics; Offered load analysis; Queueing; Retrials
ISSN	0377-2217
DOI	10.1016/j.ejor.2024.11.017

More Information
Summary:	Allocating capacity to private cloud computing services is challenging because demand is time-varying, there are often no buffers, and customers can re-submit jobs a finite number of times. We model this setting using a multi-station queueing network where servers represent CPU cores and jobs not immediately processed retry several times. Assuming retrial rates are stationary and that there is a maximum number of retrial attempts, we determine an optimal service capacity and retrial interval under an admission control policy employed by our partner institution — the server informs customers when they should next attempt service without enforcement. We introduce a recursive representation of the offered load which approximates the fluid dynamics of the system. We then use this representation to develop a solution technique that minimizes the total variation in the constructed offered load. We prove this approach is linked to maximizing system throughput and that in certain settings, the optimal stationary and time-varying retrial intervals are equivalent. Utilizing a data set of cloud computing requests spanning a 24-hour period, our analysis indicates that the optimal policy prescribes a 10% reduction in capacity. We also investigate the fidelity of the fluid model and the sensitivity of our recommendations to the behavior of retrial jobs. We find that retrial-time announcements allow a provider to satisfy service level agreements while encouraging retrial jobs to be processed during off-peak periods. Further, the policy is suitably robust to a customer’s willingness to comply with the suggested retrial times. 
• We study a private cloud computing service where the server recommends retrial times.
• We formulate a stochastic model of system dynamics and analyze a fluid approximation.
• We introduce a novel modified offered-load approximation for new and retrial fluid.
• We minimize total variation in the modified offered load subject to constraints.
• The optimization model gives the optimal server numbers and stationary retrial rates.
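
The idea of choosing a retrial interval that smooths a periodic load can be illustrated with a toy calculation. The sketch below is not the paper's recursive offered-load representation or its optimization model; it assumes, purely for illustration, that a fixed fraction of each period's arrivals is blocked and returns exactly once after a fixed delay, and it picks the delay that minimizes the cyclic total variation of the combined load. The names `arrivals`, `blocked_fraction`, and `delay` are hypothetical.

```python
def total_variation(load):
    """Cyclic total variation: sum of |x[t+1] - x[t]|, wrapping around the period."""
    n = len(load)
    return sum(abs(load[(t + 1) % n] - load[t]) for t in range(n))


def combined_load(arrivals, blocked_fraction, delay):
    """New arrivals plus one round of retrials shifted by `delay` periods.

    Assumption (illustrative only): a constant fraction of arrivals is blocked
    and every blocked job retries exactly once, `delay` periods later.
    """
    n = len(arrivals)
    return [
        arrivals[t] + blocked_fraction * arrivals[(t - delay) % n]
        for t in range(n)
    ]


def best_delay(arrivals, blocked_fraction):
    """Delay in 1..n-1 whose combined load has the smallest total variation."""
    n = len(arrivals)
    return min(
        range(1, n),
        key=lambda d: total_variation(combined_load(arrivals, blocked_fraction, d)),
    )


# Toy periodic demand alternating between low and high periods.
arrivals = [1, 3, 1, 3]
delay = best_delay(arrivals, blocked_fraction=0.5)
smoothed = combined_load(arrivals, 0.5, delay)
```

In this toy setting, shifting retrials from high-demand periods into the adjacent low-demand periods flattens the combined load, which mirrors the abstract's point that retrial-time announcements can move retrial jobs to off-peak periods.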