Diffusion Model-aided Resource Scheduling for Multiple GAI Training Jobs

Bibliographic Details
Published in	Proceedings - International Conference on Computer Communications and Networks, pp. 1-6
Main Authors	Yuan, Meng; Wu, Qiang; Wang, Xiangbin; Sun, Siyang
Format	Conference Proceeding
Language	English
Published	IEEE 04.08.2025
Subjects	AIGC; Clustering algorithms; Computational modeling; Deep reinforcement learning; diffusion model; Dynamic scheduling; Heuristic algorithms; multiple GAI distributed training jobs; Optimal scheduling; Processor scheduling; resource scheduling; Synchronization; Training; Uncertainty
ISSN	2637-9430
DOI	10.1109/ICCCN65249.2025.11134026

Summary:	With the rise of AI-Generated Content (AIGC), efficiently scheduling multiple Generative AI (GAI) distributed training jobs in a computing cluster has become crucial for cost-effectiveness. However, the resource-intensive nature and frequent communication demands of distributed training exacerbate resource fragmentation and network contention, resulting in low utilization and high latency. To address this, we propose an intelligent and dynamic resource scheduling method. First, we propose an innovative scheduling analytical model that describes heterogeneous computing resources, communication contention, and the parameter synchronization architecture, and we formulate scheduling as a multi-objective optimization problem. Next, we propose a Diffusion Model-based AI-generated Resource Scheduling (DARS) algorithm to capture the dynamic, high-dimensional environment and generate the optimal scheduling decisions. Finally, the policy network of deep reinforcement learning (DRL) is replaced with the proposed DARS to address environmental uncertainty and enhance efficiency. Simulation results demonstrate that the proposed algorithm outperforms related algorithms.
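
The summary describes replacing a DRL policy network with a diffusion model that generates scheduling decisions. The sketch below illustrates that general idea only and is not the paper's DARS algorithm: it starts from Gaussian noise and applies standard DDPM-style reverse denoising steps, conditioned on a cluster state, to produce a probability distribution over nodes. Everything named here is an assumption made for illustration, including `diffusion_policy`, `toy_denoiser`, the linear noise schedule, and the use of per-node free capacity as the state. A real system would use a trained noise-prediction network in place of `toy_denoiser`.

```python
import math
import random


def softmax(xs):
    """Turn raw scores into a probability distribution."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def toy_denoiser(x, state, t):
    """Hypothetical stand-in for a trained noise-prediction network.

    It treats the gap between the current sample and the state vector
    as the "noise". This is an assumption for illustration only.
    """
    return [xi - si for xi, si in zip(x, state)]


def diffusion_policy(state, steps=10, seed=0):
    """Generate node-selection probabilities by reverse diffusion.

    state: assumed per-node free capacity, one float per node.
    steps: number of denoising steps (must be at least 2).
    """
    rng = random.Random(seed)

    # Assumed linear noise schedule beta_0 .. beta_{T-1}.
    betas = [1e-4 + (0.2 - 1e-4) * i / (steps - 1) for i in range(steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars = []
    prod = 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)

    # Start from pure Gaussian noise, one value per node.
    x = [rng.gauss(0.0, 1.0) for _ in state]

    # DDPM reverse step:
    # x_{t-1} = (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t) + sigma_t * z
    for t in reversed(range(steps)):
        eps = toy_denoiser(x, state, t)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * e) / math.sqrt(alphas[t]) for xi, e in zip(x, eps)]
        if t > 0:
            sigma = math.sqrt(betas[t])
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # no noise is added at the final step

    # The denoised vector is read as logits over candidate nodes.
    return softmax(x)


if __name__ == "__main__":
    # Three nodes with assumed free-capacity fractions.
    print(diffusion_policy([0.9, 0.1, 0.5]))
```

In a DRL loop, the returned distribution would play the role of the policy output: an action (node choice) is sampled from it, and the denoiser's parameters are what training would update.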