A self-learning whale optimization algorithm based on reinforcement learning for a dual-resource flexible job shop scheduling problem

Bibliographic Details
Published in	Applied Soft Computing, Vol. 180, p. 113436
Main Authors	Manafi, Ehsan, Domenech, Bruno, Tavakkoli-Moghaddam, Reza, Ranaboldo, Matteo
Format	Journal Article
Language	English
Published	Elsevier B.V., 01.08.2025
Subjects	Flexible job shop scheduling; Machine learning; Meta-heuristics; Reconfigurable manufacturing systems; Reinforcement learning
ISSN	1568-4946
DOI	10.1016/j.asoc.2025.113436

Summary:	A key research area in production systems is the development of advanced optimization algorithms for efficiently scheduling activities in manufacturing systems, a task that requires more sophisticated models with greater computational complexity. Consequently, there is growing interest in improving the performance of meta-heuristics by incorporating reinforcement learning approaches. This paper addresses a dual-resource flexible job shop scheduling (DRFJSS) problem, in which each operation requires two resources, a reconfigurable machine tool (RMT) and a worker, to be processed. A mixed-integer linear programming (MILP) model is formulated to minimize the makespan. Since the MILP model cannot solve most medium-sized instances to optimality, a self-learning whale optimization algorithm (SLWOA) is developed to handle this difficult problem efficiently. In the proposed SLWOA, an agent trained with the state–action–reward–state–action (SARSA) algorithm balances exploration and exploitation. The results show that the SLWOA has a stronger global search ability and faster convergence than the original whale optimization algorithm.
• Studying dual-resource scheduling on shop floors with reconfigurable machine tools.
• Formulating a position-based MILP model for scheduling optimization.
• Proposing a self-learning whale algorithm for large problem instances.
• Designing states, actions, and rewards for reinforcement learning integration.
• Developing a variable neighbourhood search to improve the local search.
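The summary says that a SARSA-trained agent balances exploration and exploitation inside the whale optimization algorithm, but it does not specify the states, actions, rewards, or schedule encoding. The following is a minimal, hypothetical sketch of that general idea, not the authors' method. It assumes two states (the best solution improved in the last iteration, or it stagnated), three actions (the standard WOA operators: search for prey, encircling prey, and the spiral update), a reward of 1 on improvement and 0 otherwise, and a continuous sphere function as a stand-in for the DRFJSS makespan. All function names and parameter values are illustrative assumptions, and the variable neighbourhood search is omitted.

```python
import math
import random


def sphere(x):
    """Stand-in objective (sum of squares); NOT the paper's makespan model."""
    return sum(v * v for v in x)


def epsilon_greedy(q, s, eps, rng):
    """Pick a random action with probability eps, else the greedy one."""
    if rng.random() < eps:
        return rng.randrange(len(q[s]))
    return max(range(len(q[s])), key=lambda a: q[s][a])


def sarsa_update(q, s, a, r, s2, a2, alpha, gamma):
    """On-policy TD(0) update: Q(s,a) += alpha * (r + gamma*Q(s',a') - Q(s,a))."""
    q[s][a] += alpha * (r + gamma * q[s2][a2] - q[s][a])


def move(x, best, pop, action, a_coef, rng, b=1.0):
    """Apply the standard WOA operator selected by the agent to whale x."""
    ref = rng.choice(pop) if action == 0 else None  # random whale for exploration
    ell = rng.uniform(-1.0, 1.0)                     # spiral parameter l in [-1, 1]
    new = []
    for j in range(len(x)):
        A = 2.0 * a_coef * rng.random() - a_coef     # A = 2a*r1 - a
        C = 2.0 * rng.random()                       # C = 2*r2
        if action == 0:    # search for prey (exploration)
            new.append(ref[j] - A * abs(C * ref[j] - x[j]))
        elif action == 1:  # encircling prey around the best whale (exploitation)
            new.append(best[j] - A * abs(C * best[j] - x[j]))
        else:              # bubble-net spiral around the best whale (exploitation)
            d = abs(best[j] - x[j])
            new.append(d * math.exp(b * ell) * math.cos(2.0 * math.pi * ell) + best[j])
    return new


def slwoa_sketch(f, dim=5, n_whales=20, iters=100, lb=-10.0, ub=10.0,
                 eps=0.1, alpha=0.1, gamma=0.9, seed=0):
    """Hypothetical WOA whose operator choice is driven by a tabular SARSA agent."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n_whales)]
    best = min(pop, key=f)[:]
    best_f = f(best)
    history = [best_f]
    q = [[0.0] * 3 for _ in range(2)]  # 2 assumed states x 3 WOA operators
    s = 1                              # start in the "stagnated" state
    act = epsilon_greedy(q, s, eps, rng)
    for t in range(iters):
        a_coef = 2.0 - 2.0 * t / iters  # a decreases linearly from 2 towards 0
        # Move every whale with the chosen operator, then clip to the bounds.
        pop = [[min(ub, max(lb, v)) for v in move(x, best, pop, act, a_coef, rng)]
               for x in pop]
        cand = min(pop, key=f)
        cand_f = f(cand)
        improved = cand_f < best_f
        if improved:                    # elitist: best never gets worse
            best, best_f = cand[:], cand_f
        reward = 1.0 if improved else 0.0
        s2 = 0 if improved else 1
        act2 = epsilon_greedy(q, s2, eps, rng)   # choose a' before updating (SARSA)
        sarsa_update(q, s, act, reward, s2, act2, alpha, gamma)
        s, act = s2, act2
        history.append(best_f)
    return best, best_f, q, history
```

For example, `best, best_f, q, history = slwoa_sketch(sphere, seed=1)` runs the search, and the learned table `q` shows which operator the agent came to prefer in each state. Because SARSA is on-policy, the next action is chosen with the same epsilon-greedy policy before the update, which is what separates it from Q-learning's max over next actions.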