Reinforcement learning for joint pricing, lead-time and scheduling decisions in make-to-order systems

Bibliographic Details
Published in: European Journal of Operational Research, Vol. 221, no. 1, pp. 99-109
Main Authors: Li, Xueping; Wang, Jiao; Sawhney, Rapinder
Format: Journal Article
Language: English
Published: Amsterdam, Elsevier B.V., 16.08.2012
Elsevier
Elsevier Sequoia S.A.
ISSN: 0377-2217, 1872-6860
DOI: 10.1016/j.ejor.2012.03.020

Summary: ► We investigate joint pricing, lead-time, and scheduling decisions in make-to-order (MTO) systems. ► We model the problem as a Semi-Markov Decision Problem (SMDP). ► We develop a reinforcement learning (RL) based Q-learning algorithm (QLA). The paper investigates a problem faced by a make-to-order (MTO) firm that can accept or reject orders and can set prices and lead-times to influence demand. The model accounts for inventory holding costs for orders completed early, tardiness costs for orders delivered late, order rejection costs, variable manufacturing costs, and fixed costs. To maximize expected profit over an infinite planning horizon with stochastic demand, the firm must decide which orders to accept or reject, how to trade off price against lead-time, and how to weigh the potential for increased demand against capacity constraints. We model the problem as a Semi-Markov Decision Problem (SMDP) and develop a reinforcement learning (RL) based Q-learning algorithm (QLA) to solve it. We also build a discrete-event simulation model to evaluate the QLA and compare its results against two benchmark policies: the First-Come-First-Serve (FCFS) policy and a threshold heuristic policy. The experiments show that the QLA outperforms both benchmark policies.