Reinforcement learning for joint pricing, lead-time and scheduling decisions in make-to-order systems

Bibliographic Details
Published in	European Journal of Operational Research, Vol. 221, no. 1, pp. 99-109
Main Authors	Li, Xueping; Wang, Jiao; Sawhney, Rapinder
Format	Journal Article
Language	English
Published	Amsterdam: Elsevier B.V., 16.08.2012; Elsevier; Elsevier Sequoia S.A.
Subjects	Applied sciences; Construction; Cost analysis; Costs; Decision making models; Decision theory. Utility theory; Decisions; Demand; Exact sciences and technology; Heuristic; Inventory control, production control. Distribution; Inventory management; Learning; Marketing; Markov analysis; Markov processes; Mathematics; Operational research and scientific management; Operational research. Management science; Policies; Pricing; Probability and statistics; Probability theory and stochastic processes; Q-learning; Reinforcement; Reinforcement learning (RL); Scheduling; Scheduling, sequencing; Sciences and techniques of general use; Semi-Markov Decision Problem (SMDP); Simulation-based optimization; Studies; Markov decision; Ordered set; Fixed cost; Variable cost; Statistical simulation; Modeling; Delay; FIFO system; Setup time; Inventory control; Profit; Infinite horizon; Learning algorithm; Manufacturing cost; Discrete event system; Make to order; Decision making; Reinforcement learning; Stochastic programming; Rejection; Heuristic method; Capacity constraint; Planning
ISSN	0377-2217; 1872-6860
DOI	10.1016/j.ejor.2012.03.020

Summary:	► We investigate joint pricing, lead-time and scheduling decisions in make-to-order (MTO) systems. ► We model the problem as a Semi-Markov Decision Problem (SMDP). ► We develop a reinforcement learning (RL) based Q-learning algorithm (QLA). The paper investigates a problem faced by a make-to-order (MTO) firm that can accept or reject orders and can set prices and lead-times to influence demand. The model considers inventory holding costs for orders completed early, tardiness costs for orders delivered late, order rejection costs, variable manufacturing costs, and fixed costs. To maximize expected profit over an infinite planning horizon with stochastic demand, the firm must decide which orders to accept or reject, how to trade off price against lead-time, and how to weigh the potential for increased demand against capacity constraints. We model the problem as a Semi-Markov Decision Problem (SMDP) and develop a reinforcement learning (RL) based Q-learning algorithm (QLA) to solve it. We also build a discrete-event simulation model to evaluate the performance of the QLA, and compare the experimental results with two benchmark policies: the First-Come-First-Serve (FCFS) policy and a threshold heuristic policy. The results show that the QLA outperforms both benchmark policies.
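The summary does not give the update rule, state space, or cost parameters, so the following is only a minimal sketch of how tabular Q-learning can be applied to a semi-Markov decision problem. It assumes the common discounted form, in which the next state's value is discounted by exp(-beta * tau) for a sojourn time tau; the paper's own QLA may use a different criterion. The toy order-acceptance simulator, the action names, and every numeric value (rewards, rates, alpha, beta) are hypothetical and chosen purely for illustration.

```python
import math
import random
from collections import defaultdict

# Hypothetical action set: the summary mentions accept/reject decisions;
# pricing and lead-time quotes are left out to keep the sketch small.
ACTIONS = ("reject", "accept")


def smdp_q_update(q, state, action, reward, sojourn, next_state,
                  alpha=0.1, beta=0.05):
    """Apply one discounted SMDP Q-learning step and return the new Q-value.

    The only difference from ordinary Q-learning is that the discount
    factor depends on the (random) time spent between decisions.
    """
    discount = math.exp(-beta * sojourn)
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    target = reward + discount * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


def simulate(episodes=2000, max_queue=3, epsilon=0.1, seed=0):
    """Learn on a toy single-server order queue (all numbers are made up)."""
    rng = random.Random(seed)
    q = defaultdict(float)
    state = 0  # number of accepted orders currently waiting
    for _ in range(episodes):
        # Epsilon-greedy action selection.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        # Time until the next order arrives (exponential inter-arrival).
        sojourn = rng.expovariate(1.0)
        if action == "accept" and state < max_queue:
            reward = 10.0 - 3.0 * state  # toy price minus congestion cost
            next_state = state + 1
        else:
            # Rejection, or a full queue, which is treated as a rejection.
            reward = -1.0
            next_state = state
        # Toy service model: one waiting order may finish before the
        # next decision epoch.
        if next_state > 0 and rng.random() < 0.5:
            next_state -= 1
        smdp_q_update(q, state, action, reward, sojourn, next_state)
        state = next_state
    return q
```

Calling `simulate()` returns a table keyed by `(state, action)`; comparing `q[(s, "accept")]` with `q[(s, "reject")]` for each queue length `s` gives the learned acceptance rule for this toy setting, which could then be compared against a fixed rule such as always accepting.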