Distributional Constrained Reinforcement Learning for Supply Chain Optimization

Bibliographic Details
Published in: Computer Aided Chemical Engineering, Vol. 52, pp. 1649–1654
Main Authors: Bermúdez, Jaime Sabal; del Rio Chanona, Antonio; Tsay, Calvin
Format: Book Chapter
Language: English
Published: 2023
ISBN: 9780443152740; 0443152748
ISSN: 1570-7946
DOI: 10.1016/B978-0-443-15274-0.50262-6

Summary: This work studies reinforcement learning (RL) in the context of multi-period supply chains subject to constraints, e.g., on inventory. We introduce Distributional Constrained Policy Optimization (DCPO), a novel approach for reliable constraint satisfaction in RL. Our approach is based on Constrained Policy Optimization (CPO), which is subject to approximation errors that in practice lead it to converge to infeasible policies. We address this issue by incorporating aspects of distributional RL. Using a supply chain case study, we show that DCPO improves the rate at which the RL policy converges and ensures reliable constraint satisfaction by the end of training. The proposed method also greatly reduces the variance of returns between runs; this result is significant in the context of policy gradient methods, which intrinsically introduce high variance during training.
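The summary describes the approach only at a high level, so the following is a minimal sketch of the general idea it alludes to: replacing the scalar constraint-cost critic used in standard CPO with a distributional (quantile-based) critic and reading off a conservative tail statistic for the feasibility check. This is not the authors' implementation; the names QuantileCostCritic and pessimistic_cost, the network sizes, and the parameters n_quantiles, kappa, and alpha are all hypothetical choices for illustration.

```python
import torch
import torch.nn as nn


class QuantileCostCritic(nn.Module):
    """Hypothetical critic predicting n_quantiles quantiles of the
    cumulative constraint cost C(s), in the style of QR-DQN-like
    distributional RL, instead of a single scalar estimate."""

    def __init__(self, state_dim: int, n_quantiles: int = 32):
        super().__init__()
        self.n_quantiles = n_quantiles
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64),
            nn.ReLU(),
            nn.Linear(64, n_quantiles),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)  # shape: (batch, n_quantiles)


def quantile_huber_loss(pred: torch.Tensor, target: torch.Tensor,
                        kappa: float = 1.0) -> torch.Tensor:
    """Standard quantile-regression Huber loss for fitting the critic's
    quantiles to (detached) TD target quantiles."""
    n = pred.shape[-1]
    # Quantile midpoints tau_i = (i + 0.5) / n for the predicted quantiles.
    taus = (torch.arange(n, dtype=pred.dtype, device=pred.device) + 0.5) / n
    # Pairwise TD errors: (batch, n_pred_quantiles, n_target_quantiles).
    td = target.unsqueeze(-2) - pred.unsqueeze(-1)
    huber = torch.where(td.abs() <= kappa,
                        0.5 * td.pow(2),
                        kappa * (td.abs() - 0.5 * kappa))
    # Asymmetric quantile weighting |tau - 1{td < 0}|.
    return ((taus.unsqueeze(-1) - (td < 0).float()).abs() * huber / kappa).mean()


def pessimistic_cost(critic: QuantileCostCritic, state: torch.Tensor,
                     alpha: float = 0.9) -> torch.Tensor:
    """Conservative constraint estimate: the mean of the upper tail of the
    predicted cost distribution, rather than the plain mean a standard CPO
    update would compare against the budget. Using this pessimistic value
    in the feasibility check is one way to keep critic approximation
    errors from steering the policy into infeasibility."""
    quantiles = critic(state)  # (batch, n_quantiles)
    k = max(1, int(round((1 - alpha) * critic.n_quantiles)))
    return quantiles.topk(k, dim=-1).values.mean(dim=-1)
```

A hypothetical usage, with d denoting the per-episode cost budget: when pessimistic_cost(critic, states).mean() exceeds d, the policy update would fall back to a pure constraint-reduction (recovery) step, mirroring CPO's handling of infeasible iterates, and otherwise proceed with the usual trust-region reward step.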