Distributional Constrained Reinforcement Learning for Supply Chain Optimization

Bibliographic Details
Published in: Computer Aided Chemical Engineering, Vol. 52, pp. 1649–1654
Main Authors: Bermúdez, Jaime Sabal; del Rio Chanona, Antonio; Tsay, Calvin
Format: Book Chapter
Language: English
Published: 2023
ISBN: 9780443152740; 0443152748
ISSN: 1570-7946
DOI: 10.1016/B978-0-443-15274-0.50262-6

Abstract: This work studies reinforcement learning (RL) in the context of multi-period supply chains subject to constraints, e.g., on inventory. We introduce Distributional Constrained Policy Optimization (DCPO), a novel approach for reliable constraint satisfaction in RL. Our approach is based on Constrained Policy Optimization (CPO), which is subject to approximation errors that in practice lead it to converge to infeasible policies. We address this issue by incorporating aspects of distributional RL. Using a supply chain case study, we show that DCPO improves the rate at which the RL policy converges and ensures reliable constraint satisfaction by the end of training. The proposed method also greatly reduces the variance of returns between runs; this result is significant in the context of policy gradient methods, which intrinsically introduce high variance during training.
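As a rough illustration of the distributional idea the abstract describes (this is not the authors' actual DCPO algorithm): a distributional critic yields quantile estimates of the cumulative constraint cost, so feasibility can be judged against a conservative upper quantile rather than the mean estimate that standard CPO uses. A minimal sketch, in which the function name, the sample values, and the risk level are all hypothetical:

```python
import numpy as np

def constraint_satisfied(cost_quantiles, budget, risk_level=0.9):
    """Conservative feasibility check: compare an upper quantile of the
    predicted cumulative constraint cost against the budget, instead of
    the mean estimate used in standard CPO."""
    conservative_cost = np.quantile(cost_quantiles, risk_level)
    return conservative_cost <= budget

# Hypothetical quantile samples of an episode's inventory-constraint cost.
samples = np.array([8.0, 9.5, 10.2, 11.0, 12.5])
print(constraint_satisfied(samples, budget=12.0))  # True: 90th percentile ≈ 11.9
```

Using an upper quantile makes the feasibility test pessimistic, which is one way to counteract the approximation errors that can push mean-based CPO toward infeasible policies.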
Authors
– Bermúdez, Jaime Sabal (Department of Computing, Imperial College London, SW7 2AZ, United Kingdom)
– del Rio Chanona, Antonio (Sargent Centre for Process Systems Engineering, Imperial College London, SW7 2AZ, United Kingdom)
– Tsay, Calvin (c.tsay@imperial.ac.uk; Department of Computing, Imperial College London, SW7 2AZ, United Kingdom)
Copyright: 2023 Elsevier B.V.
Keywords: Inventory management; Process operations; Safe reinforcement learning
References
– Achiam, Held, Tamar, Abbeel (2017). Constrained policy optimization. International Conference on Machine Learning, PMLR, pp. 22–31.
– Bellemare, Dabney, Munos (2017). A distributional perspective on reinforcement learning. International Conference on Machine Learning, PMLR, pp. 449–458.
– Hubbs, Perez, Sarwar, Sahinidis, Grossmann, Wassick (2020). OR-Gym: A reinforcement learning library for operations research problems.
– Petsagkourakis, Sandoval, Bradford, Galvanin, Zhang, del Rio-Chanona (2022). Chance constrained policy optimization for process control and optimization. Journal of Process Control, Vol. 111, pp. 35–45.
– Schulman, Levine, Abbeel, Jordan, Moritz (2015). Trust region policy optimization. International Conference on Machine Learning, PMLR, pp. 1889–1897.
– Schulman, Moritz, Levine, Jordan, Abbeel (2015). High-dimensional continuous control using generalized advantage estimation.
– Shin, Badgwell, Liu, Lee (2019). Reinforcement learning: Overview of recent progress and implications for process control. Computers & Chemical Engineering, Vol. 127, pp. 282–294.
– Sootla, Cowen-Rivers, Jafferjee, Wang, Mguni, Wang, Ammar (2022). Sauté RL: Almost surely safe reinforcement learning using state augmentation. International Conference on Machine Learning, PMLR, pp. 20423–20443.