Distributional Constrained Reinforcement Learning for Supply Chain Optimization

Bibliographic Details
Published in: Computer Aided Chemical Engineering, Vol. 52, pp. 1649–1654
Main Authors: Bermúdez, Jaime Sabal; del Rio Chanona, Antonio; Tsay, Calvin
Format: Book Chapter
Language: English
Published: 2023
ISBN: 9780443152740; 0443152748
ISSN: 1570-7946
DOI: 10.1016/B978-0-443-15274-0.50262-6

Abstract: This work studies reinforcement learning (RL) in the context of multi-period supply chains subject to constraints, e.g., on inventory. We introduce Distributional Constrained Policy Optimization (DCPO), a novel approach for reliable constraint satisfaction in RL. Our approach is based on Constrained Policy Optimization (CPO), which is subject to approximation errors that in practice lead it to converge to infeasible policies. We address this issue by incorporating aspects of distributional RL. Using a supply chain case study, we show that DCPO improves the rate at which the RL policy converges and ensures reliable constraint satisfaction by the end of training. The proposed method also greatly reduces the variance of returns between runs; this result is significant in the context of policy gradient methods, which intrinsically introduce high variance during training.
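As a rough illustration of the distributional idea the abstract describes (this is not the authors' actual DCPO algorithm): a distributional critic yields quantile estimates of the cumulative constraint cost, so feasibility can be judged against a conservative upper quantile rather than the mean estimate that standard CPO uses. A minimal sketch, in which the function name, the sample values, and the risk level are all hypothetical:

```python
import numpy as np

def constraint_satisfied(cost_quantiles, budget, risk_level=0.9):
    """Conservative feasibility check: compare an upper quantile of the
    predicted cumulative constraint cost against the budget, instead of
    the mean estimate used in standard CPO."""
    conservative_cost = np.quantile(cost_quantiles, risk_level)
    return conservative_cost <= budget

# Hypothetical quantile samples of an episode's inventory-constraint cost.
samples = np.array([8.0, 9.5, 10.2, 11.0, 12.5])
print(constraint_satisfied(samples, budget=12.0))  # True: 90th percentile ≈ 11.9
```

Using an upper quantile makes the feasibility test pessimistic, which is one way to counteract the approximation errors that can push mean-based CPO toward infeasible policies.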
Authors
– Bermúdez, Jaime Sabal (Department of Computing, Imperial College London, SW7 2AZ, United Kingdom)
– del Rio Chanona, Antonio (Sargent Centre for Process Systems Engineering, Imperial College London, SW7 2AZ, United Kingdom)
– Tsay, Calvin (c.tsay@imperial.ac.uk; Department of Computing, Imperial College London, SW7 2AZ, United Kingdom)
Copyright: 2023 Elsevier B.V.
Keywords: Inventory management; Process operations; Safe reinforcement learning
References
– Achiam, Held, Tamar, Abbeel (2017). Constrained policy optimization. International Conference on Machine Learning, PMLR, pp. 22–31.
– Bellemare, Dabney, Munos (2017). A distributional perspective on reinforcement learning. International Conference on Machine Learning, PMLR, pp. 449–458.
– Hubbs, Perez, Sarwar, Sahinidis, Grossmann, Wassick (2020). OR-Gym: A reinforcement learning library for operations research problems.
– Petsagkourakis, Sandoval, Bradford, Galvanin, Zhang, del Rio-Chanona (2022). Chance constrained policy optimization for process control and optimization. Journal of Process Control, Vol. 111, pp. 35–45.
– Schulman, Levine, Abbeel, Jordan, Moritz (2015). Trust region policy optimization. International Conference on Machine Learning, PMLR, pp. 1889–1897.
– Schulman, Moritz, Levine, Jordan, Abbeel (2015). High-dimensional continuous control using generalized advantage estimation.
– Shin, Badgwell, Liu, Lee (2019). Reinforcement learning: Overview of recent progress and implications for process control. Computers & Chemical Engineering, Vol. 127, pp. 282–294.
– Sootla, Cowen-Rivers, Jafferjee, Wang, Mguni, Wang, Ammar (2022). Sauté RL: Almost surely safe reinforcement learning using state augmentation. International Conference on Machine Learning, PMLR, pp. 20423–20443.