Distributional Constrained Reinforcement Learning for Supply Chain Optimization
| Published in | Computer Aided Chemical Engineering Vol. 52; pp. 1649-1654 |
|---|---|
| Main Authors | Bermúdez, Jaime Sabal; del Rio Chanona, Antonio; Tsay, Calvin |
| Format | Book Chapter |
| Language | English |
| Published | 2023 |
| Subjects | Inventory management; Process operations; Safe reinforcement learning |
| ISBN | 9780443152740; 0443152748 |
| ISSN | 1570-7946 |
| DOI | 10.1016/B978-0-443-15274-0.50262-6 |
| Abstract | This work studies reinforcement learning (RL) in the context of multi-period supply chains subject to constraints, e.g., on inventory. We introduce Distributional Constrained Policy Optimization (DCPO), a novel approach for reliable constraint satisfaction in RL. Our approach is based on Constrained Policy Optimization (CPO), which is subject to approximation errors that in practice lead it to converge to infeasible policies. We address this issue by incorporating aspects of distributional RL. Using a supply chain case study, we show that DCPO improves the rate at which the RL policy converges and ensures reliable constraint satisfaction by the end of training. The proposed method also greatly reduces the variance of returns between runs; this result is significant in the context of policy gradient methods, which intrinsically introduce high variance during training. |
|---|---|
| Author | Bermúdez, Jaime Sabal; del Rio Chanona, Antonio; Tsay, Calvin |
| Affiliations | Jaime Sabal Bermúdez: Department of Computing, Imperial College London, SW7 2AZ, United Kingdom. Antonio del Rio Chanona: Sargent Centre for Process Systems Engineering, Imperial College London, SW7 2AZ, United Kingdom. Calvin Tsay (c.tsay@imperial.ac.uk): Department of Computing, Imperial College London, SW7 2AZ, United Kingdom |
| Copyright | 2023 Elsevier B.V. |
| Keywords | Inventory management Process operations Safe reinforcement learning |
| References | Achiam, Held, Tamar, Abbeel (2017). Constrained policy optimization. International Conference on Machine Learning, PMLR, pp. 22-31. Bellemare, Dabney, Munos (2017). A distributional perspective on reinforcement learning. International Conference on Machine Learning, PMLR, pp. 449-458. Hubbs, Perez, Sarwar, Sahinidis, Grossmann, Wassick (2020). OR-Gym: A reinforcement learning library for operations research problems. Petsagkourakis, Sandoval, Bradford, Galvanin, Zhang, del Rio-Chanona (2022). Chance constrained policy optimization for process control and optimization. Journal of Process Control, 111, pp. 35-45. Schulman, Levine, Abbeel, Jordan, Moritz (2015). Trust region policy optimization. International Conference on Machine Learning, PMLR, pp. 1889-1897. Schulman, Moritz, Levine, Jordan, Abbeel (2015). High-dimensional continuous control using generalized advantage estimation. Shin, Badgwell, Liu, Lee (2019). Reinforcement learning - overview of recent progress and implications for process control. Computers & Chemical Engineering, 127, pp. 282-294. Sootla, Cowen-Rivers, Jafferjee, Wang, Mguni, Wang, Ammar (2022). Sauté RL: Almost surely safe reinforcement learning using state augmentation. International Conference on Machine Learning, PMLR, pp. 20423-20443. |
| URI | https://dx.doi.org/10.1016/B978-0-443-15274-0.50262-6 |
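The abstract describes replacing CPO's point estimate of constraint cost with a distributional one so that feasibility checks become more reliable. The following is a minimal illustrative sketch, not the authors' implementation: all names, the gamma cost distribution, the quantile level, and the limit value are hypothetical. It shows why checking an upper quantile of a predicted constraint-cost distribution is more conservative than checking its mean, which is the intuition behind combining distributional RL with a CPO-style feasibility test.

```python
# Hypothetical sketch: a distributional vs. mean-based feasibility check
# for a constraint on expected cost (e.g., an inventory constraint).
import numpy as np

def feasible(cost_samples, limit, quantile=0.9):
    """Accept a policy update only if the chosen upper quantile of the
    predicted constraint-cost distribution is within the limit.
    A mean-only check (as in vanilla CPO) would use np.mean instead."""
    return float(np.quantile(cost_samples, quantile)) <= limit

rng = np.random.default_rng(0)
# Hypothetical constraint-cost samples from a distributional critic:
# mean around 9, with a heavy right tail.
costs = rng.gamma(shape=3.0, scale=3.0, size=10_000)

limit = 12.0
mean_ok = float(np.mean(costs)) <= limit          # mean-based check
dist_ok = feasible(costs, limit, quantile=0.9)    # distributional check
# With this tail-heavy distribution, the mean-based check passes while
# the 90th-percentile check does not: the distributional test rejects
# updates whose tail risk violates the constraint.
```

The design point is that two policies with the same mean cost can have very different tail behavior; a quantile-based test distinguishes them, while a mean-based test cannot.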