Recurrent Reinforcement Learning Strategy with a Parameterized Agent for Online Scheduling of a State Task Network Under Uncertainty

Bibliographic Details
Published in Industrial & Engineering Chemistry Research, Vol. 64, no. 13, pp. 7126-7140
Main Authors Rangel-Martinez, Daniel; Ricardez-Sandoval, Luis A.
Format Journal Article
Language English
Published American Chemical Society, 2025-04-02
Subjects chemistry; hybrids; Process Systems Engineering; uncertainty
ISSN 0888-5885 (print); 1520-5045 (electronic)
DOI 10.1021/acs.iecr.4c04900

Abstract This study presents a framework for developing reinforcement learning hybrid agents that can build online schedules for state task networks under epistemic and aleatoric uncertainty. The hybrid agent can make multiple discrete or continuous decisions at every time interval. To address uncertainty in the scheduling process, the hybrid agent is augmented with a set of LSTM layers that integrate a sequence of observations. This feature lets the agent use previous information when making decisions, so that it can account for the realization and propagation of uncertainties throughout the plant. The techniques required for efficient training oriented toward the objective function are also described. The method is implemented in two case studies to validate and test the agent under epistemic and aleatoric uncertainty, with a similar hybrid agent without recurrence used as a benchmark. The proposed hybrid agent accumulated larger rewards while minimizing the number of constraint violations in the process under uncertainty, which makes this online scheduling agent attractive for industrial-scale applications.
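
The architecture summarized in the abstract (LSTM layers that integrate a sequence of observations, feeding both discrete and continuous decisions) can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `RecurrentHybridPolicy`, the dimensions, the single LSTM layer without bias terms, the random untrained weights, the greedy task selection, and the interpretation of the two heads as "task choice" and "batch size" are all assumptions made for illustration. The training procedure described in the article is not shown.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Plain matrix-vector product on nested lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class RecurrentHybridPolicy:
    """Toy LSTM policy with one discrete and one continuous head (hypothetical)."""

    def __init__(self, obs_dim, hidden_dim, n_tasks, max_batch, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.max_batch = max_batch
        # One weight matrix per LSTM gate, applied to [observation, previous hidden].
        # Bias terms are omitted to keep the sketch short.
        self.gates = {g: rand_matrix(hidden_dim, obs_dim + hidden_dim, rng) for g in "ifog"}
        self.W_task = rand_matrix(n_tasks, hidden_dim, rng)  # discrete head
        self.W_batch = rand_matrix(1, hidden_dim, rng)       # continuous head

    def step(self, obs, h, c):
        # Standard LSTM cell update for a single time interval.
        z = list(obs) + list(h)
        i = [sigmoid(a) for a in matvec(self.gates["i"], z)]
        f = [sigmoid(a) for a in matvec(self.gates["f"], z)]
        o = [sigmoid(a) for a in matvec(self.gates["o"], z)]
        g = [math.tanh(a) for a in matvec(self.gates["g"], z)]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h, c

    def act(self, obs_sequence):
        # Integrate the whole observation history into the hidden state.
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for obs in obs_sequence:
            h, c = self.step(obs, h, c)
        # Discrete decision: softmax over tasks, then greedy choice.
        logits = matvec(self.W_task, h)
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        task = max(range(len(probs)), key=probs.__getitem__)
        # Continuous decision: squash to the interval (0, max_batch).
        batch = self.max_batch * sigmoid(matvec(self.W_batch, h)[0])
        return task, batch, probs


# Hypothetical usage: three past observations of a 3-dimensional plant state.
policy = RecurrentHybridPolicy(obs_dim=3, hidden_dim=4, n_tasks=2, max_batch=100.0)
history = [[0.2, 0.0, 1.0], [0.1, 0.5, 1.0], [0.0, 0.9, 0.0]]
task, batch, probs = policy.act(history)
```

Because the hidden state carries information from earlier intervals, the same current observation can lead to different decisions depending on what preceded it, which is the property a non-recurrent benchmark agent lacks.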
Author details
Rangel-Martinez, Daniel
Ricardez-Sandoval, Luis A. (ORCID 0000-0001-9867-6778; laricard@uwaterloo.ca)
Copyright 2025 American Chemical Society
Discipline Engineering; Chemistry