A safe reinforcement learning algorithm for supervisory control of power plants
| Published in | Knowledge-Based Systems Vol. 301; p. 112312 |
|---|---|
| Main Authors | Sun, Yixuan; Khairy, Sami; Vilim, Richard B.; Hu, Rui; Dave, Akshay J. |
| Format | Journal Article |
| Language | English |
| Published | United States: Elsevier B.V., 09.10.2024 |
| Subjects | Safe reinforcement learning; Data-driven control; Constrained optimization; Power plants |
| Online Access | https://www.osti.gov/servlets/purl/2588109 |
| ISSN | 0950-7051 |
| DOI | 10.1016/j.knosys.2024.112312 |
| Abstract | Traditional control theory-based methods require tailored engineering for each system and constant fine-tuning. In power plant control, one often needs to obtain a precise representation of the system dynamics and carefully design the control scheme accordingly. Model-free reinforcement learning (RL) has emerged as a promising solution for control tasks because it learns from trial-and-error interactions with the environment, eliminating the need to explicitly model the environment’s dynamics, which is potentially inaccurate. However, directly imposing state constraints raises challenges for standard RL methods in power plant control. To address this, we propose a chance-constrained RL algorithm based on Proximal Policy Optimization (PPO) for supervisory control. Our method employs Lagrangian relaxation to convert the constrained optimization problem into an unconstrained objective, in which trainable Lagrange multipliers enforce the state constraints. Our approach achieves the smallest violation distance and violation rate in a load-follow maneuver for an advanced nuclear power plant design. |
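The Lagrangian-relaxation step described in the abstract can be sketched concretely. The constrained problem max_theta J_R(theta) subject to J_C(theta) <= d is relaxed to the min-max objective min_{lambda >= 0} max_theta J_R(theta) - lambda * (J_C(theta) - d), with lambda trained by dual ascent alongside the policy. The snippet below is a minimal illustration, not the paper's implementation: surrogate_return, surrogate_cost, and cost_limit are hypothetical stand-ins for the PPO reward surrogate, the constraint-cost surrogate, and the constraint budget d.

```python
# Minimal sketch (not the authors' code) of Lagrangian-relaxed constrained
# policy optimization. The paper's method wraps PPO; here, surrogate_return
# and surrogate_cost are toy differentiable stand-ins for the PPO reward and
# constraint-cost surrogates, and cost_limit is an assumed constraint budget.
import torch

torch.manual_seed(0)

theta = torch.randn(4, requires_grad=True)    # stand-in for policy parameters
log_lam = torch.zeros(1, requires_grad=True)  # log-space multiplier keeps lambda >= 0
cost_limit = 0.1                              # constraint budget d (assumed)

opt_theta = torch.optim.Adam([theta], lr=1e-2)
opt_lam = torch.optim.Adam([log_lam], lr=1e-2)


def surrogate_return(p):
    # Toy proxy for the reward surrogate J_R(theta); maximized at p = 1.
    return -(p - 1.0).pow(2).sum()


def surrogate_cost(p):
    # Toy proxy for the constraint-cost surrogate J_C(theta).
    return p.pow(2).mean()


for step in range(500):
    # Policy step: ascend the Lagrangian with the multiplier held fixed.
    lam = log_lam.exp().detach()
    lagrangian = surrogate_return(theta) - lam * (surrogate_cost(theta) - cost_limit)
    opt_theta.zero_grad()
    (-lagrangian).backward()
    opt_theta.step()

    # Dual step: lambda grows while the constraint is violated and
    # shrinks once the surrogate cost falls below the budget.
    violation = (surrogate_cost(theta) - cost_limit).detach()
    lam_loss = -log_lam.exp() * violation
    opt_lam.zero_grad()
    lam_loss.backward()
    opt_lam.step()

print(f"final cost {surrogate_cost(theta).item():.4f} vs limit {cost_limit}")
```

In the full algorithm, the surrogates would be estimated from trajectory batches under the PPO clipped objective, and the cost term would encode the chance of violating plant state constraints during the load-follow maneuver.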
    
| ArticleNumber | 112312 | 
    
| Author | Sun, Yixuan; Khairy, Sami; Vilim, Richard B.; Hu, Rui; Dave, Akshay J. |
    
| Author_xml | 1. Sun, Yixuan (Mathematics and Computer Science Division, Argonne National Laboratory, United States of America); 2. Khairy, Sami, ORCID 0000-0001-6730-7267 (Microsoft, Canada); 3. Vilim, Richard B. (Nuclear Science and Engineering Division, Argonne National Laboratory, United States of America); 4. Hu, Rui (Nuclear Science and Engineering Division, Argonne National Laboratory, United States of America); 5. Dave, Akshay J., ORCID 0000-0003-0822-1409, email: ajd@anl.gov (Nuclear Science and Engineering Division, Argonne National Laboratory, United States of America) |
    
| BackLink | https://www.osti.gov/servlets/purl/2588109 (view this record in OSTI.gov) |
    
| CitedBy_id | 10.1016/j.engappai.2025.110091 |
    
    
| ContentType | Journal Article | 
    
| Copyright | 2024 Elsevier B.V. | 
    
| CorporateAuthor | Argonne National Laboratory (ANL), Argonne, IL (United States) | 
    
| DOI | 10.1016/j.knosys.2024.112312 | 
    
| DatabaseName | CrossRef; OSTI.GOV - Hybrid; OSTI.GOV |
    
| DatabaseTitle | CrossRef | 
    
    
| Discipline | Computer Science | 
    
    
| ISSN | 0950-7051 | 
    
    
| IsDoiOpenAccess | true | 
    
| IsOpenAccess | true | 
    
| IsPeerReviewed | true | 
    
| IsScholarly | true | 
    
| Keywords | Safe reinforcement learning; Data-driven control; Constrained optimization; Power plants |
    
| Language | English | 
    
    
| Notes | AC02-06CH11357; USDOE Laboratory Directed Research and Development (LDRD) Program |
    
| ORCID | 0000-0003-0822-1409; 0000-0001-6730-7267; 0000-0002-3771-2920 |
    
| OpenAccessLink | https://www.osti.gov/servlets/purl/2588109 | 
    
    
| PublicationCentury | 2000 | 
    
| PublicationDate | 2024-10-09 | 
    
    
| PublicationDecade | 2020 | 
    
| PublicationPlace | United States | 
    
    
| PublicationTitle | Knowledge-Based Systems |
    
| PublicationYear | 2024 | 
    
| Publisher | Elsevier B.V. |
    
    
    
    
| SourceID | osti; crossref; elsevier |

| SourceType | Open Access Repository; Enrichment Source; Index Database; Publisher |
    
| StartPage | 112312 | 
    
| SubjectTerms | Constrained optimization; Data-driven control; Power plants; Safe reinforcement learning |
    
| Title | A safe reinforcement learning algorithm for supervisory control of power plants | 
    
| URI | https://dx.doi.org/10.1016/j.knosys.2024.112312; https://www.osti.gov/servlets/purl/2588109 |
    
| Volume | 301 | 
    