A safe reinforcement learning algorithm for supervisory control of power plants

Bibliographic Details
Published in: Knowledge-Based Systems, Vol. 301, p. 112312
Main Authors: Sun, Yixuan; Khairy, Sami; Vilim, Richard B.; Hu, Rui; Dave, Akshay J.
Format: Journal Article
Language: English
Published: United States, Elsevier B.V., 09.10.2024
ISSN: 0950-7051
DOI: 10.1016/j.knosys.2024.112312


Abstract: Traditional control-theory-based methods require tailored engineering for each system and constant fine-tuning. In power plant control, one often needs to obtain a precise representation of the system dynamics and carefully design the control scheme accordingly. Model-free reinforcement learning (RL) has emerged as a promising approach to control tasks because it learns from trial-and-error interactions with the environment, eliminating the need to explicitly model the environment’s dynamics, a model that is potentially inaccurate. However, directly imposing state constraints, as power plant control requires, raises challenges for standard RL methods. To address this, we propose a chance-constrained RL algorithm based on Proximal Policy Optimization (PPO) for supervisory control. Our method employs Lagrangian relaxation to convert the constrained optimization problem into an unconstrained objective, in which trainable Lagrange multipliers enforce the state constraints. Our approach achieves the smallest constraint-violation distance and violation rate in a load-follow maneuver for an advanced nuclear power plant design.
Article Number: 112312
Authors:
– Sun, Yixuan (Mathematics and Computer Science Division, Argonne National Laboratory, United States of America)
– Khairy, Sami (Microsoft, Canada; ORCID: 0000-0001-6730-7267)
– Vilim, Richard B. (Nuclear Science and Engineering Division, Argonne National Laboratory, United States of America)
– Hu, Rui (Nuclear Science and Engineering Division, Argonne National Laboratory, United States of America)
– Dave, Akshay J. (Nuclear Science and Engineering Division, Argonne National Laboratory, United States of America; ORCID: 0000-0003-0822-1409; email: ajd@anl.gov)
Cited By: DOI 10.1016/j.engappai.2025.110091
Copyright: 2024 Elsevier B.V.
Corporate Author: Argonne National Laboratory (ANL), Argonne, IL (United States)
Discipline: Computer Science
External Document IDs: OSTI 2588109; Elsevier PII S0950705124009468
Open Access: yes
Peer Reviewed: yes
Keywords: Safe reinforcement learning; Data-driven control; Constrained optimization; Power plants
Notes: USDOE Laboratory Directed Research and Development (LDRD) Program; Contract AC02-06CH11357
ORCID: 0000-0003-0822-1409; 0000-0001-6730-7267; 0000-0002-3771-2920
Open Access Link: https://www.osti.gov/servlets/purl/2588109
URI: https://dx.doi.org/10.1016/j.knosys.2024.112312