Explorative Policy Optimization for industrial-scale operation of complex process control systems

Bibliographic Details
Published in: Journal of Process Control, Vol. 152, p. 103471
Main Authors: Zhang, Zengjun; Li, Shaoyuan; Yang, Yaru
Format: Journal Article
Language: English
Published: Elsevier Ltd, 01.08.2025
ISSN: 0959-1524
DOI: 10.1016/j.jprocont.2025.103471

Summary: With the advancement of industrial automation, traditional process control methods increasingly struggle to manage the complex operational demands of industrial-scale chemical processes, particularly in the presence of unmodelled dynamics and high nonlinearity. This paper introduces an advanced reinforcement learning algorithm, Explorative Policy Optimization (EPO), specifically developed to optimize operational strategies, focusing on improving both production yield and product quality in such environments. The core innovation of the EPO algorithm is its exploration network, which dynamically adjusts exploration strategies based on discrepancies between predicted and actual values of state–action pairs, enabling more effective exploration. This approach improves decision-making by providing more accurate outcome assessments in complex and unmodelled conditions. EPO also integrates exploration data into the advantage function, ensuring a balance between exploration and exploitation, which is essential for optimizing performance in dynamic environments that require both safety and adaptability. EPO focuses on global optimization in processes with multiple operating conditions and steady states. It surpasses existing RL methods in overall performance while maintaining acceptable computational costs across a wide range of industrial settings. Its effectiveness and practicality are demonstrated through industrial-scale simulation experiments.

Highlights:
• Dynamic exploration in state–action space for adapting to unmodelled dynamics.
• Exploration network guides agent to under-explored actions, improving control.
• EPO outperforms PPO and SAC in industrial process control under uncertain conditions.
• Tuned TD index in GAE balances exploration and exploitation for stable gains.
• Tested in penicillin production, EPO improved efficiency, stability, and yield.
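The summary describes EPO's mechanism only in prose: an exploration network scores state–action pairs by the discrepancy between predicted and observed values, and that discrepancy is folded into a GAE-style advantage through a tunable exploration–exploitation trade-off. The Python sketch below illustrates one plausible reading of that idea; the network architecture, the absolute-error bonus, the helper names (ExplorationNet, exploration_bonus, gae_with_bonus), and the weight beta are illustrative assumptions, not the paper's actual formulation.

```python
# Hypothetical sketch of the mechanism described in the abstract: an auxiliary
# exploration network predicts a value for each (state, action) pair, the
# discrepancy between prediction and observed return serves as an intrinsic
# bonus, and the bonus is added to each TD error inside GAE.
import torch
import torch.nn as nn


class ExplorationNet(nn.Module):
    """Predicts a scalar value for each (state, action) pair."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, states: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([states, actions], dim=-1)).squeeze(-1)


def exploration_bonus(net: ExplorationNet,
                      states: torch.Tensor,
                      actions: torch.Tensor,
                      observed_returns: torch.Tensor) -> torch.Tensor:
    """Bonus = |predicted - observed| value: large where the model is wrong,
    i.e. in under-explored regions of the state-action space."""
    with torch.no_grad():
        predicted = net(states, actions)
    return (predicted - observed_returns).abs()


def gae_with_bonus(rewards, values, bonuses, gamma=0.99, lam=0.95, beta=0.1):
    """Standard GAE over one terminated rollout, with the exploration bonus
    added to each TD error. `beta` is the assumed exploration weight."""
    T = len(rewards)
    advantages = torch.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        next_value = values[t + 1] if t + 1 < T else 0.0
        delta = rewards[t] + gamma * next_value - values[t] + beta * bonuses[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages


# Example with random placeholder data for a rollout of length 5.
T, sdim, adim = 5, 3, 2
net = ExplorationNet(sdim, adim)
states, actions = torch.randn(T, sdim), torch.randn(T, adim)
rewards, values = torch.randn(T), torch.randn(T)
returns = rewards.flip(0).cumsum(0).flip(0)  # crude undiscounted returns
bonuses = exploration_bonus(net, states, actions, returns)
adv = gae_with_bonus(rewards, values, bonuses)
```

In the paper, the exploration network and the policy would presumably be trained jointly under a PPO-style objective; the sketch only isolates how a prediction discrepancy can bias the advantage estimate toward under-explored actions.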