Data-Driven Solutions to Mixed H_{2}/H_{\infty} Control: A Hamilton-Inequality-Driven Reinforcement Learning Approach
| Published in | 2020 IEEE Conference on Control Technology and Applications (CCTA) pp. 340 - 345 |
|---|---|
| Main Authors | , , |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 01.08.2020 |
| Subjects | |
| DOI | 10.1109/CCTA41146.2020.9206320 |
| Summary: | Today's industrial systems have complex and possibly unknown dynamics and are subject to unknown disturbances. This paper presents a model-free reinforcement learning (RL) algorithm for mixed H_{2}/H_{\infty} control design, so that industrial systems meet both disturbance-attenuation and performance specifications despite uncertainties in their dynamics. The mixed H_{2}/H_{\infty} performance optimization is first formulated as a non-zero-sum game, which leads to coupled Hamilton-Jacobi (HJ) equations. To solve these coupled HJ equations, a relaxed, Hamiltonian-driven optimization framework is presented that performs the optimization subject to two Hamiltonian inequalities corresponding to the H_{2} and H_{\infty} performances. This relaxation allows sum-of-squares (SOS) programs to be used to find efficient solutions to the problem. An SOS-based iterative algorithm is developed to solve the formulated optimization problem, with the constraints given by the Hamiltonian inequalities. The relation between the original and relaxed H_{2}/H_{\infty} performance optimization problems is discussed by comparing their performance. To remove the need for complete knowledge of the system dynamics, a data-driven reinforcement learning approach is proposed that solves the SOS optimization problem in real time using only system trajectory data measured over a time interval. Finally, a simulation example demonstrates the effectiveness of the proposed algorithm. |
|---|---|
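
As a reading aid, the non-zero-sum game formulation named in the summary can be sketched in a standard textbook form. This is a sketch under stated assumptions, not the paper's own notation: the input-affine dynamics, the output z, the attenuation level \gamma, and the symbols f, g, k, h, r_1, r_2, V_1 and V_2 are all introduced here purely for illustration.

```latex
% Assumed input-affine plant with control u and disturbance d (illustrative notation)
\dot{x} = f(x) + g(x)\,u + k(x)\,d, \qquad \|z\|^2 = h(x)^{\top} h(x) + u^{\top} u

% Two cost functionals: J_1 encodes the H_infinity bound, J_2 the H_2 performance
J_1(u,d) = \int_0^{\infty} \bigl(\gamma^2 \|d\|^2 - \|z\|^2\bigr)\,dt, \qquad
J_2(u,d) = \int_0^{\infty} \|z\|^2\,dt

% Nash equilibrium (u^*, d^*): neither player can improve by deviating unilaterally
J_1(u^*, d^*) \le J_1(u^*, d) \quad \forall d, \qquad
J_2(u^*, d^*) \le J_2(u, d^*) \quad \forall u

% Hamiltonians, with running costs r_1 = \gamma^2\|d\|^2 - \|z\|^2 and r_2 = \|z\|^2
H_i(x, \nabla V_i, u, d) = r_i(x,u,d)
  + \nabla V_i(x)^{\top}\bigl(f(x) + g(x)\,u + k(x)\,d\bigr), \quad i = 1, 2

% Coupled Hamilton-Jacobi equations at the equilibrium
H_1(x, \nabla V_1, u^*, d^*) = 0, \qquad H_2(x, \nabla V_2, u^*, d^*) = 0
```

In a relaxation of the kind the summary describes, these equalities are replaced by Hamiltonian inequalities; when the dynamics and the candidate functions V_i are polynomial, such inequalities can be imposed as SOS constraints. The exact inequalities and sign conventions used in the paper are not given in this record.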