Improved Demonstration-Knowledge Utilization in Reinforcement Learning

Bibliographic Details
Published in: IEEE Transactions on Artificial Intelligence, Vol. 5, No. 5, pp. 2139–2150
Main Authors: Liu, Yanyu; Zeng, Yifeng; Ma, Biyang; Pan, Yinghui; Gao, Huifan; Zhang, Yuting
Format: Journal Article
Language: English
Published: IEEE, 01.05.2024
ISSN: 2691-4581
DOI: 10.1109/TAI.2023.3328848

Summary: Reinforcement learning (RL) has achieved great success in recent years. Generally, the learning process requires a huge amount of interaction with the environment before an agent can achieve acceptable performance. This motivates many techniques for accelerating the learning process, such as incorporating prior knowledge, usually presented as experts' demonstrations, and using a probability distribution to represent state-action values. These methods perform well when the prior knowledge is genuinely correct and little change occurs in the learning environment. However, this requirement is not realistic in many complex applications: the demonstration knowledge may not reflect the true environment and may even be full of noise. In this article, we introduce a dynamic distribution merging method to improve knowledge utilization in a general RL algorithm, namely Q-learning. The new method adapts a normal distribution to represent state-action values and merges the prior and learned knowledge in a discriminative way. We theoretically analyze the new learning method and demonstrate its empirical performance over multiple problem domains.
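
As a rough illustration of the kind of distribution merging the abstract describes, the sketch below performs precision-weighted fusion of two Gaussian estimates of a single Q(s, a) value, one derived from demonstrations and one learned through interaction. The names (GaussianQ, merge) are hypothetical, and the rule shown is generic Gaussian fusion under stated assumptions, not the paper's specific dynamic distribution merging procedure.

# Hypothetical sketch: precision-weighted merging of two normal-distribution
# estimates of a Q-value (demonstration prior vs. learned estimate).
# This is a generic illustration, not the method proposed in the article.

from dataclasses import dataclass


@dataclass
class GaussianQ:
    """Normal-distribution estimate of a single Q(s, a) value."""
    mean: float
    var: float


def merge(prior: GaussianQ, learned: GaussianQ) -> GaussianQ:
    """Merge two Gaussian Q-value estimates by precision weighting.

    A higher-variance (less reliable) estimate contributes less to the
    merged mean, so noisy demonstration knowledge is naturally discounted.
    """
    prec_p, prec_l = 1.0 / prior.var, 1.0 / learned.var
    prec = prec_p + prec_l
    mean = (prec_p * prior.mean + prec_l * learned.mean) / prec
    return GaussianQ(mean=mean, var=1.0 / prec)


if __name__ == "__main__":
    demo = GaussianQ(mean=2.0, var=4.0)      # noisy demonstration prior
    learned = GaussianQ(mean=0.5, var=0.5)   # more confident learned estimate
    print(merge(demo, learned))              # merged mean stays near the learned value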