Improved Demonstration-Knowledge Utilization in Reinforcement Learning

Bibliographic Details
Published in IEEE transactions on artificial intelligence Vol. 5; no. 5; pp. 2139–2150
Main Authors Liu, Yanyu, Zeng, Yifeng, Ma, Biyang, Pan, Yinghui, Gao, Huifan, Zhang, Yuting
Format Journal Article
Language English
Published IEEE 01.05.2024
Subjects
ISSN 2691-4581
DOI 10.1109/TAI.2023.3328848


Abstract Reinforcement learning (RL) has achieved great success in recent years. Generally, the learning process requires a huge amount of interaction with the environment before an agent can achieve acceptable performance. This motivates many techniques for accelerating the learning process, such as incorporating prior knowledge, which is usually presented as experts' demonstrations, and using a probability distribution to represent state-action values. These methods perform well when the prior knowledge is genuinely correct and little change occurs in the learning environment. However, this requirement is rarely met in many complex applications: the demonstration knowledge may not reflect the true environment and may even be full of noise. In this article, we introduce a dynamic distribution merging method to improve knowledge utilization in a general RL algorithm, namely Q-learning. The new method adopts a normal distribution to represent state-action values and merges the prior and learned knowledge in a discriminative way. We theoretically analyze the new learning method and demonstrate its empirical performance over multiple problem domains.
Author Ma, Biyang
Zhang, Yuting
Pan, Yinghui
Zeng, Yifeng
Gao, Huifan
Liu, Yanyu
Author_xml – sequence: 1
  givenname: Yanyu
  orcidid: 0009-0002-8924-592X
  surname: Liu
  fullname: Liu, Yanyu
  organization: Xiamen Key Laboratory of Big Data Intelligent Analysis and Decision, Department of Automation, Xiamen University, Xiamen, China
– sequence: 2
  givenname: Yifeng
  orcidid: 0000-0002-5246-403X
  surname: Zeng
  fullname: Zeng, Yifeng
  email: yifeng.zeng@northumbria.ac.uk
  organization: Department of Computer and Information Sciences, Northumbria University, Newcastle, U.K.
– sequence: 3
  givenname: Biyang
  orcidid: 0000-0003-1515-6449
  surname: Ma
  fullname: Ma, Biyang
  organization: Department of Computer Science, Minnan Normal University, Zhangzhou, China
– sequence: 4
  givenname: Yinghui
  orcidid: 0000-0001-5715-2855
  surname: Pan
  fullname: Pan, Yinghui
  email: panyinghui@szu.edu.cn
  organization: National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, Shenzhen, China
– sequence: 5
  givenname: Huifan
  orcidid: 0000-0002-8624-1301
  surname: Gao
  fullname: Gao, Huifan
  organization: Xiamen Key Laboratory of Big Data Intelligent Analysis and Decision, Department of Automation, Xiamen University, Xiamen, China
– sequence: 6
  givenname: Yuting
  orcidid: 0009-0002-9675-5987
  surname: Zhang
  fullname: Zhang, Yuting
  organization: Xiamen Key Laboratory of Big Data Intelligent Analysis and Decision, Department of Automation, Xiamen University, Xiamen, China
CODEN ITAICB
Cites_doi 10.24963/ijcai.2017/422
10.1016/j.neucom.2020.02.008
10.1002/9781118771075
10.1109/TNNLS.2021.3082568
10.1109/LRA.2018.2801479
10.1214/aoms/1177729694
10.1016/j.artint.2014.07.003
10.1609/aaai.v32i1.11757
10.1109/IJCNN.2017.7965896
10.1007/s10462-021-10061-9
10.1016/j.eswa.2018.09.036
10.1145/3409501.3409517
10.1038/s41586-020-2939-8
10.26599/TST.2021.9010012
10.1007/978-3-030-01234-2_36
10.1109/IROS51168.2021.9636020
ContentType Journal Article
DBID 97E
RIA
RIE
AAYXX
CITATION
DOI 10.1109/TAI.2023.3328848
DatabaseName IEEE Xplore (IEEE)
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE Electronic Library (IEL)
CrossRef
DatabaseTitle CrossRef
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISSN 2691-4581
EndPage 2150
ExternalDocumentID 10_1109_TAI_2023_3328848
10308404
Genre orig-research
GrantInformation_xml – fundername: Natural Science Foundation of Fujian Province, China
  grantid: 2022J05176
– fundername: National Natural Science Foundation of China
  grantid: 62176225; 62276168; 61836005
  funderid: 10.13039/501100001809
– fundername: Guangdong Province, China
  grantid: 2023A1515010869
GroupedDBID 0R~
97E
AASAJ
AAWTH
ABAZT
ABJNI
ABQJQ
ABVLG
AGQYO
AHBIQ
AKJIK
AKQYR
ALMA_UNASSIGNED_HOLDINGS
ATWAV
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
EBS
IEDLZ
IFIPE
JAVBF
M~E
OCL
RIA
RIE
AAYXX
CITATION
IEDL.DBID RIE
ISSN 2691-4581
IngestDate Wed Oct 01 05:36:57 EDT 2025
Wed Aug 27 07:40:20 EDT 2025
IsPeerReviewed true
IsScholarly true
Issue 5
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
LinkModel DirectLink
ORCID 0000-0001-5715-2855
0000-0002-8624-1301
0000-0003-1515-6449
0009-0002-8924-592X
0000-0002-5246-403X
0009-0002-9675-5987
PageCount 12
ParticipantIDs ieee_primary_10308404
crossref_primary_10_1109_TAI_2023_3328848
ProviderPackageCode CITATION
AAYXX
PublicationCentury 2000
PublicationDate 2024-May
2024-5-00
PublicationDateYYYYMMDD 2024-05-01
PublicationDate_xml – month: 05
  year: 2024
  text: 2024-May
PublicationDecade 2020
PublicationTitle IEEE transactions on artificial intelligence
PublicationTitleAbbrev TAI
PublicationYear 2024
Publisher IEEE
Publisher_xml – name: IEEE
References ref13
ref12
Mnih (ref14) 2013
ref15
ref37
Zhang (ref6) 2021; 2021
ref31
ref30
ref11
Ross (ref34) 2011
ref2
ref1
Boyan (ref36) 2000; 13
ref17
Garg (ref16) 2021; 34
ref18
GrzeŚ (ref22) 2017
Schaal (ref4) 1996
Eysenbach (ref21) 2018
ref24
ref23
ref25
Vecerik (ref26) 2017
Badia (ref35) 2020
Brockman (ref33) 2016
ref28
Bellemare (ref9) 2017
ref27
Che (ref29) 2022
Kantorovich (ref32) 1939; 6
Pertsch (ref5) 2022
Schwarzer (ref19) 2021; 34
Dearden (ref8) 1998; 1998
Wang (ref3) 2021
Barth-Maron (ref10) 2018
Liu (ref20) 2021; 34
Rengarajan (ref7) 2022
References_xml – ident: ref28
  doi: 10.24963/ijcai.2017/422
– year: 2018
  ident: ref10
  article-title: Distributed distributional deterministic policy gradients
– start-page: 565
  volume-title: Proc. 16th Conf. Auton. Agents MultiAgent Syst.
  year: 2017
  ident: ref22
  article-title: Reward shaping in episodic reinforcement learning
– volume: 13
  start-page: 982
  volume-title: Proc. Adv. Neural Inf. Process. Syst.
  year: 2000
  ident: ref36
  article-title: Exact solutions to time-dependent MDPs
– volume: 34
  start-page: 12686
  volume-title: Proc. Adv. Neural Inf. Process. Syst.
  year: 2021
  ident: ref19
  article-title: Pretraining representations for data-efficient reinforcement learning
– ident: ref27
  doi: 10.1016/j.neucom.2020.02.008
– start-page: 729
  volume-title: Proc. Conf. Robot Learn.
  year: 2022
  ident: ref5
  article-title: Guided reinforcement learning with learned skills
– ident: ref30
  doi: 10.1002/9781118771075
– volume: 34
  start-page: 18459
  volume-title: Proc. Adv. Neural Inf. Process. Syst.
  year: 2021
  ident: ref20
  article-title: Behavior from the void: Unsupervised active pre-training
– volume: 6
  start-page: 363
  issue: 4
  year: 1939
  ident: ref32
  article-title: The mathematical method of production planning and organization
  publication-title: Manage. Sci.
– year: 2020
  ident: ref35
  article-title: Never Give Up: Learning directed exploration strategies
– year: 2016
  ident: ref33
  article-title: OpenAI gym
– ident: ref11
  doi: 10.1109/TNNLS.2021.3082568
– year: 2017
  ident: ref26
  article-title: Leveraging demonstrations for deep reinforcement learning on robotics problems with sparse rewards
– volume: 2021
  start-page: 7588221
  volume-title: Comput. Intell. Neuroscience
  year: 2021
  ident: ref6
  article-title: Efficient reinforcement learning from demonstration via Bayesian network-based knowledge extraction
– volume: 34
  start-page: 4028
  volume-title: Proc. Adv. Neural Inf. Process. Syst.
  year: 2021
  ident: ref16
  article-title: IQ-Learn: Inverse soft-Q learning for imitation
– ident: ref37
  doi: 10.1109/LRA.2018.2801479
– ident: ref31
  doi: 10.1214/aoms/1177729694
– start-page: 627
  volume-title: Proc. 14th Int. Conf. Artif. Intell. Statist.
  year: 2011
  ident: ref34
  article-title: A reduction of imitation learning and structured prediction to no-regret online learning
– ident: ref17
  doi: 10.1016/j.artint.2014.07.003
– ident: ref25
  doi: 10.1609/aaai.v32i1.11757
– ident: ref23
  doi: 10.1109/IJCNN.2017.7965896
– start-page: 1040
  volume-title: Proc. 9th Int. Conf. Neural Inf. Process. Syst.
  year: 1996
  ident: ref4
  article-title: Learning from demonstration
– ident: ref1
  doi: 10.1007/s10462-021-10061-9
– start-page: 449
  volume-title: Proc. Int. Conf. Mach. Learn.
  year: 2017
  ident: ref9
  article-title: A distributional perspective on reinforcement learning
– year: 2022
  ident: ref29
  article-title: Bayesian Q-learning with imperfect expert demonstrations
– ident: ref15
  doi: 10.1016/j.eswa.2018.09.036
– ident: ref12
  doi: 10.1145/3409501.3409517
– ident: ref13
  doi: 10.1038/s41586-020-2939-8
– start-page: 10905
  volume-title: Proc. Int. Conf. Mach. Learn.
  year: 2021
  ident: ref3
  article-title: SCC: An efficient deep reinforcement learning agent mastering the game of StarCraft II
– ident: ref2
  doi: 10.26599/TST.2021.9010012
– volume: 1998
  start-page: 761
  volume-title: Proc. AAAI/IAAI
  year: 1998
  ident: ref8
  article-title: Bayesian Q-learning
– ident: ref18
  doi: 10.1007/978-3-030-01234-2_36
– ident: ref24
  doi: 10.1109/IROS51168.2021.9636020
– year: 2022
  ident: ref7
  article-title: Reinforcement learning with sparse rewards using guidance from offline demonstration
– year: 2013
  ident: ref14
  article-title: Playing atari with deep reinforcement learning
– year: 2018
  ident: ref21
  article-title: Diversity is all you need: Learning skills without a reward function
SSID ssj0002512227
Score 2.2571924
SourceID crossref
ieee
SourceType Index Database
Publisher
StartPage 2139
SubjectTerms Bayes methods
Gaussian distribution
Heuristic algorithms
Learning from demonstration
Merging
Q-learning
reinforcement learning
Task analysis
Vehicle dynamics
Title Improved Demonstration-Knowledge Utilization in Reinforcement Learning
URI https://ieeexplore.ieee.org/document/10308404
Volume 5
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
journalDatabaseRights – providerCode: PRVIEE
  databaseName: IEEE Electronic Library (IEL)
  customDbUrl:
  eissn: 2691-4581
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0002512227
  issn: 2691-4581
  databaseCode: RIE
  dateStart: 20200101
  isFulltext: true
  titleUrlDefault: https://ieeexplore.ieee.org/
  providerName: IEEE
– providerCode: PRVHPJ
  databaseName: ROAD: Directory of Open Access Scholarly Resources
  customDbUrl:
  eissn: 2691-4581
  dateEnd: 99991231
  omitProxy: true
  ssIdentifier: ssj0002512227
  issn: 2691-4581
  databaseCode: M~E
  dateStart: 20200101
  isFulltext: true
  titleUrlDefault: https://road.issn.org
  providerName: ISSN International Centre