Improved Demonstration-Knowledge Utilization in Reinforcement Learning
| Published in | IEEE Transactions on Artificial Intelligence, Vol. 5, No. 5, pp. 2139-2150 |
|---|---|
| Main Authors | Liu, Yanyu; Zeng, Yifeng; Ma, Biyang; Pan, Yinghui; Gao, Huifan; Zhang, Yuting |
| Format | Journal Article |
| Language | English |
| Published | IEEE, 01.05.2024 |
| Subjects | Bayes methods; Gaussian distribution; Heuristic algorithms; Learning from demonstration; Merging; Q-learning; Reinforcement learning; Task analysis; Vehicle dynamics |
| ISSN | 2691-4581 |
| DOI | 10.1109/TAI.2023.3328848 |
| Abstract | Reinforcement learning (RL) has achieved great success in recent years. Generally, the learning process requires a huge amount of interaction with the environment before an agent can reach acceptable performance. This motivates many techniques to accelerate learning, such as incorporating prior knowledge, usually presented as experts' demonstrations, and using a probability distribution to represent state-action values. These methods perform well when the prior knowledge is genuinely correct and the learning environment changes little. However, this requirement is not realistic in many complex applications: the demonstration knowledge may not reflect the true environment and may even be full of noise. In this article, we introduce a dynamic distribution merging method to improve knowledge utilization in a general RL algorithm, namely Q-learning. The new method adopts a normal distribution to represent state-action values and merges the prior and learned knowledge in a discriminative way. We theoretically analyze the new learning method and demonstrate its empirical performance over multiple problem domains. |
|---|---|
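The abstract describes representing each state-action value as a normal distribution and merging demonstration-derived priors with learned estimates inside Q-learning. The sketch below is not the authors' algorithm; it is a minimal, hypothetical illustration of that general idea (the class `GaussianQ` and its `merge_prior` method are invented names), assuming a precision-weighted product-of-Gaussians merge so that noisy priors lose influence as the agent gathers its own evidence.

```python
import numpy as np
from collections import defaultdict

class GaussianQ:
    """Q-learning where each Q(s, a) is modeled as a normal distribution N(mu, var)."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.95):
        self.n_actions = n_actions
        self.mu = defaultdict(lambda: np.zeros(n_actions))        # mean value estimates
        self.var = defaultdict(lambda: np.full(n_actions, 10.0))  # uncertainty per (s, a)
        self.alpha, self.gamma = alpha, gamma

    def merge_prior(self, state, action, prior_mu, prior_var):
        """Merge a demonstration-derived prior by precision weighting:
        a confident (low-variance) prior pulls the estimate strongly,
        a noisy (high-variance) prior barely moves it."""
        p_learned = 1.0 / self.var[state][action]
        p_prior = 1.0 / prior_var
        new_var = 1.0 / (p_learned + p_prior)
        self.mu[state][action] = new_var * (
            p_learned * self.mu[state][action] + p_prior * prior_mu
        )
        self.var[state][action] = new_var

    def update(self, s, a, r, s_next, done):
        """Standard Q-learning update on the mean; variance shrinks as (s, a) is visited."""
        target = r if done else r + self.gamma * self.mu[s_next].max()
        self.mu[s][a] += self.alpha * (target - self.mu[s][a])
        self.var[s][a] *= (1.0 - self.alpha)

    def act(self, s):
        """Thompson-style action selection: sample a value from each distribution."""
        samples = np.random.normal(self.mu[s], np.sqrt(self.var[s]))
        return int(np.argmax(samples))
```

In a training loop one would call `merge_prior` whenever a demonstration suggests a value for a state-action pair and `update` after each environment step; the article's actual merging rule is discriminative and dynamic, which this fixed-weight sketch does not attempt to reproduce.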
| Authors and Affiliations | Yanyu Liu, Xiamen Key Laboratory of Big Data Intelligent Analysis and Decision, Department of Automation, Xiamen University, Xiamen, China; Yifeng Zeng (yifeng.zeng@northumbria.ac.uk), Department of Computer and Information Sciences, Northumbria University, Newcastle, U.K.; Biyang Ma, Department of Computer Science, Minnan Normal University, Zhangzhou, China; Yinghui Pan (panyinghui@szu.edu.cn), National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, Shenzhen, China; Huifan Gao, Xiamen Key Laboratory of Big Data Intelligent Analysis and Decision, Department of Automation, Xiamen University, Xiamen, China; Yuting Zhang, Xiamen Key Laboratory of Big Data Intelligent Analysis and Decision, Department of Automation, Xiamen University, Xiamen, China |
| CODEN | ITAICB |
| ContentType | Journal Article |
| Discipline | Computer Science |
| EISSN | 2691-4581 |
| EndPage | 2150 |
| Genre | orig-research |
| GrantInformation | Natural Science Foundation of Fujian Province, China (2022J05176); National Natural Science Foundation of China (62176225, 62276168, 61836005); Guangdong Province, China (2023A1515010869) |
| ISSN | 2691-4581 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 5 |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
| ORCID | 0000-0001-5715-2855 0000-0002-8624-1301 0000-0003-1515-6449 0009-0002-8924-592X 0000-0002-5246-403X 0009-0002-9675-5987 |
| PageCount | 12 |
| PublicationCentury | 2000 |
| PublicationDate | 2024-May |
| PublicationDateYYYYMMDD | 2024-05-01 |
| PublicationDecade | 2020 |
| PublicationTitle | IEEE transactions on artificial intelligence |
| PublicationTitleAbbrev | TAI |
| PublicationYear | 2024 |
| Publisher | IEEE |
| StartPage | 2139 |
| SubjectTerms | Bayes methods; Gaussian distribution; Heuristic algorithms; Learning from demonstration; Merging; Q-learning; Reinforcement learning; Task analysis; Vehicle dynamics |
| Title | Improved Demonstration-Knowledge Utilization in Reinforcement Learning |
| URI | https://ieeexplore.ieee.org/document/10308404 |
| Volume | 5 |