Improved Demonstration-Knowledge Utilization in Reinforcement Learning
| Published in | IEEE Transactions on Artificial Intelligence, Vol. 5, No. 5, pp. 2139-2150 |
|---|---|
| Main Authors | Liu, Yanyu; Zeng, Yifeng; Ma, Biyang; Pan, Yinghui; Gao, Huifan; Zhang, Yuting |
| Format | Journal Article |
| Language | English |
| Published | IEEE, 01.05.2024 |
| Subjects | Bayes methods; Gaussian distribution; Heuristic algorithms; Learning from demonstration; Merging; Q-learning; Reinforcement learning; Task analysis; Vehicle dynamics |
| ISSN | 2691-4581 |
| DOI | 10.1109/TAI.2023.3328848 |
| Abstract | Reinforcement learning (RL) has achieved great success in recent years. Generally, the learning process requires a huge amount of interaction with the environment before an agent can reach acceptable performance. This motivates many techniques to accelerate learning, such as incorporating prior knowledge, usually presented as experts' demonstrations, and using a probability distribution to represent state-action values. These methods perform well when the prior knowledge is genuinely correct and the learning environment changes little. However, this requirement is not realistic in many complex applications: the demonstration knowledge may not reflect the true environment and may even be full of noise. In this article, we introduce a dynamic distribution merging method to improve knowledge utilization in a general RL algorithm, namely Q-learning. The new method adopts a normal distribution to represent state-action values and merges the prior and learned knowledge in a discriminative way. We theoretically analyze the new learning method and demonstrate its empirical performance over multiple problem domains. |
|---|---|
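The abstract describes representing each state-action value as a normal distribution and merging demonstration-derived priors with learned estimates inside Q-learning. The sketch below is not the authors' algorithm; it is a minimal, hypothetical illustration of that general idea (the class `GaussianQ` and its `merge_prior` method are invented names), assuming a precision-weighted product-of-Gaussians merge so that noisy priors lose influence as the agent gathers its own evidence.

```python
import numpy as np
from collections import defaultdict

class GaussianQ:
    """Q-learning where each Q(s, a) is modeled as a normal distribution N(mu, var)."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.95):
        self.n_actions = n_actions
        self.mu = defaultdict(lambda: np.zeros(n_actions))        # mean value estimates
        self.var = defaultdict(lambda: np.full(n_actions, 10.0))  # uncertainty per (s, a)
        self.alpha, self.gamma = alpha, gamma

    def merge_prior(self, state, action, prior_mu, prior_var):
        """Merge a demonstration-derived prior by precision weighting:
        a confident (low-variance) prior pulls the estimate strongly,
        a noisy (high-variance) prior barely moves it."""
        p_learned = 1.0 / self.var[state][action]
        p_prior = 1.0 / prior_var
        new_var = 1.0 / (p_learned + p_prior)
        self.mu[state][action] = new_var * (
            p_learned * self.mu[state][action] + p_prior * prior_mu
        )
        self.var[state][action] = new_var

    def update(self, s, a, r, s_next, done):
        """Standard Q-learning update on the mean; variance shrinks as (s, a) is visited."""
        target = r if done else r + self.gamma * self.mu[s_next].max()
        self.mu[s][a] += self.alpha * (target - self.mu[s][a])
        self.var[s][a] *= (1.0 - self.alpha)

    def act(self, s):
        """Thompson-style action selection: sample a value from each distribution."""
        samples = np.random.normal(self.mu[s], np.sqrt(self.var[s]))
        return int(np.argmax(samples))
```

In a training loop one would call `merge_prior` whenever a demonstration suggests a value for a state-action pair and `update` after each environment step; the article's actual merging rule is discriminative and dynamic, which this fixed-weight sketch does not attempt to reproduce.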
| Authors and Affiliations | Yanyu Liu, Xiamen Key Laboratory of Big Data Intelligent Analysis and Decision, Department of Automation, Xiamen University, Xiamen, China; Yifeng Zeng (yifeng.zeng@northumbria.ac.uk), Department of Computer and Information Sciences, Northumbria University, Newcastle, U.K.; Biyang Ma, Department of Computer Science, Minnan Normal University, Zhangzhou, China; Yinghui Pan (panyinghui@szu.edu.cn), National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, Shenzhen, China; Huifan Gao, Xiamen Key Laboratory of Big Data Intelligent Analysis and Decision, Department of Automation, Xiamen University, Xiamen, China; Yuting Zhang, Xiamen Key Laboratory of Big Data Intelligent Analysis and Decision, Department of Automation, Xiamen University, Xiamen, China |
| CODEN | ITAICB |
| ContentType | Journal Article |
| Discipline | Computer Science |
| EISSN | 2691-4581 |
| EndPage | 2150 |
| Genre | orig-research |
| GrantInformation | Natural Science Foundation of Fujian Province, China (2022J05176); National Natural Science Foundation of China (62176225, 62276168, 61836005); Guangdong Province, China (2023A1515010869) |
| ISSN | 2691-4581 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 5 |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
| ORCID | 0000-0001-5715-2855 0000-0002-8624-1301 0000-0003-1515-6449 0009-0002-8924-592X 0000-0002-5246-403X 0009-0002-9675-5987 |
| PageCount | 12 |
| PublicationCentury | 2000 |
| PublicationDate | 2024-May |
| PublicationDateYYYYMMDD | 2024-05-01 |
| PublicationDecade | 2020 |
| PublicationTitle | IEEE transactions on artificial intelligence |
| PublicationTitleAbbrev | TAI |
| PublicationYear | 2024 |
| Publisher | IEEE |
| StartPage | 2139 |
| SubjectTerms | Bayes methods; Gaussian distribution; Heuristic algorithms; Learning from demonstration; Merging; Q-learning; Reinforcement learning; Task analysis; Vehicle dynamics |
| Title | Improved Demonstration-Knowledge Utilization in Reinforcement Learning |
| URI | https://ieeexplore.ieee.org/document/10308404 |
| Volume | 5 |