Self-competitive Hindsight Experience Replay with Penalty Measures (带有惩罚措施的自竞争事后经验重播算法)

Bibliographic Details
Published in 计算机科学与探索 (Journal of Frontiers of Computer Science & Technology) Vol. 18; no. 5; pp. 1223-1231
Main Authors WANG Zihao (王子豪), QIAN Xuezhong (钱雪忠), SONG Wei (宋威)
Format Journal Article
Language Chinese
Published School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China 01.05.2024
ISSN 1673-9418
DOI 10.3778/j.issn.1673-9418.2303031

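Abstract Self-competitive hindsight experience replay (SCHER) is an improved strategy built on the hindsight experience replay (HER) algorithm. When environmental rewards are sparse, HER replays past experience to generate virtual labeled data for optimizing the model. HER has two problems, however. First, it cannot handle the large volume of repeated data that the agent produces because rewards are sparse, and this invalid data pollutes the experience pool. Second, virtual goals may be chosen at random from intermediate states that do not help complete the task, which biases learning. To address these problems, SCHER proposes two improvements. First, it adds an adaptive reward signal that penalizes meaningless actions by the agent so that it quickly learns to avoid them. Second, it uses a self-competition strategy: competition produces two different sets of data for the same task, and comparing them identifies the key steps that let the agent succeed in different environments, which makes the generated virtual goals more accurate. Experimental results show that SCHER makes better use of experience replay, raising the average task success rate by 5.7 percentage points, with higher accuracy and generalization ability.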
Abstract_FL Self-competitive hindsight experience replay (SCHER) is an improved strategy proposed based on the hindsight experience replay (HER) algorithm. The HER algorithm generates virtual labeled data by replaying experiences to optimize the model in the face of sparse environmental rewards. However, the HER algorithm has two problems: firstly, it cannot handle the large amount of repetitive data generated due to sparse rewards, which contaminates the experience pool; secondly, virtual goals may randomly select intermediate states that are not helpful in completing the task, leading to learning bias. To address these issues, the SCHER algorithm proposes two improvement strategies: firstly, increase the adaptive reward signal to penalize meaningless actions made by agents and quickly avoid such operations; secondly, use self-competition strategy to generate two sets of data for the same task, analyze and compare them, and find the key steps that enable the agent to succeed in different environments, thereby improving the accuracy of generated virtual goals. Experimental results show that the SCHER algorithm can better utilize the experience replay technique, increasing the average task success rate by 5.7 percentage points, and has higher accuracy and generalization ability.
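Illustrative sketch (not taken from the article): the abstract describes HER as relabeling replayed transitions with virtual goals, and SCHER as adding a penalty for meaningless actions. The Python below shows one way these two ideas could look on a toy grid task. It assumes the common "future" relabeling strategy, treats an action that leaves the state unchanged as "meaningless", and uses a fixed penalty of 0.5 in place of the adaptive signal, whose actual form the abstract does not give. All function names are hypothetical. The self-competition step is omitted because the abstract gives too little detail to sketch it.

```python
import random
from typing import List, Optional, Tuple

State = Tuple[int, int]
# A transition is (state, action, next_state, goal, reward).
Transition = Tuple[State, int, State, State, float]


def sparse_reward(next_state: State, goal: State) -> float:
    """Standard sparse reward: 0 when the goal is reached, -1 otherwise."""
    return 0.0 if next_state == goal else -1.0


def penalized_reward(state: State, next_state: State, goal: State,
                     penalty: float = 0.5) -> float:
    """Sparse reward plus an extra penalty for an action that changed nothing.

    Assumption: a fixed penalty stands in for the article's adaptive signal.
    """
    reward = sparse_reward(next_state, goal)
    if next_state == state and next_state != goal:
        reward -= penalty
    return reward


def her_relabel(episode: List[Transition], k: int = 2,
                rng: Optional[random.Random] = None) -> List[Transition]:
    """Relabel each step with k virtual goals drawn from later achieved states."""
    rng = rng or random.Random(0)
    relabeled: List[Transition] = []
    for t, (state, action, next_state, _goal, _reward) in enumerate(episode):
        future = episode[t:]  # this step and every later step of the episode
        for _ in range(k):
            virtual_goal = rng.choice(future)[2]  # an achieved next_state
            reward = penalized_reward(state, next_state, virtual_goal)
            relabeled.append((state, action, next_state, virtual_goal, reward))
    return relabeled


# Toy episode that never reaches the real goal (5, 5); the first step is a no-op.
real_goal: State = (5, 5)
episode: List[Transition] = [
    ((0, 0), 0, (0, 0), real_goal, -1.0),
    ((0, 0), 1, (1, 0), real_goal, -1.0),
    ((1, 0), 1, (2, 0), real_goal, -1.0),
]
relabeled = her_relabel(episode, k=2)
```

In this sketch the no-op step keeps a reward below -1 for any virtual goal it did not reach, while the final step is always relabeled as a success because its only available virtual goal is the state it actually achieved.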
Author 王子豪
宋威
钱雪忠
AuthorAffiliation School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China
Author_FL SONG Wei
QIAN Xuezhong
WANG Zihao
ClassificationCodes TP181
ContentType Journal Article
Copyright Copyright © Wanfang Data Co. Ltd. All Rights Reserved.
DOI 10.3778/j.issn.1673-9418.2303031
DocumentTitle_FL Self-competitive Hindsight Experience Replay with Penalty Measures
EndPage 1231
ISSN 1673-9418
IsPeerReviewed true
IsScholarly true
Issue 5
Keywords sparse reward
deep reinforcement learning
adaptive reward signal
experience replay
Language Chinese
PageCount 9
PublicationCentury 2000
PublicationDate 2024-05-01
PublicationDateYYYYMMDD 2024-05-01
PublicationDecade 2020
PublicationTitle 计算机科学与探索
PublicationTitle_FL Journal of Frontiers of Computer Science & Technology
PublicationYear 2024
Publisher School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China
StartPage 1223
Title Self-competitive Hindsight Experience Replay with Penalty Measures (带有惩罚措施的自竞争事后经验重播算法)
URI https://d.wanfangdata.com.cn/periodical/jsjkxyts202405008
Volume 18