Self-competitive Hindsight Experience Replay with Penalty Measures
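Self-competitive hindsight experience replay (SCHER) is an improved strategy built on the hindsight experience replay (HER) algorithm. When environment rewards are sparse, HER optimizes the model by replaying experience to generate virtual labeled data. However, HER has two problems. First, it cannot handle the large amount of repetitive data that the agent produces because of sparse rewards, and this invalid data pollutes the experience pool. Second, virtual goals may be chosen at random from intermediate states that do not help complete the task, which biases learning. To address these problems, SCHER proposes two improvements. First, it adds an adaptive reward signal that penalizes meaningless actions taken by the agent so that it quickly learns to avoid them. Second, it uses a self-competition strategy: competition produces two different sets of data for the same task, and comparing them finds what makes the agent succeed in different environments...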
      
    
| Published in | 计算机科学与探索 (Journal of Frontiers of Computer Science & Technology) Vol. 18; no. 5; pp. 1223-1231 |
|---|---|
| Main Authors | WANG Zihao (王子豪), QIAN Xuezhong (钱雪忠), SONG Wei (宋威) |
| Format | Journal Article | 
| Language | Chinese | 
| Published | School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122; 01.05.2024 |
| Subjects | |
| ISSN | 1673-9418 | 
| DOI | 10.3778/j.issn.1673-9418.2303031 | 
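| Abstract | Self-competitive hindsight experience replay (SCHER) is an improved strategy built on the hindsight experience replay (HER) algorithm. When environment rewards are sparse, HER optimizes the model by replaying experience to generate virtual labeled data. However, HER has two problems. First, it cannot handle the large amount of repetitive data that the agent produces because of sparse rewards, and this invalid data pollutes the experience pool. Second, virtual goals may be chosen at random from intermediate states that do not help complete the task, which biases learning. To address these problems, SCHER proposes two improvements. First, it adds an adaptive reward signal that penalizes meaningless actions taken by the agent so that it quickly learns to avoid them. Second, it uses a self-competition strategy: competition produces two different sets of data for the same task, and comparing them identifies the key steps that let the agent succeed in different environments, which improves the accuracy of the generated virtual goals. Experimental results show that SCHER makes better use of experience replay, raises the average task success rate by 5.7 percentage points, and achieves higher accuracy and generalization ability. |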
| Abstract_FL | Self-competitive hindsight experience replay (SCHER) is an improved strategy proposed based on the hindsight experience replay (HER) algorithm. The HER algorithm generates virtual labeled data by replaying experiences to optimize the model in the face of sparse environmental rewards. However, the HER algorithm has two problems: firstly, it cannot handle the large amount of repetitive data generated due to sparse rewards, which contaminates the experience pool; secondly, virtual goals may randomly select intermediate states that are not helpful in completing the task, leading to learning bias. To address these issues, the SCHER algorithm proposes two improvement strategies: firstly, increase the adaptive reward signal to penalize meaningless actions made by agents and quickly avoid such operations; secondly, use self-competition strategy to generate two sets of data for the same task, analyze and compare them, and find the key steps that enable the agent to succeed in different environments, thereby improving the accuracy of generated virtual goals. Experimental results show that the SCHER algorithm can better utilize the experience replay technique, increasing the average task success rate by 5.7 percentage points, and has higher accuracy and generalization ability. |
    
| Author | 王子豪 宋威 钱雪忠  | 
    
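| AuthorAffiliation | School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122 |
    
| AuthorAffiliation_xml | – name: School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122 |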
    
| Author_FL | SONG Wei QIAN Xuezhong WANG Zihao  | 
    
| Author_FL_xml | – sequence: 1 fullname: WANG Zihao – sequence: 2 fullname: QIAN Xuezhong – sequence: 3 fullname: SONG Wei  | 
    
| Author_xml | – sequence: 1 fullname: 王子豪 – sequence: 2 fullname: 钱雪忠 – sequence: 3 fullname: 宋威  | 
    
    
| ClassificationCodes | TP181 | 
    
| ContentType | Journal Article | 
    
| Copyright | Copyright © Wanfang Data Co. Ltd. All Rights Reserved. | 
    
| Copyright_xml | – notice: Copyright © Wanfang Data Co. Ltd. All Rights Reserved. | 
    
| DBID | 2B. 4A8 92I 93N PSX TCJ  | 
    
| DOI | 10.3778/j.issn.1673-9418.2303031 | 
    
| DatabaseName | Wanfang Data Journals - Hong Kong WANFANG Data Centre Wanfang Data Journals 万方数据期刊 - 香港版 China Online Journals (COJ) China Online Journals (COJ)  | 
    
| DatabaseTitleList | |
| DeliveryMethod | fulltext_linktorsrc | 
    
| DocumentTitle_FL | Self-competitive Hindsight Experience Replay with Penalty Measures | 
    
| EndPage | 1231 | 
    
| ExternalDocumentID | jsjkxyts202405008 | 
    
| GroupedDBID | 2B. 4A8 92I 93N ALMA_UNASSIGNED_HOLDINGS M~E PSX TCJ  | 
    
| ID | FETCH-LOGICAL-s1048-4ce138a3accd3d2132a8882326fb116d4fc90ad09e73e08270cceaac8519f0023 | 
    
| ISSN | 1673-9418 | 
    
| IngestDate | Thu May 29 04:00:18 EDT 2025 | 
    
| IsPeerReviewed | true | 
    
| IsScholarly | true | 
    
| Issue | 5 | 
    
| Keywords | sparse reward; deep reinforcement learning; adaptive reward signal; experience replay |
    
| Language | Chinese | 
    
| LinkModel | OpenURL | 
    
| MergedId | FETCHMERGED-LOGICAL-s1048-4ce138a3accd3d2132a8882326fb116d4fc90ad09e73e08270cceaac8519f0023 | 
    
| PageCount | 9 | 
    
| ParticipantIDs | wanfang_journals_jsjkxyts202405008 | 
    
| PublicationCentury | 2000 | 
    
| PublicationDate | 2024-05-01 | 
    
| PublicationDateYYYYMMDD | 2024-05-01 | 
    
| PublicationDate_xml | – month: 05 year: 2024 text: 2024-05-01 day: 01  | 
    
| PublicationDecade | 2020 | 
    
| PublicationTitle | 计算机科学与探索 | 
    
| PublicationTitle_FL | Journal of Frontiers of Computer Science & Technology | 
    
| PublicationYear | 2024 | 
    
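| Publisher | School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122 |
    
| Publisher_xml | – name: School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122 |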
    
| SSID | ssib054421768 ssib002040941 ssib002423894 ssib051375751 ssib023646573 ssib036438069 ssib002040926  | 
    
| Score | 2.382516 | 
    
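| Snippet | Self-competitive hindsight experience replay (SCHER) is an improved strategy built on the hindsight experience replay (HER) algorithm. When environment rewards are sparse, HER optimizes the model by replaying experience to generate virtual labeled data.... |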
    
| SourceID | wanfang | 
    
| SourceType | Aggregation Database | 
    
| StartPage | 1223 | 
    
| Title | Self-competitive Hindsight Experience Replay with Penalty Measures |
    
| URI | https://d.wanfangdata.com.cn/periodical/jsjkxyts202405008 | 
    
| Volume | 18 | 
    
| hasFullText | 1 | 
    
| inHoldings | 1 | 
    
| isFullTextHit | |
| isPrint | |
| journalDatabaseRights | – providerCode: PRVEBS databaseName: Inspec with Full Text issn: 1673-9418 databaseCode: ADMLS dateStart: 20200501 customDbUrl: isFulltext: true dateEnd: 99991231 titleUrlDefault: https://www.ebsco.com/products/research-databases/inspec-full-text omitProxy: false ssIdentifier: ssib002423894 providerName: EBSCOhost – providerCode: PRVHPJ databaseName: ROAD: Directory of Open Access Scholarly Resources issn: 1673-9418 databaseCode: M~E dateStart: 20070101 customDbUrl: isFulltext: true dateEnd: 99991231 titleUrlDefault: https://road.issn.org omitProxy: true ssIdentifier: ssib054421768 providerName: ISSN International Centre  | 
    
| link | http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwnV3Pa9RAFA61XryIouJvijinsjWTmSQzx2SbpQj11EJvJZtNlAordLegPXhRKEWtUlChiEfBil68SKX_Tbtr_wu_N8kmKa1SZSFM3sx773vzNvPeJDOMZd2mzYmax25DZZ20IUUq8MxhlmIncZxkbtb2zJmRs_e8mXl5d8FdGDu1WVu1tNJvTyWrx-4r-R-vgga_0i7Zf_BsKRQElOFfXOFhXE_kYxa5LFQs8FjkMd1kSlNB0advFvksnGY6MJSIBaagPSKiCnQlWaSY8k0VriHTEYskCwMjJy-EpELbJIEEhky1WKSJRTWpAHY1bSQ7LDCSg4hpnyihYPnBlqPcl9ShNuCHmgF2mAMAKk7qIIcskmQa6c3xOwaARGH0HyECanKMYAJMaAg50FVNNCEDjQqhMRVd1mKBXTVxCUwhBRp5_UWII6tlh-avayzjTLeoOUwnI8AnCf4I-2TRe2GuDaa5xlJtKIocRF4oDfxLt5S9oYkdneA0awgUwVatScMAZtu0AxY-6XDJndorXe75oqFlEYWOxKTqq78JMGAVtWQFeQc_LhAK31cmEJKKqVIFLfvHj1fBv1ySudRbevj4Sb9HPWu7Zv_8aQeRko5DmX0aVSkdRn1dn5LSvTy0Nxo5cDnG0_kEnlulyLgVyvbKFNrlwqdPf-W9lJgk5ztYR6jzBXhk0p0_GWS23HWzuHu_lh3OnbPOFtO6iSB_Rs9bY6sPLlje_o9Pgw_rg2efh7tbg43twbvd4dbzX2vbwy8f93bW93Ze7L_ZGP58fbD98mDt1WDz6_Db-8H3txet-VY015xpFOeUNHrcpvfxScqFikWcJB3RcbhwYoWJKyZGWZtzryOzRNtxx9apL1Kk3L6dJCmGQkx2dEZJ8yVrvPuom162JhTsl55UPI7pXIS47fBOGyIhBH3j8ivWrcLOxWIc6i0e8dzVkzS6Zp2pHqLr1nh_eSW9gfy6375pHP4bVcuXPQ | 
    
| linkProvider | ISSN International Centre | 
    
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=%E5%B8%A6%E6%9C%89%E6%83%A9%E7%BD%9A%E6%8E%AA%E6%96%BD%E7%9A%84%E8%87%AA%E7%AB%9E%E4%BA%89%E4%BA%8B%E5%90%8E%E7%BB%8F%E9%AA%8C%E9%87%8D%E6%92%AD%E7%AE%97%E6%B3%95&rft.jtitle=%E8%AE%A1%E7%AE%97%E6%9C%BA%E7%A7%91%E5%AD%A6%E4%B8%8E%E6%8E%A2%E7%B4%A2&rft.au=%E7%8E%8B%E5%AD%90%E8%B1%AA&rft.au=%E9%92%B1%E9%9B%AA%E5%BF%A0&rft.au=%E5%AE%8B%E5%A8%81&rft.date=2024-05-01&rft.pub=%E6%B1%9F%E5%8D%97%E5%A4%A7%E5%AD%A6+%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD%E4%B8%8E%E8%AE%A1%E7%AE%97%E6%9C%BA%E5%AD%A6%E9%99%A2%2C%E6%B1%9F%E8%8B%8F+%E6%97%A0%E9%94%A1+214122&rft.issn=1673-9418&rft.volume=18&rft.issue=5&rft.spage=1223&rft.epage=1231&rft_id=info:doi/10.3778%2Fj.issn.1673-9418.2303031&rft.externalDocID=jsjkxyts202405008 | 
    
| thumbnail_s | http://utb.summon.serialssolutions.com/2.0.0/image/custom?url=http%3A%2F%2Fwww.wanfangdata.com.cn%2Fimages%2FPeriodicalImages%2Fjsjkxyts%2Fjsjkxyts.jpg |
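
The abstract in this record describes two mechanisms: HER-style relabeling of replayed experience with virtual goals, and a penalty on meaningless actions. The following is a minimal Python sketch of that idea, not the authors' implementation. It assumes a toy grid state, the common "final state as virtual goal" relabeling rule, and a fixed `penalty` value applied when an action leaves the state unchanged; the paper's adaptive reward signal and its self-competition step are not reproduced here. All names (`Transition`, `sparse_reward`, `relabel_with_final_goal`, `penalty`) are hypothetical.

```python
from typing import List, NamedTuple, Tuple

State = Tuple[int, int]


class Transition(NamedTuple):
    state: State
    action: int
    next_state: State
    goal: State
    reward: float


def sparse_reward(next_state: State, goal: State) -> float:
    """Sparse reward commonly used with HER: 0 at the goal, -1 otherwise."""
    return 0.0 if next_state == goal else -1.0


def relabel_with_final_goal(
    episode: List[Transition], penalty: float = 0.5
) -> List[Transition]:
    """Relabel an episode using its final achieved state as a virtual goal.

    `penalty` is a hypothetical fixed stand-in for an adaptive penalty signal:
    it is subtracted whenever an action leaves the state unchanged, which this
    sketch assumes to be a "meaningless" action.
    """
    virtual_goal = episode[-1].next_state
    relabeled = []
    for t in episode:
        reward = sparse_reward(t.next_state, virtual_goal)
        if t.next_state == t.state:  # the action had no effect
            reward -= penalty
        relabeled.append(t._replace(goal=virtual_goal, reward=reward))
    return relabeled


# Toy episode that never reaches the original goal (5, 5).
original_goal = (5, 5)
episode = [
    Transition((0, 0), 1, (0, 1), original_goal, -1.0),
    Transition((0, 1), 0, (0, 1), original_goal, -1.0),  # no-op step
    Transition((0, 1), 1, (0, 2), original_goal, -1.0),
]
relabeled = relabel_with_final_goal(episode)
for t in relabeled:
    print(t.goal, t.reward)
```

In this sketch the failed episode becomes a successful one for the virtual goal `(0, 2)`, while the no-op step receives a lower reward than an ordinary unsuccessful step, so repeated transitions of that kind become less attractive to the learner.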