Knowledge Reasoning Method of Reinforcement Learning Integrating Action Withdrawal and Soft Reward
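To address overfitting and sparse rewards in deep reinforcement learning reasoning methods, a knowledge reasoning method of reinforcement learning integrating action withdrawal and soft reward (AS-KRL) is proposed. AS-KRL uses a gated recurrent unit (GRU) to encode historical path information, giving the agent's action selection global information about the current node. It introduces an action withdrawal strategy that randomly hides some neurons before the policy network is built, which raises the success rate of path search and guards against possible overfitting. The policy network guides the agent's action selection, calling a scoring...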
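The action withdrawal step (listed among the record's keywords as "action dropout") can be illustrated with a small sketch. The abstract only says that some neurons are randomly hidden before the policy network is built; the version below instead masks entries of the action distribution and renormalizes, which is one common formulation of action dropout and is an assumption here, not the paper's confirmed procedure. The names `action_dropout`, `drop_rate`, and `policy_probs` are hypothetical.

```python
import random


def action_dropout(probs, drop_rate, rng):
    """Randomly mask actions and renormalize the surviving probabilities.

    Assumption: masking acts on the action distribution; the abstract
    does not specify the exact mechanism used in AS-KRL.
    """
    # Keep each action independently with probability (1 - drop_rate).
    mask = [1.0 if rng.random() >= drop_rate else 0.0 for _ in probs]
    masked = [p * m for p, m in zip(probs, mask)]
    total = sum(masked)
    if total == 0.0:
        # Every action was masked: fall back to the unmasked distribution.
        return list(probs)
    return [p / total for p in masked]


rng = random.Random(0)
policy_probs = [0.5, 0.3, 0.2]  # hypothetical policy output over 3 edges
perturbed = action_dropout(policy_probs, drop_rate=0.5, rng=rng)
# Sample the next edge from the perturbed distribution.
next_action = rng.choices(range(len(perturbed)), weights=perturbed, k=1)[0]
```

Sampling from the perturbed distribution pushes the agent to try edges it would otherwise rarely pick, which is the usual argument for why this kind of masking helps exploration and reduces overfitting to a few paths.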
| Published in | Computer Engineering and Applications (计算机工程与应用) Vol. 60; no. 24; pp. 158-165 |
|---|---|
| Main Authors | SUN Chong, WANG Hairong, JING Boxiang, MA He |
| Format | Journal Article |
| Language | Chinese |
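| Published | Key Laboratory of Images and Graphics Intelligent Processing of State Ethnic Affairs Commission, North Minzu University, Yinchuan 750021; School of Computer Science and Engineering, North Minzu University, Yinchuan 750021 (15.12.2024) |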
| ISSN | 1002-8331 |
| DOI | 10.3778/j.issn.1002-8331.2308-0215 |
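| Abstract | To address overfitting and sparse rewards in deep reinforcement learning reasoning methods, a knowledge reasoning method of reinforcement learning integrating action withdrawal and soft reward (AS-KRL) is proposed. AS-KRL uses a gated recurrent unit (GRU) to encode historical path information, giving the agent's action selection global information about the current node. It introduces an action withdrawal strategy that randomly hides some neurons before the policy network is built, which raises the success rate of path search and guards against possible overfitting. The policy network guides the agent's action selection; a scoring function computes a similarity score for the triple the agent selects, and that score is used as the agent's reward, which effectively addresses the sparse reward problem. To verify the method, experiments are run on the FB15K-237 and NELL-995 datasets, and the results are compared with those of nine mainstream methods, including TransE, MINERVA and HRL. On the link prediction task, the method improves Hits@k by 0.027 on average and MRR by 0.056 on average. |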
| Abstract_FL | Aiming at the problems of overfitting and sparse reward in deep reinforcement learning reasoning methods, a knowledge reasoning method of reinforcement learning integrating action withdrawal and soft reward is proposed (AS-KRL). AS-KRL uses a gated recurrent unit (GRU) to encode the historical path information and provide the global information of the current node for the agent's action selection. By introducing the action exit strategy to hide some neurons randomly, the strategy network is constructed to improve the success rate of model path search and avoid the possible overfitting problem. The strategy network is used to guide the agent to make action selection, and the score function is called to calculate the similarity score of the triplet selected by the agent, and the score is taken as the reward of the agent, which effectively solves the sparse reward problem. To verify the effectiveness of the proposed method, experiments are carried out on FB15K-237 and NELL-995 datasets. The experimental results are compared with those of 9 mainstream methods such as TransE, MINERVA and HRL. The results show that the proposed method improves Hits@k by an average of 0.027 and MRR by an average of 0.056 on the link prediction task. |
| Author | 王海荣 马赫 孙崇 荆博祥 |
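| AuthorAffiliation | School of Computer Science and Engineering, North Minzu University, Yinchuan 750021; Key Laboratory of Images and Graphics Intelligent Processing of State Ethnic Affairs Commission, North Minzu University, Yinchuan 750021 |
| AuthorAffiliation_xml | – name: School of Computer Science and Engineering, North Minzu University, Yinchuan 750021; Key Laboratory of Images and Graphics Intelligent Processing of State Ethnic Affairs Commission, North Minzu University, Yinchuan 750021 |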
| Author_FL | SUN Chong MA He WANG Hairong JING Boxiang |
| Author_FL_xml | – sequence: 1 fullname: SUN Chong – sequence: 2 fullname: WANG Hairong – sequence: 3 fullname: JING Boxiang – sequence: 4 fullname: MA He |
| Author_xml | – sequence: 1 fullname: 孙崇 – sequence: 2 fullname: 王海荣 – sequence: 3 fullname: 荆博祥 – sequence: 4 fullname: 马赫 |
| ClassificationCodes | TP391 |
| ContentType | Journal Article |
| Copyright | Copyright © Wanfang Data Co. Ltd. All Rights Reserved. |
| Copyright_xml | – notice: Copyright © Wanfang Data Co. Ltd. All Rights Reserved. |
| DBID | 2B. 4A8 92I 93N PSX TCJ |
| DOI | 10.3778/j.issn.1002-8331.2308-0215 |
| DatabaseName | Wanfang Data Journals - Hong Kong WANFANG Data Centre Wanfang Data Journals 万方数据期刊 - 香港版 China Online Journals (COJ) China Online Journals (COJ) |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Engineering |
| DocumentTitle_FL | Knowledge Reasoning Method of Reinforcement Learning Integrating Action Withdrawal and Soft Reward |
| EndPage | 165 |
| ExternalDocumentID | jsjgcyyy202424013 |
| GrantInformation_xml | – fundername: Natural Science Foundation of Ningxia funderid: (2023AAC03316) |
| GroupedDBID | -0Y 2B. 4A8 5XA 5XJ 92H 92I 93N ABJNI ACGFS ALMA_UNASSIGNED_HOLDINGS CCEZO CUBFJ CW9 PSX TCJ TGT U1G U5S |
| ISSN | 1002-8331 |
| IngestDate | Thu May 29 04:10:55 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Issue | 24 |
| Keywords | gated recurrent unit (GRU); knowledge reasoning; action dropout; reinforcement learning; soft reward mechanism |
| Language | Chinese |
| LinkModel | OpenURL |
| PageCount | 8 |
| ParticipantIDs | wanfang_journals_jsjgcyyy202424013 |
| PublicationCentury | 2000 |
| PublicationDate | 2024-12-15 |
| PublicationDateYYYYMMDD | 2024-12-15 |
| PublicationDate_xml | – month: 12 year: 2024 text: 2024-12-15 day: 15 |
| PublicationDecade | 2020 |
| PublicationTitle | 计算机工程与应用 |
| PublicationTitle_FL | Computer Engineering and Applications |
| PublicationYear | 2024 |
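| Publisher | Key Laboratory of Images and Graphics Intelligent Processing of State Ethnic Affairs Commission, North Minzu University, Yinchuan 750021; School of Computer Science and Engineering, North Minzu University, Yinchuan 750021 |
| Publisher_xml | – name: Key Laboratory of Images and Graphics Intelligent Processing of State Ethnic Affairs Commission, North Minzu University, Yinchuan 750021 – name: School of Computer Science and Engineering, North Minzu University, Yinchuan 750021 |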
| SourceID | wanfang |
| SourceType | Aggregation Database |
| StartPage | 158 |
| Title | Knowledge Reasoning Method of Reinforcement Learning Integrating Action Withdrawal and Soft Reward |
| URI | https://d.wanfangdata.com.cn/periodical/jsjgcyyy202424013 |
| Volume | 60 |
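The soft reward and the two reported metrics in the abstract can be sketched as follows. The abstract says a scoring function rates the triple the agent selects and that score becomes the reward, but it does not name the function. The sketch assumes a TransE-style score (negative L1 distance of `h + r - t`) mapped into (0, 1] with an exponential, and assumes a reward of 1.0 when the agent reaches the correct answer; both are illustrative assumptions. `hits_at_k` and `mrr` follow the standard definitions of Hits@k and mean reciprocal rank over 1-based ranks. All function names are hypothetical.

```python
import math


def transe_score(head, rel, tail):
    """TransE-style plausibility: negative L1 norm of (h + r - t).

    Assumption: the abstract does not say which scoring function is used.
    """
    return -sum(abs(h + r - t) for h, r, t in zip(head, rel, tail))


def soft_reward(head, rel, tail, reached_answer):
    """Return 1.0 on a correct answer, otherwise a score-based reward."""
    if reached_answer:
        return 1.0
    # The score is <= 0, so exp maps it into (0, 1].
    return math.exp(transe_score(head, rel, tail))


def hits_at_k(ranks, k):
    """Fraction of queries whose correct entity is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mrr(ranks):
    """Mean reciprocal rank over 1-based ranks of the correct entities."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Toy 2-dimensional embeddings and ranks, purely illustrative.
reward = soft_reward([0.1, 0.2], [0.3, 0.1], [0.5, 0.2], reached_answer=False)
toy_ranks = [1, 3, 10]
top3 = hits_at_k(toy_ranks, 3)
mean_rr = mrr(toy_ranks)
```

Because the reward is graded rather than 0 or 1, an agent that ends on a plausible but incorrect entity still receives a learning signal, which is the usual motivation for soft rewards under sparse feedback.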