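Knowledge Reasoning Method of Reinforcement Learning Integrating Action Withdrawal and Soft Reward (融合动作退出和软奖励的强化学习知识推理方法)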

Bibliographic Details
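Published in 计算机工程与应用 (Computer Engineering and Applications) Vol. 60; no. 24; pp. 158 - 165
Main Authors SUN Chong (孙崇), WANG Hairong (王海荣), JING Boxiang (荆博祥), MA He (马赫)
Format Journal Article
Language Chinese
Published 15.12.2024; School of Computer Science and Engineering, North Minzu University, Yinchuan 750021; Key Laboratory of Images and Graphics Intelligent Processing of State Ethnic Affairs Commission, North Minzu University, Yinchuan 750021
ISSN 1002-8331
DOI 10.3778/j.issn.1002-8331.2308-0215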

Abstract To address the problems of overfitting and sparse rewards in deep reinforcement learning reasoning methods, a knowledge reasoning method of reinforcement learning integrating action withdrawal and soft reward (AS-KRL) is proposed. AS-KRL uses a gated recurrent unit (GRU) to encode historical path information, providing global information about the current node for the agent's action selection. An action withdrawal (action dropout) strategy randomly hides some neurons before the policy network is constructed, which improves the success rate of the model's path search and avoids possible overfitting. The policy network guides the agent's action selection, and a scoring function is called to compute the similarity score of the triple selected by the agent; this score is used as the agent's reward, which effectively solves the sparse reward problem. To verify the effectiveness of the method, experiments are carried out on the FB15K-237 and NELL-995 datasets, and the results are compared with those of nine mainstream methods, including TransE, MINERVA, and HRL. The results show that, on the link prediction task, the method improves Hits@k by 0.027 on average and MRR by 0.056 on average.
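The two mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record gives no formulas, so the TransE-style distance score, the mapping of that distance into (0, 1], the choice to apply the random mask to candidate actions, and every function name below are assumptions chosen for illustration, in the style of common multi-hop reinforcement learning reasoners.

```python
import math
import random


def transe_score(h, r, t):
    """Assumed scoring function: similarity in (0, 1] from a TransE-style distance.

    Returns 1.0 when h + r == t and decays as the L2 distance grows.
    """
    dist = math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))
    return 1.0 / (1.0 + dist)


def soft_reward(reached, target, h_vec, r_vec, reached_vec):
    """Soft reward: 1.0 on a hit, otherwise the score of the selected triple.

    A plain 0/1 reward would return 0.0 on every miss, which is the sparse
    signal the abstract refers to.
    """
    if reached == target:
        return 1.0
    return transe_score(h_vec, r_vec, reached_vec)


def dropout_mask(n_actions, keep_prob, rng):
    """Random 0/1 mask over candidate actions (hypothetical action dropout).

    If every action would be dropped, one is re-enabled at random so the
    agent always has a legal move.
    """
    mask = [1 if rng.random() < keep_prob else 0 for _ in range(n_actions)]
    if not any(mask):
        mask[rng.randrange(n_actions)] = 1
    return mask


def masked_policy(probs, mask):
    """Zero out dropped actions and renormalise the remaining probabilities.

    Assumes the kept actions have positive probability (e.g. softmax output).
    """
    kept = [p * m for p, m in zip(probs, mask)]
    total = sum(kept)
    return [k / total for k in kept]
```

During training, the mask would be resampled at each step so that the agent is pushed to explore paths other than the ones it currently prefers; at evaluation time the mask would normally be switched off (all ones).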
ClassificationCodes TP391
ContentType Journal Article
Copyright Copyright © Wanfang Data Co. Ltd. All Rights Reserved.
DatabaseName Wanfang Data Journals - Hong Kong; WANFANG Data Centre; Wanfang Data Journals; China Online Journals (COJ)
Discipline Engineering
DocumentTitle_FL Knowledge Reasoning Method of Reinforcement Learning Integrating Action Withdrawal and Soft Reward
EndPage 165
ExternalDocumentID jsjgcyyy202424013
GrantInformation_xml – fundername: Natural Science Foundation of Ningxia (宁夏自然科学基金项目)
  funderid: (2023AAC03316)
ISSN 1002-8331
IngestDate Thu May 29 04:10:55 EDT 2025
IsPeerReviewed false
IsScholarly false
Issue 24
Keywords gated recurrent unit (GRU); knowledge reasoning; action dropout; reinforcement learning; soft reward mechanism
Language Chinese
PageCount 8
PublicationDate 2024-12-15
PublicationTitle 计算机工程与应用
PublicationTitle_FL Computer Engineering and Applications
PublicationYear 2024
SourceID wanfang
SourceType Aggregation Database
StartPage 158
URI https://d.wanfangdata.com.cn/periodical/jsjgcyyy202424013
Volume 60