A multi-robot path-planning algorithm for autonomous navigation using meta-reinforcement learning based on transfer learning

Bibliographic Details
Published in: Applied soft computing, Vol. 110, p. 107605
Main Authors: Wen, Shuhuan; Wen, Zeteng; Zhang, Di; Zhang, Hong; Wang, Tao
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.10.2021
ISSN: 1568-4946
EISSN: 1872-9681
DOI: 10.1016/j.asoc.2021.107605

Abstract The adaptability of multi-robot systems in complex environments is an active research topic. To handle static and dynamic obstacles in complex environments, this paper presents dynamic proximal meta policy optimization with covariance matrix adaptation evolutionary strategies (dynamic-PMPO-CMA) for obstacle avoidance and autonomous navigation. First, we propose dynamic proximal policy optimization with covariance matrix adaptation evolutionary strategies (dynamic-PPO-CMA), an extension of the original proximal policy optimization (PPO), to obtain a valid obstacle-avoidance policy. Simulation results show that the proposed dynamic-PPO-CMA avoids obstacles and reaches the designated target position successfully. Second, to improve the adaptability of multi-robot systems across different environments, we integrate meta-learning with dynamic-PPO-CMA to form the dynamic-PMPO-CMA algorithm. During training, dynamic-PMPO-CMA is used to train the robots to learn a multi-task policy. Finally, during testing, transfer learning is introduced: the trained meta-policy parameters are transferred to new environments and used as the initial parameters. Simulation results show that the proposed algorithm converges faster and reaches the destination more quickly than PPO, PMPO, and dynamic-PPO-CMA.
•Propose dynamic proximal policy optimization with covariance matrix adaptation evolutionary strategies (dynamic-PPO-CMA) to extend the original proximal policy optimization (PPO) algorithm.
•Propose a novel meta reinforcement learning framework for multi-robot path planning that improves adaptation to new, unknown environments.
•Apply transfer learning to the framework to reduce the on-board computation required to train a deep neural network.
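The pipeline described in the abstract (meta-train a policy across a family of tasks, then transfer the meta-parameters as the initialization in a new environment) can be sketched in a toy, first-order form. Everything below is an illustrative assumption, not the authors' dynamic-PMPO-CMA implementation: a quadratic surrogate stands in for each environment's policy objective, plain gradient descent stands in for dynamic-PPO-CMA updates, and the meta-update follows the Reptile-style scheme of Nichol et al. (reference b16).

```python
import numpy as np

rng = np.random.default_rng(0)

def task_grad(theta, target):
    # Gradient of a toy quadratic surrogate loss 0.5*||theta - target||^2,
    # standing in for one environment's policy objective.
    return theta - target

def inner_adapt(theta, target, lr=0.1, steps=5):
    # Per-task adaptation (the role played by dynamic-PPO-CMA updates in
    # the paper; plain gradient descent here).
    for _ in range(steps):
        theta = theta - lr * task_grad(theta, target)
    return theta

def meta_train(tasks, theta, meta_lr=0.5, iters=200):
    # First-order (Reptile-style) meta-update: nudge the meta-parameters
    # toward each sampled task's adapted parameters.
    for _ in range(iters):
        target = tasks[rng.integers(len(tasks))]
        theta = theta + meta_lr * (inner_adapt(theta, target) - theta)
    return theta

# Meta-train across a family of tasks (goal positions at the unit-square
# corners), then transfer the meta-parameters as the initialization for
# an unseen task -- the paper's testing phase.
tasks = [np.array([0.0, 0.0]), np.array([1.0, 0.0]),
         np.array([0.0, 1.0]), np.array([1.0, 1.0])]
theta_meta = meta_train(tasks, theta=np.zeros(2))

new_task = np.array([0.5, 0.5])
adapted_scratch = inner_adapt(np.zeros(2), new_task, steps=2)
adapted_meta = inner_adapt(theta_meta, new_task, steps=2)
# With the same small adaptation budget, the transferred initialization
# ends up closer to the new task's optimum than training from scratch.
```

This captures why the transferred initialization gives the faster convergence reported in the abstract: the meta-parameters already sit near the family of task optima, so far fewer adaptation steps are needed in a new environment.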
ArticleNumber 107605
Author Zhang, Hong
Wang, Tao
Wen, Shuhuan
Wen, Zeteng
Zhang, Di
Author_xml – sequence: 1
  givenname: Shuhuan
  surname: Wen
  fullname: Wen, Shuhuan
  email: swen@ysu.edu.cn
  organization: Engineering Research Center of the Ministry of Education for Intelligent Control System and Intelligent Equipment, Yanshan University, Qinhuangdao, 066004, China
– sequence: 2
  givenname: Zeteng
  surname: Wen
  fullname: Wen, Zeteng
  email: 473900582@qq.com
  organization: Engineering Research Center of the Ministry of Education for Intelligent Control System and Intelligent Equipment, Yanshan University, Qinhuangdao, 066004, China
– sequence: 3
  givenname: Di
  surname: Zhang
  fullname: Zhang, Di
  email: 1120067126@qq.com
  organization: Engineering Research Center of the Ministry of Education for Intelligent Control System and Intelligent Equipment, Yanshan University, Qinhuangdao, 066004, China
– sequence: 4
  givenname: Hong
  surname: Zhang
  fullname: Zhang, Hong
  email: hzhang@sustech.edu.cn
  organization: Department of Electronic and Electrical Engineering, Southern University of Science and Technology, Shenzhen, 518000, China
– sequence: 5
  givenname: Tao
  surname: Wang
  fullname: Wang, Tao
  email: 82368157@qq.com
  organization: Engineering Research Center of the Ministry of Education for Intelligent Control System and Intelligent Equipment, Yanshan University, Qinhuangdao, 066004, China
CitedBy_id crossref_primary_10_1017_S026357472400170X
crossref_primary_10_1109_TCDS_2023_3246107
crossref_primary_10_1016_j_eswa_2024_125238
crossref_primary_10_1109_JAS_2023_123087
crossref_primary_10_3390_s23073625
crossref_primary_10_1109_JSEN_2023_3310519
crossref_primary_10_1016_j_mechatronics_2024_103248
crossref_primary_10_1007_s10489_023_04754_7
crossref_primary_10_26599_AIR_2023_9150013
crossref_primary_10_1016_j_asoc_2022_108588
crossref_primary_10_1016_j_knosys_2023_110782
crossref_primary_10_1007_s40998_024_00722_0
crossref_primary_10_1016_j_jocs_2022_101938
crossref_primary_10_1016_j_ast_2024_109089
crossref_primary_10_3390_electronics12234759
crossref_primary_10_1007_s10845_024_02412_4
crossref_primary_10_1186_s13677_023_00440_8
crossref_primary_10_17979_ja_cea_2024_45_10898
crossref_primary_10_3390_machines10090773
crossref_primary_10_3390_electronics13152927
crossref_primary_10_1016_j_ast_2024_109606
crossref_primary_10_1007_s10462_023_10670_6
crossref_primary_10_3390_app13148174
crossref_primary_10_1016_j_compeleceng_2024_109425
crossref_primary_10_1109_TTE_2022_3142150
crossref_primary_10_1109_JIOT_2024_3379361
crossref_primary_10_1142_S0219843623500147
crossref_primary_10_1109_TITS_2023_3285624
crossref_primary_10_1007_s11082_023_06153_1
crossref_primary_10_1016_j_asoc_2022_109001
Cites_doi 10.1109/LRA.2017.2651371
10.1108/IR-08-2020-0160
10.1109/TSMCC.2011.2157682
10.1016/j.robot.2015.04.003
10.1038/nature14236
10.1007/978-3-030-01270-0_3
10.1109/ICNN.1993.298591
10.15607/RSS.2018.XIV.002
10.1109/CVPR.2019.00691
10.1609/aaai.v32i1.11596
10.1109/LRA.2020.2974685
10.1109/CVPR.2017.769
10.1109/ACCESS.2014.2302442
10.1007/s11370-019-00310-w
10.1109/CVPR.2019.00679
10.3390/app9153057
10.1016/j.actaastro.2020.03.026
10.1109/CVPR.2018.00131
10.1109/ICRA.2017.7989381
ContentType Journal Article
Copyright 2021 Elsevier B.V.
Copyright_xml – notice: 2021 Elsevier B.V.
DBID AAYXX
CITATION
DOI 10.1016/j.asoc.2021.107605
DatabaseName CrossRef
DatabaseTitle CrossRef
DatabaseTitleList
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISSN 1872-9681
ExternalDocumentID 10_1016_j_asoc_2021_107605
S1568494621005263
ISSN 1568-4946
IsPeerReviewed true
IsScholarly true
Keywords Deep reinforcement learning
Multi-robot system
Path planning
Transfer learning
Meta learning
Language English
LinkModel DirectLink
ParticipantIDs crossref_citationtrail_10_1016_j_asoc_2021_107605
crossref_primary_10_1016_j_asoc_2021_107605
elsevier_sciencedirect_doi_10_1016_j_asoc_2021_107605
PublicationCentury 2000
PublicationDate October 2021
2021-10-00
PublicationDateYYYYMMDD 2021-10-01
PublicationDate_xml – month: 10
  year: 2021
  text: October 2021
PublicationDecade 2020
PublicationTitle Applied soft computing
PublicationYear 2021
Publisher Elsevier B.V
Publisher_xml – name: Elsevier B.V
References J.Y. Bilan, M.Z. Michelle, Robot navigation in crowds via meta-learning, CS234 final project report
S. Gupta, J. Davidson, S. Levine, R. Sukthankar, J. Malik, Cognitive mapping and planning for visual navigation: Supplementary material, in: IEEE Conference on Computer Vision & Pattern Recognition, CVPR, 2017, pp. 2616–2625.
Wen, Lv, Lam (b11) 2021
Li, Wang, Tang, Shi, Wu, Zhuang (b36) 2019
X. Wang, W. Xiong, H. Wang, W.Y. Wang, Look before you leap: Bridging model-free and model-based reinforcement learning for planned-ahead vision-and-language navigation, in: European Conference on Computer Vision, ECCV, 2018, pp. 37–53.
T. Xu, Q. Liu, L. Zhao, J. Peng, Learning to explore with meta-policy gradient, in: International Conference on Machine Learning, ICML, 2018.
J. Schmidhuber, A neural network that embeds its own meta-levels, in: IEEE International Conference on Neural Networks, Vol. 1, March 1993, pp. 407–412.
M. Andrychowicz, M. Denil, S. Gomez, M.W. Hoffman, D. Pfau, T. Schaul, Learning to learn by gradient descent by gradient descent, in: Neural Information Processing Systems, NIPS, 2016.
Arndt, Hazara, Ghadirzadeh, Kyrki (b37) 2019
D. Li, Y. Yang, Y.Z. Song, T.M. Hospedales, Learning to generalize: meta-learning for domain generalization, in: Association for the Advance of Artificial Intelligence, AAAI, 2018.
Nichol, Achiam, Schulman (b16) 2018
Chelsea, Pieter, Sergey (b41) 2017
M. Wortsman, K. Ehsani, M. Rastegari, A. Farhadi, R. Mottaghi, Learning to learn how to learn: self-adaptive visual navigation using meta-learning, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR, 2019.
Gupta, Mendonca, Liu, Abbeel, Levine (b30) 2018
J. Rothfuss, D. Lee, C. Ignasi, J. Lehtinen, Promp: proximal meta-policy search, in: International Conference on Learning Representations, ICLR, 2019.
Gaudet, Linares, Furfaro (b39) 2020; 172
F. Sung, Y. Yang, L. Zhang, T. Xiang, P.H. Torr, T.M. Hospedales, Learning to compare: Relation network for few-shot learning, in: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, Salt Lake City, Utah, USA, 2018, pp. 1199–1208.
Bae, Kim, Kim, Qian, Lee (b15) 2019; 9
Fu, Tang, Hao (b3) 2019
X. Wang, Q. Huang, A. Celikyilmaz, J. Gao, D. Shen, Y.F. Wang, Reinforced cross-modal matching and self-supervised imitation learning for vision-language navigation, in: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, 2019, pp. 6629–6638.
Gupta, Egorov, Kochenderfer (b7) 2017
Sepp, Steven, Peter (b22) 2001
Wen, Zhao, Yuan, Wang, Manfredi (b14) 2020; 13
Jabri, Hsu, Eysenbach, Gupta, Levine, Finn (b35) 2019
Levine, Finn, Darrell, Abbeel (b21) 2016; 17
Trinh, Ekström, Cürüklü (b4) 2020
Y. Zhu, R. Mottaghi, E. Kolve, J.J. Lim, A. Gupta, F.F. Li, A. Farhadi, Target-driven visual navigation in indoor scenes using deep reinforcement learning, in: IEEE International Conference on Robotics and Automation, ICRA, 2017, pp. 3357–3364.
Hamalainen, Babadi, Ma, Lehtinen (b31) 2018
Schmidhuber (b18) 1987
A. Santoro, S. Bartunov, M. Botvinick, D. Wierstra, T. Lillicrap, Meta-learning with memory-augmented neural networks, in: International Conference on Machine Learning, ICML, New York City, NY, USA, 2016, pp. 1842–1850.
Wen, Chen, Ma, Lam, Hua (b5) 2015; 10
Mnih, Kavukcuoglu, Sliver, Rusu, Veness, Bellemare, Graves, Riedmiller, Fidjeland, Ostrovski, Petersen, Beattie, Sadik, Antonoglou, King, Kumaran, Wierstra, Legg, Hassabi (b6) 2015; 518
Schulman, Wolski, Dhariwal, Radford, Klimov (b40) 2017
Finn, Abbeel, Levine (b17) 2017
Elbanhawi, Simic (b1) 2014; 2
T. Yu, C. Finn, A. Xie, S. Dasari, T. Zhang, P. Abbeel, S. Levine, One-shot imitation from observing humans via domain-adaptive meta-learning, in: Royal Statistical Society, RSS, 2018.
Schmidhuber, Zhao, Schraudolph (b25) 1998
N. Mishra, M. Rohaninejad, X. Chen, P. Abbeel, A simple neural attentive meta-learner, in: International Conference on Learning Representations, ICLR, 2018.
Wen, Zheng, Zhu, Li, Chen (b2) 2012; 42
Long, Liu, Pan (b8) 2017; 2
Yu, Tan, Bai, Coumans, Ha (b38) 2020; 5
References_xml – reference: T. Xu, Q. Liu, L. Zhao, J. Peng, Learning to explore with meta-policy gradient, in: International Conference on Machine Learning, ICML, 2018.
– reference: J. Rothfuss, D. Lee, C. Ignasi, J. Lehtinen, Promp: proximal meta-policy search, in: International Conference on Learning Representations, ICLR, 2019.
– reference: X. Wang, W. Xiong, H. Wang, W.Y. Wang, Look before you leap: Bridging model-free and model-based reinforcement learning for planned-ahead vision-and-language navigation, in: European Conference on Computer Vision, ECCV, 2018, pp. 37–53.
– volume: 5
  start-page: 2950
  year: 2020
  end-page: 2957
  ident: b38
  article-title: Learning fast adaptation with meta strategy optimization
  publication-title: IEEE Robot. Autom. Lett.
– reference: N. Mishra, M. Rohaninejad, X. Chen, P. Abbeel, A simple neural attentive meta-learner, in: International Conference on Learning Representations, ICLR, 2018.
– volume: 13
  start-page: 263
  year: 2020
  end-page: 272
  ident: b14
  article-title: Path planning for active SLAM based on deep reinforcement learning under unknown environments
  publication-title: Intell. Serv. Robot.
– reference: T. Yu, C. Finn, A. Xie, S. Dasari, T. Zhang, P. Abbeel, S. Levine, One-shot imitation from observing humans via domain-adaptive meta-learning, in: Royal Statistical Society, RSS, 2018.
– year: 2018
  ident: b31
  article-title: PPO-CMA: Proximal policy optimization with covariance matrix adaptation
– reference: J.Y. Bilan, M.Z. Michelle, Robot navigation in crowds via meta-learning, CS234 final project report.
– reference: D. Li, Y. Yang, Y.Z. Song, T.M. Hospedales, Learning to generalize: meta-learning for domain generalization, in: Association for the Advance of Artificial Intelligence, AAAI, 2018.
– volume: 10
  start-page: 29
  year: 2015
  end-page: 36
  ident: b5
  article-title: The Q-learning obstacle avoidance algorithm based on EKF-SLAM for NAO autonomous walking under unknown environments
  publication-title: Robot. Auton. Syst.
– reference: F. Sung, Y. Yang, L. Zhang, T. Xiang, P.H. Torr, T.M. Hospedales, Learning to compare: Relation network for few-shot learning, in: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, Salt Lake City, Utah, USA, 2018, pp. 1199–1208.
– start-page: 66
  year: 2017
  end-page: 83
  ident: b7
  article-title: Cooperative multiagent control using deep reinforcement learning
  publication-title: International Conference on Autonomous Agents and Multiagent Systems
– year: 2018
  ident: b30
  article-title: Meta-reinforcement learning of structured exploration strategies
– year: 2019
  ident: b36
  article-title: Unsupervised reinforcement learning of transferable meta-skills for embodied navigation
– start-page: 1
  year: 2017
  end-page: 12
  ident: b41
  article-title: Model-agnostic meta-learning for fast adaptation of deep networks
– reference: X. Wang, Q. Huang, A. Celikyilmaz, J. Gao, D. Shen, Y.F. Wang, Reinforced cross-modal matching and self-supervised imitation learning for vision-language navigation, in: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, 2019, pp. 6629–6638.
– reference: M. Wortsman, K. Ehsani, M. Rastegari, A. Farhadi, R. Mottaghi, Learning to learn how to learn: self-adaptive visual navigation using meta-learning, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR, 2019.
– year: 2019
  ident: b35
  article-title: Unsupervised curricula for visual meta-reinforcement learning
– volume: 2
  start-page: 56
  year: 2014
  end-page: 77
  ident: b1
  article-title: Sampling-based robot motion planning: A review
  publication-title: IEEE Access
– start-page: 1126
  year: 2017
  end-page: 1135
  ident: b17
  article-title: Model-agnostic meta-learning for fast adaptation of deep networks
  publication-title: International Conference on Machine Learning, Vol. 70
– year: 1987
  ident: b18
  article-title: Evolutionary Principles in Self-Referential Learning. On Learning now to Learn: The Meta-Meta-Meta...-Hook
– reference: M. Andrychowicz, M. Denil, S. Gomez, M.W. Hoffman, D. Pfau, T. Schaul, Learning to learn by gradient descent by gradient descent, in: Neural Information Processing Systems, NIPS, 2016.
– year: 2019
  ident: b37
  article-title: Meta reinforcement learning for sim-to-real domain adaptation
– reference: Y. Zhu, R. Mottaghi, E. Kolve, J.J. Lim, A. Gupta, F.F. Li, A. Farhadi, Target-driven visual navigation in indoor scenes using deep reinforcement learning, in: IEEE International Conference on Robotics and Automation, ICRA, 2017, pp. 3357–3364.
– volume: 9
  start-page: 3057
  year: 2019
  ident: b15
  article-title: Multi-robot path planning method using reinforcement learning
  publication-title: Appl. Sci.
– volume: 518
  start-page: 529
  year: 2015
  end-page: 533
  ident: b6
  article-title: Human-level control through deep reinforcement learning
  publication-title: Nature
– start-page: 87
  year: 2001
  end-page: 94
  ident: b22
  article-title: Learning to learn using gradient descent
  publication-title: International Conference on Artificial Neural Networks
– reference: J. Schmidhuber, A neural network that embeds its own meta-levels, in: IEEE International Conference on Neural Networks, Vol. 1, March 1993, pp. 407–412.
– year: 2019
  ident: b3
  article-title: Efficient meta reinforcement learning via meta goal generation
– volume: 42
  start-page: 603
  year: 2012
  end-page: 608
  ident: b2
  article-title: Elman fuzzy adaptive control for obstacle avoidance of mobile robots using hybrid force/position incorporation
  publication-title: IEEE Trans. Syst. Man Cybern. C
– year: 2021
  ident: b11
  article-title: Probability dueling DQN active visual SLAM for autonomous navigation in indoor environment
  publication-title: Ind. Robot
– year: 2017
  ident: b40
  article-title: Proximal policy optimization algorithms
– volume: 17
  start-page: 1334
  year: 2016
  end-page: 1373
  ident: b21
  article-title: End-to-end training of deep visuomotor policies
  publication-title: J. Mach. Learn. Res.
– year: 2018
  ident: b16
  article-title: On first-order meta-learning algorithms
– reference: A. Santoro, S. Bartunov, M. Botvinick, D. Wierstra, T. Lillicrap, Meta-learning with memory-augmented neural networks, in: International Conference on Machine Learning, ICML, New York City, NY, USA, 2016, pp. 1842–1850.
– volume: 172
  start-page: 90
  year: 2020
  end-page: 99
  ident: b39
  article-title: Six degree-of-freedom body-fixed hovering over unmapped asteroids via LIDAR altimetry and reinforcement meta-learning
  publication-title: Acta Astronaut.
– start-page: 293
  year: 1998
  end-page: 309
  ident: b25
  article-title: Learning to learn
  publication-title: Ch. Reinforcement Learning with Self-Modifying Policies
– start-page: 113
  year: 2020
  end-page: 118
  ident: b4
  article-title: Multi-path planning for autonomous navigation of multiple robots in a shared workspace with humans
  publication-title: 2020 6th International Conference on Control, Automation and Robotics
– volume: 2
  start-page: 656
  year: 2017
  end-page: 663
  ident: b8
  article-title: Deep-learned collision avoidance policy for distributed multiagent navigation
  publication-title: Robot. Autom. Lett.
– reference: S. Gupta, J. Davidson, S. Levine, R. Sukthankar, J. Malik, Cognitive mapping and planning for visual navigation: Supplementary material, in: IEEE Conference on Computer Vision & Pattern Recognition, CVPR, 2017, pp. 2616–2625.
SSID ssj0016928
SourceID crossref
elsevier
SourceType Enrichment Source
Index Database
Publisher
StartPage 107605
SubjectTerms Deep reinforcement learning
Meta learning
Multi-robot system
Path planning
Transfer learning
Title A multi-robot path-planning algorithm for autonomous navigation using meta-reinforcement learning based on transfer learning
URI https://dx.doi.org/10.1016/j.asoc.2021.107605
Volume 110