A multi-robot path-planning algorithm for autonomous navigation using meta-reinforcement learning based on transfer learning

Bibliographic Details
Published in: Applied soft computing, Vol. 110, p. 107605
Main Authors: Wen, Shuhuan; Wen, Zeteng; Zhang, Di; Zhang, Hong; Wang, Tao
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.10.2021
ISSN: 1568-4946
EISSN: 1872-9681
DOI: 10.1016/j.asoc.2021.107605

Abstract The adaptability of multi-robot systems in complex environments is an active research topic. To handle static and dynamic obstacles in complex environments, this paper presents dynamic proximal meta policy optimization with covariance matrix adaptation evolutionary strategies (dynamic-PMPO-CMA) for obstacle avoidance and autonomous navigation. First, we propose dynamic proximal policy optimization with covariance matrix adaptation evolutionary strategies (dynamic-PPO-CMA), an extension of the original proximal policy optimization (PPO), to obtain a valid obstacle-avoidance policy. Simulation results show that the proposed dynamic-PPO-CMA avoids obstacles and reaches the designated target position successfully. Second, to improve the adaptability of multi-robot systems across different environments, we integrate meta-learning with dynamic-PPO-CMA to form the dynamic-PMPO-CMA algorithm. During training, dynamic-PMPO-CMA is used to train the robots to learn a multi-task policy. Finally, during testing, transfer learning is introduced: the trained meta-policy parameters are transferred to new environments and used as the initial parameters. Simulation results show that the proposed algorithm converges faster and reaches the destination more quickly than PPO, PMPO, and dynamic-PPO-CMA.
•Propose dynamic proximal policy optimization with covariance matrix adaptation evolutionary strategies (dynamic-PPO-CMA) to extend the original proximal policy optimization (PPO) algorithm.
•Propose a novel meta reinforcement learning framework for multi-robot path planning that improves adaptation to new, unknown environments.
•Apply transfer learning to the framework to reduce the on-board computation required to train a deep neural network.
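The pipeline described in the abstract (meta-train a policy across a family of tasks, then transfer the meta-parameters as the initialization in a new environment) can be sketched in a toy, first-order form. Everything below is an illustrative assumption, not the authors' dynamic-PMPO-CMA implementation: a quadratic surrogate stands in for each environment's policy objective, plain gradient descent stands in for dynamic-PPO-CMA updates, and the meta-update follows the Reptile-style scheme of Nichol et al. (reference b16).

```python
import numpy as np

rng = np.random.default_rng(0)

def task_grad(theta, target):
    # Gradient of a toy quadratic surrogate loss 0.5*||theta - target||^2,
    # standing in for one environment's policy objective.
    return theta - target

def inner_adapt(theta, target, lr=0.1, steps=5):
    # Per-task adaptation (the role played by dynamic-PPO-CMA updates in
    # the paper; plain gradient descent here).
    for _ in range(steps):
        theta = theta - lr * task_grad(theta, target)
    return theta

def meta_train(tasks, theta, meta_lr=0.5, iters=200):
    # First-order (Reptile-style) meta-update: nudge the meta-parameters
    # toward each sampled task's adapted parameters.
    for _ in range(iters):
        target = tasks[rng.integers(len(tasks))]
        theta = theta + meta_lr * (inner_adapt(theta, target) - theta)
    return theta

# Meta-train across a family of tasks (goal positions at the unit-square
# corners), then transfer the meta-parameters as the initialization for
# an unseen task -- the paper's testing phase.
tasks = [np.array([0.0, 0.0]), np.array([1.0, 0.0]),
         np.array([0.0, 1.0]), np.array([1.0, 1.0])]
theta_meta = meta_train(tasks, theta=np.zeros(2))

new_task = np.array([0.5, 0.5])
adapted_scratch = inner_adapt(np.zeros(2), new_task, steps=2)
adapted_meta = inner_adapt(theta_meta, new_task, steps=2)
# With the same small adaptation budget, the transferred initialization
# ends up closer to the new task's optimum than training from scratch.
```

This captures why the transferred initialization gives the faster convergence reported in the abstract: the meta-parameters already sit near the family of task optima, so far fewer adaptation steps are needed in a new environment.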
ArticleNumber 107605
Author Zhang, Hong
Wang, Tao
Wen, Shuhuan
Wen, Zeteng
Zhang, Di
Author_xml – sequence: 1
  givenname: Shuhuan
  surname: Wen
  fullname: Wen, Shuhuan
  email: swen@ysu.edu.cn
  organization: Engineering Research Center of the Ministry of Education for Intelligent Control System and Intelligent Equipment, Yanshan University, Qinhuangdao, 066004, China
– sequence: 2
  givenname: Zeteng
  surname: Wen
  fullname: Wen, Zeteng
  email: 473900582@qq.com
  organization: Engineering Research Center of the Ministry of Education for Intelligent Control System and Intelligent Equipment, Yanshan University, Qinhuangdao, 066004, China
– sequence: 3
  givenname: Di
  surname: Zhang
  fullname: Zhang, Di
  email: 1120067126@qq.com
  organization: Engineering Research Center of the Ministry of Education for Intelligent Control System and Intelligent Equipment, Yanshan University, Qinhuangdao, 066004, China
– sequence: 4
  givenname: Hong
  surname: Zhang
  fullname: Zhang, Hong
  email: hzhang@sustech.edu.cn
  organization: Department of Electronic and Electrical Engineering, Southern University of Science and Technology, Shenzhen, 518000, China
– sequence: 5
  givenname: Tao
  surname: Wang
  fullname: Wang, Tao
  email: 82368157@qq.com
  organization: Engineering Research Center of the Ministry of Education for Intelligent Control System and Intelligent Equipment, Yanshan University, Qinhuangdao, 066004, China
CitedBy_id crossref_primary_10_1017_S026357472400170X
crossref_primary_10_1109_TCDS_2023_3246107
crossref_primary_10_1016_j_eswa_2024_125238
crossref_primary_10_1109_JAS_2023_123087
crossref_primary_10_3390_s23073625
crossref_primary_10_1109_JSEN_2023_3310519
crossref_primary_10_1016_j_mechatronics_2024_103248
crossref_primary_10_1007_s10489_023_04754_7
crossref_primary_10_26599_AIR_2023_9150013
crossref_primary_10_1016_j_asoc_2022_108588
crossref_primary_10_1016_j_knosys_2023_110782
crossref_primary_10_1007_s40998_024_00722_0
crossref_primary_10_1016_j_jocs_2022_101938
crossref_primary_10_1016_j_ast_2024_109089
crossref_primary_10_3390_electronics12234759
crossref_primary_10_1007_s10845_024_02412_4
crossref_primary_10_1186_s13677_023_00440_8
crossref_primary_10_17979_ja_cea_2024_45_10898
crossref_primary_10_3390_machines10090773
crossref_primary_10_3390_electronics13152927
crossref_primary_10_1016_j_ast_2024_109606
crossref_primary_10_1007_s10462_023_10670_6
crossref_primary_10_3390_app13148174
crossref_primary_10_1016_j_compeleceng_2024_109425
crossref_primary_10_1109_TTE_2022_3142150
crossref_primary_10_1109_JIOT_2024_3379361
crossref_primary_10_1142_S0219843623500147
crossref_primary_10_1109_TITS_2023_3285624
crossref_primary_10_1007_s11082_023_06153_1
crossref_primary_10_1016_j_asoc_2022_109001
Cites_doi 10.1109/LRA.2017.2651371
10.1108/IR-08-2020-0160
10.1109/TSMCC.2011.2157682
10.1016/j.robot.2015.04.003
10.1038/nature14236
10.1007/978-3-030-01270-0_3
10.1109/ICNN.1993.298591
10.15607/RSS.2018.XIV.002
10.1109/CVPR.2019.00691
10.1609/aaai.v32i1.11596
10.1109/LRA.2020.2974685
10.1109/CVPR.2017.769
10.1109/ACCESS.2014.2302442
10.1007/s11370-019-00310-w
10.1109/CVPR.2019.00679
10.3390/app9153057
10.1016/j.actaastro.2020.03.026
10.1109/CVPR.2018.00131
10.1109/ICRA.2017.7989381
ContentType Journal Article
Copyright 2021 Elsevier B.V.
Copyright_xml – notice: 2021 Elsevier B.V.
DBID AAYXX
CITATION
DOI 10.1016/j.asoc.2021.107605
DatabaseName CrossRef
DatabaseTitle CrossRef
DatabaseTitleList
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISSN 1872-9681
ExternalDocumentID 10_1016_j_asoc_2021_107605
S1568494621005263
ISSN 1568-4946
IsPeerReviewed true
IsScholarly true
Keywords Deep reinforcement learning
Multi-robot system
Path planning
Transfer learning
Meta learning
Language English
LinkModel DirectLink
ParticipantIDs crossref_citationtrail_10_1016_j_asoc_2021_107605
crossref_primary_10_1016_j_asoc_2021_107605
elsevier_sciencedirect_doi_10_1016_j_asoc_2021_107605
PublicationCentury 2000
PublicationDate October 2021
2021-10-00
PublicationDateYYYYMMDD 2021-10-01
PublicationDate_xml – month: 10
  year: 2021
  text: October 2021
PublicationDecade 2020
PublicationTitle Applied soft computing
PublicationYear 2021
Publisher Elsevier B.V
Publisher_xml – name: Elsevier B.V
References J.Y. Bilan, M.Z. Michelle, Robot navigation in crowds via meta-learning, CS234 final project report
S. Gupta, J. Davidson, S. Levine, R. Sukthankar, J. Malik, Cognitive mapping and planning for visual navigation: Supplementary material, in: IEEE Conference on Computer Vision & Pattern Recognition, CVPR, 2017, pp. 2616–2625.
Wen, Lv, Lam (b11) 2021
Li, Wang, Tang, Shi, Wu, Zhuang (b36) 2019
X. Wang, W. Xiong, H. Wang, W.Y. Wang, Look before you leap: Bridging model-free and model-based reinforcement learning for planned-ahead vision-and-language navigation, in: European Conference on Computer Vision, ECCV, 2018, pp. 37–53.
T. Xu, Q. Liu, L. Zhao, J. Peng, Learning to explore with meta-policy gradient, in: International Conference on Machine Learning, ICML, 2018.
J. Schmidhuber, A neural network that embeds its own meta-levels, in: IEEE International Conference on Neural Networks, Vol. 1, March 1993, pp. 407–412.
M. Andrychowicz, M. Denil, S. Gomez, M.W. Hoffman, D. Pfau, T. Schaul, Learning to learn by gradient descent by gradient descent, in: Neural Information Processing Systems, NIPS, 2016.
Arndt, Hazara, Ghadirzadeh, Kyrki (b37) 2019
D. Li, Y. Yang, Y.Z. Song, T.M. Hospedales, Learning to generalize: meta-learning for domain generalization, in: Association for the Advance of Artificial Intelligence, AAAI, 2018.
Nichol, Achiam, Schulman (b16) 2018
Chelsea, Pieter, Sergey (b41) 2017
M. Wortsman, K. Ehsani, M. Rastegari, A. Farhadi, R. Mottaghi, Learning to learn how to learn: self-adaptive visual navigation using meta-learning, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR, 2019.
Gupta, Mendonca, Liu, Abbeel, Levine (b30) 2018
J. Rothfuss, D. Lee, C. Ignasi, J. Lehtinen, Promp: proximal meta-policy search, in: International Conference on Learning Representations, ICLR, 2019.
Gaudet, Linares, Furfaro (b39) 2020; 172
F. Sung, Y. Yang, L. Zhang, T. Xiang, P.H. Torr, T.M. Hospedales, Learning to compare: Relation network for few-shot learning, in: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, Salt Lake City, Utah, USA, 2018, pp. 1199–1208.
Bae, Kim, Kim, Qian, Lee (b15) 2019; 9
Fu, Tang, Hao (b3) 2019
X. Wang, Q. Huang, A. Celikyilmaz, J. Gao, D. Shen, Y.F. Wang, Reinforced cross-modal matching and self-supervised imitation learning for vision-language navigation, in: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, 2019, pp. 6629–6638.
Gupta, Egorov, Kochenderfer (b7) 2017
Sepp, Steven, Peter (b22) 2001
Wen, Zhao, Yuan, Wang, Manfredi (b14) 2020; 13
Jabri, Hsu, Eysenbach, Gupta, Levine, Finn (b35) 2019
Levine, Finn, Darrell, Abbeel (b21) 2016; 17
Trinh, Ekström, Cürüklü (b4) 2020
Y. Zhu, R. Mottaghi, E. Kolve, J.J. Lim, A. Gupta, F.F. Li, A. Farhadi, Target-driven visual navigation in indoor scenes using deep reinforcement learning, in: IEEE International Conference on Robotics and Automation, ICRA, 2017, pp. 3357–3364.
Hamalainen, Babadi, Ma, Lehtinen (b31) 2018
Schmidhuber (b18) 1987
A. Santoro, S. Bartunov, M. Botvinick, D. Wierstra, T. Lillicrap, Meta-learning with memory-augmented neural networks, in: International Conference on Machine Learning, ICML, New York City, NY, USA, 2016, pp. 1842–1850.
Wen, Chen, Ma, Lam, Hua (b5) 2015; 10
Mnih, Kavukcuoglu, Sliver, Rusu, Veness, Bellemare, Graves, Riedmiller, Fidjeland, Ostrovski, Petersen, Beattie, Sadik, Antonoglou, King, Kumaran, Wierstra, Legg, Hassabi (b6) 2015; 518
Schulman, Wolski, Dhariwal, Radford, Klimov (b40) 2017
Finn, Abbeel, Levine (b17) 2017
Elbanhawi, Simic (b1) 2014; 2
T. Yu, C. Finn, A. Xie, S. Dasari, T. Zhang, P. Abbeel, S. Levine, One-shot imitation from observing humans via domain-adaptive meta-learning, in: Royal Statistical Society, RSS, 2018.
Schmidhuber, Zhao, Schraudolph (b25) 1998
N. Mishra, M. Rohaninejad, X. Chen, P. Abbeel, A simple neural attentive meta-learner, in: International Conference on Learning Representations, ICLR, 2018.
Wen, Zheng, Zhu, Li, Chen (b2) 2012; 42
Long, Liu, Pan (b8) 2017; 2
Yu, Tan, Bai, Coumans, Ha (b38) 2020; 5
References_xml – reference: T. Xu, Q. Liu, L. Zhao, J. Peng, Learning to explore with meta-policy gradient, in: International Conference on Machine Learning, ICML, 2018.
– reference: J. Rothfuss, D. Lee, C. Ignasi, J. Lehtinen, Promp: proximal meta-policy search, in: International Conference on Learning Representations, ICLR, 2019.
– reference: X. Wang, W. Xiong, H. Wang, W.Y. Wang, Look before you leap: Bridging model-free and model-based reinforcement learning for planned-ahead vision-and-language navigation, in: European Conference on Computer Vision, ECCV, 2018, pp. 37–53.
– volume: 5
  start-page: 2950
  year: 2020
  end-page: 2957
  ident: b38
  article-title: Learning fast adaptation with meta strategy optimization
  publication-title: IEEE Robot. Autom. Lett.
– reference: N. Mishra, M. Rohaninejad, X. Chen, P. Abbeel, A simple neural attentive meta-learner, in: International Conference on Learning Representations, ICLR, 2018.
– volume: 13
  start-page: 263
  year: 2020
  end-page: 272
  ident: b14
  article-title: Path planning for active SLAM based on deep reinforcement learning under unknown environments
  publication-title: Intell. Serv. Robot.
– reference: T. Yu, C. Finn, A. Xie, S. Dasari, T. Zhang, P. Abbeel, S. Levine, One-shot imitation from observing humans via domain-adaptive meta-learning, in: Royal Statistical Society, RSS, 2018.
– year: 2018
  ident: b31
  article-title: PPO-CMA: Proximal policy optimization with covariance matrix adaptation
– reference: J.Y. Bilan, M.Z. Michelle, Robot navigation in crowds via meta-learning, CS234 final project report.
– reference: D. Li, Y. Yang, Y.Z. Song, T.M. Hospedales, Learning to generalize: meta-learning for domain generalization, in: Association for the Advance of Artificial Intelligence, AAAI, 2018.
– volume: 10
  start-page: 29
  year: 2015
  end-page: 36
  ident: b5
  article-title: The Q-learning obstacle avoidance algorithm based on EKF-SLAM for NAO autonomous walking under unknown environments
  publication-title: Robot. Auton. Syst.
– reference: F. Sung, Y. Yang, L. Zhang, T. Xiang, P.H. Torr, T.M. Hospedales, Learning to compare: Relation network for few-shot learning, in: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, Salt Lake City, Utah, USA, 2018, pp. 1199–1208.
– start-page: 66
  year: 2017
  end-page: 83
  ident: b7
  article-title: Cooperative multiagent control using deep reinforcement learning
  publication-title: International Conference on Autonomous Agents and Multiagent Systems
– year: 2018
  ident: b30
  article-title: Meta-reinforcement learning of structured exploration strategies
– year: 2019
  ident: b36
  article-title: Unsupervised reinforcement learning of transferable meta-skills for embodied navigation
– start-page: 1
  year: 2017
  end-page: 12
  ident: b41
  article-title: Model-agnostic meta-learning for fast adaptation of deep networks
– reference: X. Wang, Q. Huang, A. Celikyilmaz, J. Gao, D. Shen, Y.F. Wang, Reinforced cross-modal matching and self-supervised imitation learning for vision-language navigation, in: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, 2019, pp. 6629–6638.
– reference: M. Wortsman, K. Ehsani, M. Rastegari, A. Farhadi, R. Mottaghi, Learning to learn how to learn: self-adaptive visual navigation using meta-learning, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR, 2019.
– year: 2019
  ident: b35
  article-title: Unsupervised curricula for visual meta-reinforcement learning
– volume: 2
  start-page: 56
  year: 2014
  end-page: 77
  ident: b1
  article-title: Sampling-based robot motion planning: A review
  publication-title: IEEE Access
– start-page: 1126
  year: 2017
  end-page: 1135
  ident: b17
  article-title: Model-agnostic meta-learning for fast adaptation of deep networks
  publication-title: International Conference on Machine Learning, Vol. 70
– year: 1987
  ident: b18
  article-title: Evolutionary Principles in Self-Referential Learning. On Learning now to Learn: The Meta-Meta-Meta...-Hook
– reference: M. Andrychowicz, M. Denil, S. Gomez, M.W. Hoffman, D. Pfau, T. Schaul, Learning to learn by gradient descent by gradient descent, in: Neural Information Processing Systems, NIPS, 2016.
– year: 2019
  ident: b37
  article-title: Meta reinforcement learning for sim-to-real domain adaptation
– reference: Y. Zhu, R. Mottaghi, E. Kolve, J.J. Lim, A. Gupta, F.F. Li, A. Farhadi, Target-driven visual navigation in indoor scenes using deep reinforcement learning, in: IEEE International Conference on Robotics and Automation, ICRA, 2017, pp. 3357–3364.
– volume: 9
  start-page: 3057
  year: 2019
  ident: b15
  article-title: Multi-robot path planning method using reinforcement learning
  publication-title: Appl. Sci.
– volume: 518
  start-page: 529
  year: 2015
  end-page: 533
  ident: b6
  article-title: Human-level control through deep reinforcement learning
  publication-title: Nature
– start-page: 87
  year: 2001
  end-page: 94
  ident: b22
  article-title: Learning to learn using gradient descent
  publication-title: International Conference on Artificial Neural Networks
– reference: J. Schmidhuber, A neural network that embeds its own meta-levels, in: IEEE International Conference on Neural Networks, Vol. 1, March 1993, pp. 407–412.
– year: 2019
  ident: b3
  article-title: Efficient meta reinforcement learning via meta goal generation
– volume: 42
  start-page: 603
  year: 2012
  end-page: 608
  ident: b2
  article-title: Elman fuzzy adaptive control for obstacle avoidance of mobile robots using hybrid force/position incorporation
  publication-title: IEEE Trans. Syst. Man Cybern. C
– year: 2021
  ident: b11
  article-title: Probability dueling DQN active visual SLAM for autonomous navigation in indoor environment
  publication-title: Ind. Robot
– year: 2017
  ident: b40
  article-title: Proximal policy optimization algorithms
– volume: 17
  start-page: 1334
  year: 2016
  end-page: 1373
  ident: b21
  article-title: End-to-end training of deep visuomotor policies
  publication-title: J. Mach. Learn. Res.
– year: 2018
  ident: b16
  article-title: On first-order meta-learning algorithms
– reference: A. Santoro, S. Bartunov, M. Botvinick, D. Wierstra, T. Lillicrap, Meta-learning with memory-augmented neural networks, in: International Conference on Machine Learning, ICML, New York City, NY, USA, 2016, pp. 1842–1850.
– volume: 172
  start-page: 90
  year: 2020
  end-page: 99
  ident: b39
  article-title: Six degree-of-freedom body-fixed hovering over unmapped asteroids via LIDAR altimetry and reinforcement meta-learning
  publication-title: Acta Astronaut.
– start-page: 293
  year: 1998
  end-page: 309
  ident: b25
  article-title: Learning to learn
  publication-title: Ch. Reinforcement Learning with Self-Modifying Policies
– start-page: 113
  year: 2020
  end-page: 118
  ident: b4
  article-title: Multi-path planning for autonomous navigation of multiple robots in a shared workspace with humans
  publication-title: 2020 6th International Conference on Control, Automation and Robotics
– volume: 2
  start-page: 656
  year: 2017
  end-page: 663
  ident: b8
  article-title: Deep-learned collision avoidance policy for distributed multiagent navigation
  publication-title: Robot. Autom. Lett.
– reference: S. Gupta, J. Davidson, S. Levine, R. Sukthankar, J. Malik, Cognitive mapping and planning for visual navigation: Supplementary material, in: IEEE Conference on Computer Vision & Pattern Recognition, CVPR, 2017, pp. 2616–2625.
SSID ssj0016928
SourceID crossref
elsevier
SourceType Enrichment Source
Index Database
Publisher
StartPage 107605
SubjectTerms Deep reinforcement learning
Meta learning
Multi-robot system
Path planning
Transfer learning
Title A multi-robot path-planning algorithm for autonomous navigation using meta-reinforcement learning based on transfer learning
URI https://dx.doi.org/10.1016/j.asoc.2021.107605
Volume 110