Distributed and Distribution-Robust Meta Reinforcement Learning (D$^{2}$-RMRL) for Data Pre-Storage and Routing in Cube Satellite Networks

Bibliographic Details
Published in	IEEE journal of selected topics in signal processing Vol. 17; no. 1; pp. 128 - 141
Main Authors	Hu, Ye, Wang, Xiaodong, Saad, Walid
Format	Journal Article
Language	English
Published	New York IEEE 01.01.2023 The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Actor-critic; Algorithms; cube satellite network; data pre-storage; Decomposition; Heuristic algorithms; Logic gates; Machine learning; Markov processes; meta learning; multi-agent reinforcement learning; Optimization; Robustness; Routing; Satellite communication; Satellite networks; Satellites; Task analysis; Training; value decomposition
Online Access	Get full text
ISSN	1932-4553 1941-0484
DOI	10.1109/JSTSP.2022.3232944

Abstract	In this paper, the problem of data pre-storage and routing in dynamic, resource-constrained cube satellite networks is studied. In such a network, each cube satellite delivers requested data to the user clusters under its coverage. A group of ground gateways routes and pre-stores certain data to the satellites, so that ground users can be served directly with the pre-stored data. This pre-storage and routing design problem is formulated as a decentralized Markov decision process (Dec-MDP) in which we seek the optimal strategy that maximizes the pre-store hit rate, i.e., the fraction of users served directly with the pre-stored data. To obtain the optimal strategy, a distributed distribution-robust meta reinforcement learning (D$^{2}$-RMRL) algorithm is proposed that consists of three key ingredients: value decomposition, to achieve the global optimum in a distributed setting with minimum communication overhead; meta learning, to obtain an optimal initialization that reduces the training time under dynamic conditions; and pre-training, to further speed up the meta training procedure. Simulation results show that, using the proposed value decomposition and meta training techniques, the satellite networks can achieve a 31.8% improvement in pre-store hits and a 40.7% improvement in convergence speed, compared to a baseline reinforcement learning algorithm. Moreover, the proposed pre-training mechanism helps to shorten the meta-learning procedure by up to 43.7%.
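The two quantities the abstract leans on, the pre-store hit rate and value decomposition, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy request data, and the purely additive decomposition (joint value = sum of per-agent values) are assumptions made here for illustration. The sketch only shows why an additive decomposition lets each agent choose its action from local values alone, with no exchange of action values between agents.

```python
from itertools import product


def pre_store_hit_rate(requests, stored):
    """Fraction of user requests served directly from pre-stored data."""
    if not requests:
        return 0.0
    hits = sum(1 for item in requests if item in stored)
    return hits / len(requests)


def joint_value(local_q, joint_action):
    """Assumed additive decomposition: joint value is the sum of local values."""
    return sum(q[a] for q, a in zip(local_q, joint_action))


def greedy_decentralized(local_q):
    """Each agent picks its own best action using only its local values."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in local_q)


def greedy_centralized(local_q):
    """Exhaustive search over all joint actions, for comparison only."""
    spaces = [range(len(q)) for q in local_q]
    return max(product(*spaces), key=lambda ja: joint_value(local_q, ja))


# Hypothetical toy data: four user requests, two items pre-stored on a satellite.
requests = ["a", "b", "c", "a"]
stored = {"a", "c"}
hit_rate = pre_store_hit_rate(requests, stored)

# Hypothetical local action values for three agents.
local_q = [[0.2, 0.9, 0.1], [0.5, 0.3], [0.0, 0.4, 0.7]]
decentralized = greedy_decentralized(local_q)
centralized = greedy_centralized(local_q)
```

Under the additive assumption, the per-agent greedy choice coincides with the joint action found by exhaustive search, which is the property that keeps communication overhead low in value-decomposition methods.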
Author	Wang, Xiaodong; Saad, Walid; Hu, Ye
Author_xml	– sequence: 1 givenname: Ye orcidid: 0000-0001-9872-5461 surname: Hu fullname: Hu, Ye email: yh3453@columbia.edu organization: Department of Electrical Engineering, Columbia University, New York, NY, USA – sequence: 2 givenname: Xiaodong orcidid: 0000-0002-2945-9240 surname: Wang fullname: Wang, Xiaodong email: xw2008@columbia.edu organization: Department of Electrical Engineering, Columbia University, New York, NY, USA – sequence: 3 givenname: Walid orcidid: 0000-0003-2247-2458 surname: Saad fullname: Saad, Walid email: walids@vt.edu organization: Wireless@VT, Bradley Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg, VA, USA
CODEN	IJSTGY
CitedBy_id	crossref_primary_10_1109_JSAC_2024_3460086 crossref_primary_10_1109_TAES_2024_3438681
Cites_doi	10.1109/ICC.2012.6363993 10.1109/TWC.2020.3024629 10.1109/JSAC.2021.3118346 10.1109/JIOT.2021.3065664 10.1109/TCOMM.2017.2685383 10.1109/MWC.001.1900178 10.1109/JSAC.2017.2680898 10.1109/TGCN.2019.2954166 10.1109/MCOM.2019.1800796 10.1609/aaai.v32i1.11794 10.1007/BF00992698 10.1017/CBO9780511807213 10.1155/2018/3026405 10.1109/tnn.1998.712192 10.1016/j.comnet.2020.107213 10.1109/JSAC.2021.3088689 10.1002/9781119673811 10.1002/sat.1374 10.1002/ett.3861 10.1109/JSAC.2018.2832798 10.21236/ADA280862 10.1109/MWC.2017.1600173 10.1109/JSAC.2003.819970
ContentType	Journal Article
Copyright	Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023
Copyright_xml	– notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023
DOI	10.1109/JSTSP.2022.3232944
DatabaseName	IEEE All-Society Periodicals Package (ASPP) 2005–Present IEEE All-Society Periodicals Package (ASPP) 1998–Present IEEE Electronic Library (IEL) CrossRef Electronics & Communications Abstracts Technology Research Database Aerospace Database Advanced Technologies Database with Aerospace
DatabaseTitle	CrossRef Aerospace Database Technology Research Database Advanced Technologies Database with Aerospace Electronics & Communications Abstracts
DatabaseTitleList	Aerospace Database
Database_xml	– sequence: 1 dbid: RIE name: IEEE Xplore Digital Library url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher
DeliveryMethod	fulltext_linktorsrc
Discipline	Engineering
EISSN	1941-0484
EndPage	141
ExternalDocumentID	10_1109_JSTSP_2022_3232944 10004943
Genre	orig-research
GrantInformation_xml	– fundername: U.S. National Science Foundation grantid: CNS-1909372
IEDL.DBID	RIE
ISSN	1932-4553
IngestDate	Mon Jun 30 10:18:31 EDT 2025 Wed Oct 01 03:34:40 EDT 2025 Thu Apr 24 23:03:25 EDT 2025 Wed Aug 27 02:54:06 EDT 2025
IsPeerReviewed	true
IsScholarly	true
Issue	1
Language	English
License	https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037
LinkModel	DirectLink
Notes	ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14
ORCID	0000-0001-9872-5461 0000-0002-2945-9240 0000-0003-2247-2458
PQID	2778704133
PQPubID	75721
PageCount	14
ParticipantIDs	crossref_citationtrail_10_1109_JSTSP_2022_3232944 crossref_primary_10_1109_JSTSP_2022_3232944 ieee_primary_10004943 proquest_journals_2778704133
ProviderPackageCode	CITATION AAYXX
PublicationCentury	2000
PublicationDate	2023-Jan. 2023-1-00 20230101
PublicationDateYYYYMMDD	2023-01-01
PublicationDate_xml	– month: 01 year: 2023 text: 2023-Jan.
PublicationDecade	2020
PublicationPlace	New York
PublicationPlace_xml	– name: New York
PublicationTitle	IEEE journal of selected topics in signal processing
PublicationTitleAbbrev	JSTSP
PublicationYear	2023
Publisher	IEEE The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml	– name: IEEE – name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
SSID	ssj0057614
Score	2.409784
SourceID	proquest crossref ieee
SourceType	Aggregation Database Enrichment Source Index Database Publisher
StartPage	128
Title	Distributed and Distribution-Robust Meta Reinforcement Learning (D$^{2}$-RMRL) for Data Pre-Storage and Routing in Cube Satellite Networks
URI	https://ieeexplore.ieee.org/document/10004943 https://www.proquest.com/docview/2778704133
Volume	17
hasFullText	1
inHoldings	1
isFullTextHit
isPrint
journalDatabaseRights	– providerCode: PRVIEE databaseName: IEEE Xplore Digital Library customDbUrl: eissn: 1941-0484 dateEnd: 99991231 omitProxy: false ssIdentifier: ssj0057614 issn: 1932-4553 databaseCode: RIE dateStart: 20070101 isFulltext: true titleUrlDefault: https://ieeexplore.ieee.org/ providerName: IEEE
linkProvider	IEEE