Robustness and Sample Complexity of Model-Based MARL for General-Sum Markov Games
| Published in | Dynamic Games and Applications, Vol. 13, No. 1, pp. 56–88 |
|---|---|
| Main Authors | Jayakumar Subramanian, Amit Sinha, Aditya Mahajan |
| Format | Journal Article |
| Language | English |
| Published | New York: Springer US, 01.03.2023 (Springer Nature B.V.) |
| ISSN | 2153-0785 (print); 2153-0793 (online) |
| DOI | 10.1007/s13235-023-00490-2 |
| Abstract | Multi-agent reinforcement learning (MARL) is often modeled using the framework of Markov games (also called stochastic games or dynamic games). Most of the existing literature on MARL concentrates on zero-sum Markov games and is not applicable to general-sum Markov games. It is known that the best-response dynamics in general-sum Markov games are not a contraction. Therefore, different equilibria in general-sum Markov games can have different values. Moreover, the Q-function is not sufficient to completely characterize the equilibrium. Given these challenges, model-based learning is an attractive approach for MARL in general-sum Markov games. In this paper, we investigate the fundamental question of sample complexity for model-based MARL algorithms in general-sum Markov games. We show two results. We first use Hoeffding inequality-based bounds to show that $\tilde{\mathcal{O}}\big((1-\gamma)^{-4}\alpha^{-2}\big)$ samples per state–action pair are sufficient to obtain an $\alpha$-approximate Markov perfect equilibrium with high probability, where $\gamma$ is the discount factor and the $\tilde{\mathcal{O}}(\cdot)$ notation hides logarithmic terms. We then use Bernstein inequality-based bounds to show that $\tilde{\mathcal{O}}\big((1-\gamma)^{-1}\alpha^{-2}\big)$ samples are sufficient. To obtain these results, we study the robustness of Markov perfect equilibrium to model approximations. We show that the Markov perfect equilibrium of an approximate (or perturbed) game is always an approximate Markov perfect equilibrium of the original game, and we provide explicit bounds on the approximation error. We illustrate the results via a numerical example. |
|---|---|
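To give a feel for the two bounds in the abstract, the sketch below compares their scaling for concrete values of the discount factor $\gamma$ and accuracy $\alpha$. The function names and the unit leading constant are illustrative assumptions; the paper's actual bounds carry constants and logarithmic factors that are omitted here.

```python
def hoeffding_samples(gamma, alpha, c=1.0):
    """Hoeffding-based bound from the abstract: scales as
    (1 - gamma)^-4 * alpha^-2, ignoring constants and log factors."""
    return c * (1 - gamma) ** -4 * alpha ** -2

def bernstein_samples(gamma, alpha, c=1.0):
    """Bernstein-based bound from the abstract: scales as
    (1 - gamma)^-1 * alpha^-2, ignoring constants and log factors."""
    return c * (1 - gamma) ** -1 * alpha ** -2

# With gamma = 0.9 (effective horizon 1/(1-gamma) = 10) and alpha = 0.1,
# the Hoeffding bound scales as 10^4 * 10^2 ~ 1e6 samples per
# state-action pair, while the Bernstein bound scales as 10 * 10^2 ~ 1e3:
# a factor (1 - gamma)^-3 improvement.
print(f"Hoeffding: {hoeffding_samples(0.9, 0.1):.0f}")
print(f"Bernstein: {bernstein_samples(0.9, 0.1):.0f}")
```

The gap between the two bounds widens rapidly as $\gamma \to 1$, which is why the Bernstein-based analysis is the paper's sharper result.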
| Author | Jayakumar Subramanian (ORCID 0000-0003-4621-2677; Media and Data Science Research Lab, Digital Experience Cloud, Adobe Inc.); Amit Sinha (Department of Electrical and Computer Engineering, McGill University); Aditya Mahajan (ORCID 0000-0001-8125-1191; Department of Electrical and Computer Engineering, McGill University) |
|---|---|
| Copyright | The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023. |
|---|---|
| Funding | Canadian Department of National Defence, grant CFPMN2-30 |
|---|---|
| GroupedDBID | -EM 0R~ 0VY 203 2VQ 30V 4.4 406 408 409 96X AACDK AAHNG AAIAL AAJBT AAJKR AANZL AARHV AARTL AASML AATNV AATVU AAUYE AAWCG AAYIU AAYQN AAYTO AAYZH AAZMS ABAKF ABBXA ABDZT ABECU ABFTD ABFTV ABJNI ABJOX ABKCH ABMQK ABQBU ABSXP ABTEG ABTHY ABTKH ABTMW ABULA ABXPI ACAOD ACDTI ACGFS ACHSB ACKNC ACMLO ACOKC ACPIV ACZOJ ADHHG ADHIR ADINQ ADKNI ADKPE ADRFC ADTPH ADURQ ADYFF ADZKW AEBTG AEFQL AEGNC AEJHL AEJRE AEMSY AENEX AEOHA AEPYU AESKC AETCA AEVLU AEXYK AFBBN AFLOW AFQWF AFWTZ AFZKB AGAYW AGDGC AGMZJ AGQEE AGQMX AGRTI AGWZB AGYKE AHAVH AHBYD AHKAY AHSBF AHYZX AIAKS AIGIU AIIXL AILAN AITGF AJBLW AJRNO AJZVZ AKLTO ALFXC ALMA_UNASSIGNED_HOLDINGS AMKLP AMXSW AMYLF AMYQR ANMIH AUKKA AXYYD AYJHY BAPOH BGNMA CSCUP DNIVK DPUIP EBLON EBS EIOEI EJD ESBYG FEDTE FERAY FIGPU FINBP FNLPD FRRFC FSGXE FYJPI GGCAI GGRSB GJIRD GQ6 GQ8 HMJXF HQYDN HRMNR HVGLF HZ~ I0C IKXTQ IWAJR IXD IZIGR J-C J9A JBSCW JCJTX JZLTJ KOV LLZTM M4Y NPVJJ NQJWS NU0 O9- O93 O9J PQQKQ PT4 RLLFE ROL RSV S1Z S27 SHX SISQX SJYHP SMT SNE SNPRN SNX SOHCF SOJ SPISZ SRMVM SSLCW STPWE T13 TSG U2A UG4 UOJIU UTJUX UZXMN VC2 VFIZW W48 WK8 Z7R Z81 ZMTXR ~A9 AAYXX ABBRH ABDBE ABFSG ABRTQ ACSTC AEZWR AFDZB AFHIU AFOHR AHPBZ AHWEU AIXLP ATHPR AYFIA CITATION 3V. 7WY 7XB 8FE 8FG 8FK 8FL ABUWG AFKRA ARAPS AZQEC BENPR BEZIV BGLVJ CCPQU DWQXO FRNLG GNUQQ HCIFZ JQ2 K60 K6~ K7- L.- M0C P62 PHGZM PHGZT PKEHL PQBIZ PQBZA PQEST PQGLB PQUKI PRINS Q9U  | 
    
| ID | FETCH-LOGICAL-c409t-607d571c7e7f8aff57fc81eb3c9381b15fc51c5a40091b5ecc7e018e2f25e8a33 | 
    
| IEDL.DBID | BENPR | 
    
| ISSN | 2153-0785 | 
    
| IngestDate | Tue Sep 30 03:21:51 EDT 2025 Wed Oct 01 04:15:31 EDT 2025 Thu Apr 24 23:07:03 EDT 2025 Fri Feb 21 02:45:59 EST 2025  | 
    
| IsPeerReviewed | true | 
    
| IsScholarly | true | 
    
| Issue | 1 | 
    
| Language | English | 
    
| LinkModel | DirectLink | 
    
| MergedId | FETCHMERGED-LOGICAL-c409t-607d571c7e7f8aff57fc81eb3c9381b15fc51c5a40091b5ecc7e018e2f25e8a33 | 
    
| Notes | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14  | 
    
| ORCID | 0000-0001-8125-1191 0000-0003-4621-2677  | 
    
| PQID | 3255511591 | 
    
| PQPubID | 2043993 | 
    
| PageCount | 33 | 
    
| ParticipantIDs | proquest_journals_3255511591 crossref_citationtrail_10_1007_s13235_023_00490_2 crossref_primary_10_1007_s13235_023_00490_2 springer_journals_10_1007_s13235_023_00490_2  | 
    
| ProviderPackageCode | CITATION AAYXX  | 
    
| PublicationCentury | 2000 | 
    
| PublicationDate | 2023-03-01 | 
    
| PublicationDateYYYYMMDD | 2023-03-01 | 
    
| PublicationDate_xml | – month: 03 year: 2023 text: 2023-03-01 day: 01  | 
    
| PublicationDecade | 2020 | 
    
| PublicationPlace | New York | 
    
| PublicationPlace_xml | – name: New York – name: Heidelberg  | 
    
| PublicationTitle | Dynamic games and applications | 
    
| PublicationTitleAbbrev | Dyn Games Appl | 
    
| PublicationYear | 2023 | 
    
| Publisher | Springer US Springer Nature B.V  | 
    
| Publisher_xml | – name: Springer US – name: Springer Nature B.V  | 
    
| References | Fink (CR21) 1964; 28 Ericson, Pakes (CR17) 1995; 62 Kearns, Singh (CR28) 1999; 871 Subramanian, Sinha, Seraj, Mahajan (CR53) 2022; 23 CR36 Takahashi (CR55) 1964; 28 CR35 CR32 CR30 Littman (CR33) 2001; 2 Müller (CR39) 1997; 29 Li, Wei, Chi, Gu, Chen (CR31) 2020; 33 Altman (CR6) 1999 Hoffman, Karp (CR25) 1966; 12 Müller (CR38) 1997; 22 Albright, Winston (CR5) 1979; 11 Fershtiman, Pakes (CR18) 2000; 31 Maskin, Tirole (CR37) 2001; 100 CR2 Whitt (CR60) 1980; 18 CR4 Busoniu, Babuska, De Schutter (CR13) 2008; 38 Jaśkiewicz, Nowak (CR26) 2014; 419 Başar, Bernhard (CR9) 2008 CR49 CR48 CR47 CR45 CR44 CR43 CR41 Pesendorfer, Schmidt-Dengler (CR42) 2008; 75 Shapley (CR46) 1953; 39 Filar, Schultz, Thuijsman, Vrieze (CR20) 1991; 50 Pakes, Ostrovsky, Berry (CR40) 2007; 38 CR19 Breton (CR12) 1991 CR15 CR59 Tidball, Altman (CR56) 1996; 34 CR58 Acemoglu, Robinson (CR1) 2001; 91 Bajari, Benkard, Levin (CR8) 2007; 75 Bertsekas (CR11) 2017 CR54 CR52 Aguirregabiria, Mira (CR3) 2007; 75 CR51 Doraszelski, Escobar (CR16) 2010; 5 Herings, Peeters (CR22) 2010; 42 Cesa-Bianch, Lugosi (CR14) 2006 CR29 CR27 Mailath, Samuelson (CR34) 2006 Solan (CR50) 2021 Başar, Zaccour (CR10) 2018 CR24 Tidball, Pourtallier, Altman (CR57) 1997; 35 Azar, Munos, Kappen (CR7) 2013; 91 CR65 CR64 Zhang, Kakade, Basar, Yang (CR61) 2020; 33 CR63 CR62 Herings, Peeters (CR23) 2004; 118 490_CR54 A Pakes (490_CR40) 2007; 38 490_CR52 490_CR15 AM Fink (490_CR21) 1964; 28 490_CR59 E Altman (490_CR6) 1999 490_CR58 490_CR19 A Jaśkiewicz (490_CR26) 2014; 419 G Li (490_CR31) 2020; 33 L Busoniu (490_CR13) 2008; 38 LS Shapley (490_CR46) 1953; 39 U Doraszelski (490_CR16) 2010; 5 M Breton (490_CR12) 1991 PJ-J Herings (490_CR23) 2004; 118 MG Azar (490_CR7) 2013; 91 SC Albright (490_CR5) 1979; 11 V Aguirregabiria (490_CR3) 2007; 75 R Ericson (490_CR17) 1995; 62 490_CR62 MM Tidball (490_CR57) 1997; 35 490_CR65 490_CR64 490_CR63 M Pesendorfer (490_CR42) 2008; 75 E Maskin (490_CR37) 2001; 100 A Müller (490_CR39) 1997; 29 J 
Subramanian (490_CR53) 2022; 23 490_CR24 K Zhang (490_CR61) 2020; 33 490_CR29 490_CR27 D Acemoglu (490_CR1) 2001; 91 ML Littman (490_CR33) 2001; 2 T Başar (490_CR9) 2008 GJ Mailath (490_CR34) 2006 PJ-J Herings (490_CR22) 2010; 42 490_CR32 490_CR30 490_CR36 490_CR35 A Müller (490_CR38) 1997; 22 M Takahashi (490_CR55) 1964; 28 C Fershtiman (490_CR18) 2000; 31 MM Tidball (490_CR56) 1996; 34 P Bajari (490_CR8) 2007; 75 DP Bertsekas (490_CR11) 2017 M Kearns (490_CR28) 1999; 871 T Başar (490_CR10) 2018 490_CR44 490_CR43 490_CR41 490_CR48 490_CR47 W Whitt (490_CR60) 1980; 18 490_CR45 490_CR4 AJ Hoffman (490_CR25) 1966; 12 490_CR49 N Cesa-Bianch (490_CR14) 2006 E Solan (490_CR50) 2021 490_CR2 JA Filar (490_CR20) 1991; 50 490_CR51  | 
    
| References_xml | – ident: CR45 – volume: 62 start-page: 53 issue: 1 year: 1995 end-page: 82 ident: CR17 article-title: Markov-perfect industry dynamics: a framework for empirical work publication-title: Rev Econ Stud doi: 10.2307/2297841 – volume: 29 start-page: 429 issue: 2 year: 1997 end-page: 443 ident: CR39 article-title: Integral probability metrics and their generating classes of functions publication-title: Adv Appl Probab doi: 10.2307/1428011 – volume: 75 start-page: 901 issue: 3 year: 2008 end-page: 928 ident: CR42 article-title: Asymptotic least squares estimators for dynamic games publication-title: Rev Econ Stud doi: 10.1111/j.1467-937X.2008.00496.x – ident: CR49 – volume: 28 start-page: 95 issue: 1 year: 1964 ident: CR55 article-title: Equilibrium points of stochastic non-cooperative -person games publication-title: Hiroshima Math J doi: 10.32917/hmj/1206139509 – ident: CR4 – volume: 22 start-page: 872 issue: 4 year: 1997 end-page: 885 ident: CR38 article-title: How does the value function of a Markov decision process depend on the transition probabilities? 
publication-title: Math Op Res doi: 10.1287/moor.22.4.872 – ident: CR51 – volume: 91 start-page: 938 issue: 4 year: 2001 end-page: 963 ident: CR1 article-title: A theory of political transitions publication-title: Am Econ Rev doi: 10.1257/aer.91.4.938 – volume: 50 start-page: 227 issue: 1 year: 1991 end-page: 237 ident: CR20 article-title: Nonlinear programming and stationary equilibria in stochastic games publication-title: Math Program doi: 10.1007/BF01594936 – volume: 12 start-page: 359 issue: 5 year: 1966 end-page: 370 ident: CR25 article-title: On nonterminating stochastic games publication-title: Manage Sci doi: 10.1287/mnsc.12.5.359 – volume: 5 start-page: 369 issue: 3 year: 2010 end-page: 402 ident: CR16 article-title: A theory of regular Markov perfect equilibria in dynamic stochastic games: Genericity, stability, and purification publication-title: Theor Econ doi: 10.3982/TE632 – volume: 2 start-page: 55 issue: 1 year: 2001 end-page: 66 ident: CR33 article-title: Value-function reinforcement learning in Markov games publication-title: Cognit Syst Res doi: 10.1016/S1389-0417(01)00015-8 – ident: CR35 – year: 2018 ident: CR10 publication-title: Handbook of dynamic game theory doi: 10.1007/978-3-319-44374-4 – ident: CR29 – ident: CR54 – ident: CR58 – volume: 35 start-page: 2101 issue: 6 year: 1997 end-page: 2117 ident: CR57 article-title: Approximations in dynamic zero-sum games II publication-title: SIAM J Control Optim doi: 10.1137/S0363012994272460 – volume: 38 start-page: 373 issue: 2 year: 2007 end-page: 399 ident: CR40 article-title: Simple estimators for the parameters of discrete dynamic games (with entry/exit examples) publication-title: RAND J Econ doi: 10.1111/j.1756-2171.2007.tb00073.x – volume: 18 start-page: 33 issue: 1 year: 1980 end-page: 48 ident: CR60 article-title: Representation and approximation of noncooperative sequential games publication-title: SIAM J Control Optim doi: 10.1137/0318003 – volume: 118 start-page: 32 issue: 1 year: 2004 
end-page: 60 ident: CR23 article-title: Stationary equilibria in stochastic games: structure, selection, and computation publication-title: J Econ Theory – volume: 31 start-page: 207 issue: 2 year: 2000 end-page: 236 ident: CR18 article-title: A dynamic oligopoly with collusion and price wars publication-title: RAND J Econ doi: 10.2307/2601038 – ident: CR19 – volume: 11 start-page: 134 issue: 1 year: 1979 end-page: 152 ident: CR5 article-title: A birth-death model of advertising and pricing publication-title: Adv Appl Probab doi: 10.2307/1426772 – volume: 419 start-page: 1322 issue: 2 year: 2014 end-page: 1332 ident: CR26 article-title: Robust Markov perfect equilibria publication-title: J Math Anal Appl doi: 10.1016/j.jmaa.2014.05.061 – volume: 75 start-page: 1 issue: 1 year: 2007 end-page: 53 ident: CR3 article-title: Sequential estimation of dynamic discrete games publication-title: Econometrica doi: 10.1111/j.1468-0262.2007.00731.x – ident: CR15 – year: 2021 ident: CR50 publication-title: A course in stochastic game theory – volume: 39 start-page: 1095 issue: 10 year: 1953 end-page: 1100 ident: CR46 article-title: Stochastic games publication-title: Proc Nat Acad Sci doi: 10.1073/pnas.39.10.1095 – ident: CR32 – volume: 34 start-page: 311 issue: 1 year: 1996 end-page: 328 ident: CR56 article-title: Approximations in dynamic zero-sum games I publication-title: SIAM J Control Optim doi: 10.1137/S036301299325534X – year: 1999 ident: CR6 publication-title: Constrained Markov decision processes: stochastic modeling – ident: CR36 – ident: CR64 – volume: 28 start-page: 1 year: 1964 ident: CR21 article-title: Equilibrium in a stochastic -person game publication-title: Hiroshima Math J doi: 10.32917/hmj/1206139508 – volume: 23 start-page: 1 year: 2022 end-page: 12 ident: CR53 article-title: Approximate information state for approximate planning and reinforcement learning in partially observed systems publication-title: J Mach Learn Res – year: 2006 ident: CR14 
publication-title: Prediction, learning, and games doi: 10.1017/CBO9780511546921 – volume: 33 start-page: 12861 year: 2020 ident: CR31 article-title: Breaking the sample size barrier in model-based reinforcement learning with a generative model publication-title: Adv Neural Inf Process Syst – volume: 871 start-page: 996 year: 1999 end-page: 1002 ident: CR28 article-title: Finite-sample convergence rates for q-learning and indirect algorithms publication-title: Adv Neural Inf Process Syst – volume: 100 start-page: 191 issue: 2 year: 2001 end-page: 219 ident: CR37 article-title: Markov perfect equilibrium: I. observable actions publication-title: J Econ Theory doi: 10.1006/jeth.2000.2785 – volume: 42 start-page: 119 issue: 1 year: 2010 end-page: 156 ident: CR22 article-title: Homotopy methods to compute equilibria in game theory publication-title: Econ Theory doi: 10.1007/s00199-009-0441-5 – ident: CR43 – ident: CR47 – volume: 33 start-page: 1166 year: 2020 ident: CR61 article-title: Model-based multi-agent rl in zero-sum Markov games with near-optimal sample complexity publication-title: Adv Neural Inf Process Syst – year: 1991 ident: CR12 publication-title: Algorithms for stochastic games doi: 10.1007/978-94-011-3760-7_5 – ident: CR2 – ident: CR30 – year: 2008 ident: CR9 publication-title: H-infinity optimal control and related minimax design problems: a dynamic game approach doi: 10.1007/978-0-8176-4757-5 – volume: 91 start-page: 325 issue: 3 year: 2013 end-page: 349 ident: CR7 article-title: Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model publication-title: Mach Learn doi: 10.1007/s10994-013-5368-1 – volume: 38 start-page: 156 issue: 2 year: 2008 end-page: 172 ident: CR13 article-title: A comprehensive survey of multiagent reinforcement learning publication-title: IEEE Trans Syst, Man, Cybern, Part C (Appl Rev) doi: 10.1109/TSMCC.2007.913919 – ident: CR63 – ident: CR27 – year: 2017 ident: CR11 publication-title: 
Dynamic programming and optimal control – ident: CR44 – ident: CR48 – volume: 75 start-page: 1331 issue: 5 year: 2007 end-page: 1370 ident: CR8 article-title: Estimating dynamic models of imperfect competition publication-title: Econometrica doi: 10.1111/j.1468-0262.2007.00796.x – ident: CR65 – ident: CR52 – year: 2006 ident: CR34 publication-title: Repeated games and reputations: long-run relationships doi: 10.1093/acprof:oso/9780195300796.001.0001 – ident: CR59 – ident: CR41 – ident: CR62 – ident: CR24 – ident: 490_CR4 – volume-title: A course in stochastic game theory year: 2021 ident: 490_CR50 – volume: 31 start-page: 207 issue: 2 year: 2000 ident: 490_CR18 publication-title: RAND J Econ doi: 10.2307/2601038 – volume: 33 start-page: 1166 year: 2020 ident: 490_CR61 publication-title: Adv Neural Inf Process Syst – ident: 490_CR29 – ident: 490_CR48 – volume: 11 start-page: 134 issue: 1 year: 1979 ident: 490_CR5 publication-title: Adv Appl Probab doi: 10.2307/1426772 – volume: 91 start-page: 938 issue: 4 year: 2001 ident: 490_CR1 publication-title: Am Econ Rev doi: 10.1257/aer.91.4.938 – volume: 118 start-page: 32 issue: 1 year: 2004 ident: 490_CR23 publication-title: J Econ Theory – volume-title: Repeated games and reputations: long-run relationships year: 2006 ident: 490_CR34 doi: 10.1093/acprof:oso/9780195300796.001.0001 – ident: 490_CR44 – volume-title: H-infinity optimal control and related minimax design problems: a dynamic game approach year: 2008 ident: 490_CR9 doi: 10.1007/978-0-8176-4757-5 – volume: 871 start-page: 996 year: 1999 ident: 490_CR28 publication-title: Adv Neural Inf Process Syst – ident: 490_CR41 – volume: 75 start-page: 1 issue: 1 year: 2007 ident: 490_CR3 publication-title: Econometrica doi: 10.1111/j.1468-0262.2007.00731.x – ident: 490_CR58 – volume: 100 start-page: 191 issue: 2 year: 2001 ident: 490_CR37 publication-title: J Econ Theory doi: 10.1006/jeth.2000.2785 – volume: 12 start-page: 359 issue: 5 year: 1966 ident: 490_CR25 
publication-title: Manage Sci doi: 10.1287/mnsc.12.5.359 – ident: 490_CR45 doi: 10.1007/978-3-030-32430-8_29 – volume: 29 start-page: 429 issue: 2 year: 1997 ident: 490_CR39 publication-title: Adv Appl Probab doi: 10.2307/1428011 – ident: 490_CR32 doi: 10.1016/B978-1-55860-335-6.50027-1 – ident: 490_CR51 – ident: 490_CR54 doi: 10.1016/B978-1-55860-141-3.50030-4 – ident: 490_CR49 – ident: 490_CR64 doi: 10.24963/ijcai.2021/466 – ident: 490_CR24 doi: 10.1007/s00186-005-0438-1 – volume-title: Dynamic programming and optimal control year: 2017 ident: 490_CR11 – volume: 5 start-page: 369 issue: 3 year: 2010 ident: 490_CR16 publication-title: Theor Econ doi: 10.3982/TE632 – volume: 50 start-page: 227 issue: 1 year: 1991 ident: 490_CR20 publication-title: Math Program doi: 10.1007/BF01594936 – volume: 33 start-page: 12861 year: 2020 ident: 490_CR31 publication-title: Adv Neural Inf Process Syst – volume: 42 start-page: 119 issue: 1 year: 2010 ident: 490_CR22 publication-title: Econ Theory doi: 10.1007/s00199-009-0441-5 – volume: 38 start-page: 156 issue: 2 year: 2008 ident: 490_CR13 publication-title: IEEE Trans Syst, Man, Cybern, Part C (Appl Rev) doi: 10.1109/TSMCC.2007.913919 – ident: 490_CR65 – ident: 490_CR59 – ident: 490_CR30 – volume: 75 start-page: 1331 issue: 5 year: 2007 ident: 490_CR8 publication-title: Econometrica doi: 10.1111/j.1468-0262.2007.00796.x – ident: 490_CR2 – ident: 490_CR27 – ident: 490_CR52 – ident: 490_CR36 doi: 10.2307/1911701 – ident: 490_CR62 doi: 10.1007/978-3-030-60990-0_12 – ident: 490_CR19 doi: 10.1007/978-1-4612-4054-9 – volume: 75 start-page: 901 issue: 3 year: 2008 ident: 490_CR42 publication-title: Rev Econ Stud doi: 10.1111/j.1467-937X.2008.00496.x – volume: 18 start-page: 33 issue: 1 year: 1980 ident: 490_CR60 publication-title: SIAM J Control Optim doi: 10.1137/0318003 – volume: 35 start-page: 2101 issue: 6 year: 1997 ident: 490_CR57 publication-title: SIAM J Control Optim doi: 10.1137/S0363012994272460 – volume: 34 start-page: 311 
issue: 1 year: 1996 ident: 490_CR56 publication-title: SIAM J Control Optim doi: 10.1137/S036301299325534X – volume: 28 start-page: 1 year: 1964 ident: 490_CR21 publication-title: Hiroshima Math J doi: 10.32917/hmj/1206139508 – volume: 39 start-page: 1095 issue: 10 year: 1953 ident: 490_CR46 publication-title: Proc Nat Acad Sci doi: 10.1073/pnas.39.10.1095 – volume: 28 start-page: 95 issue: 1 year: 1964 ident: 490_CR55 publication-title: Hiroshima Math J doi: 10.32917/hmj/1206139509 – volume: 62 start-page: 53 issue: 1 year: 1995 ident: 490_CR17 publication-title: Rev Econ Stud doi: 10.2307/2297841 – volume: 23 start-page: 1 year: 2022 ident: 490_CR53 publication-title: J Mach Learn Res – ident: 490_CR15 doi: 10.1093/nsr/nwac256 – volume: 22 start-page: 872 issue: 4 year: 1997 ident: 490_CR38 publication-title: Math Op Res doi: 10.1287/moor.22.4.872 – volume-title: Algorithms for stochastic games year: 1991 ident: 490_CR12 doi: 10.1007/978-94-011-3760-7_5 – volume: 419 start-page: 1322 issue: 2 year: 2014 ident: 490_CR26 publication-title: J Math Anal Appl doi: 10.1016/j.jmaa.2014.05.061 – volume-title: Prediction, learning, and games year: 2006 ident: 490_CR14 doi: 10.1017/CBO9780511546921 – ident: 490_CR47 – volume-title: Handbook of dynamic game theory year: 2018 ident: 490_CR10 doi: 10.1007/978-3-319-44374-4 – ident: 490_CR43 – volume-title: Constrained Markov decision processes: stochastic modeling year: 1999 ident: 490_CR6 – ident: 490_CR35 doi: 10.2307/1911700 – volume: 2 start-page: 55 issue: 1 year: 2001 ident: 490_CR33 publication-title: Cognit Syst Res doi: 10.1016/S1389-0417(01)00015-8 – ident: 490_CR63 – volume: 91 start-page: 325 issue: 3 year: 2013 ident: 490_CR7 publication-title: Mach Learn doi: 10.1007/s10994-013-5368-1 – volume: 38 start-page: 373 issue: 2 year: 2007 ident: 490_CR40 publication-title: RAND J Econ doi: 10.1111/j.1756-2171.2007.tb00073.x  | 
    
| SSID | ssj0000461657 | 
    
| Score | 2.2927465 | 
    
| Snippet | Multi-agent reinforcement learning (MARL) is often modeled using the framework of Markov games (also called stochastic games or dynamic games). Most of the... | 
    
| SourceID | proquest crossref springer  | 
    
| SourceType | Aggregation Database Enrichment Source Index Database Publisher  | 
    
| StartPage | 56 | 
    
| SubjectTerms | Algorithms; Approximation; Communications Engineering; Complexity; Computer Systems Organization and Communication Networks; Economic Theory/Quantitative Economics/Mathematical Methods; Economics; Equilibrium; Game Theory; Games; Management Science; Markov analysis; Mathematics; Mathematics and Statistics; Multi-agent Dynamic Decision Making and Learning; Multiagent systems; Networks; Operations Research; Robustness; Social and Behav. Sciences |
    
| Title | Robustness and Sample Complexity of Model-Based MARL for General-Sum Markov Games | 
    
| URI | https://link.springer.com/article/10.1007/s13235-023-00490-2 https://www.proquest.com/docview/3255511591  | 
    
| Volume | 13 | 
    
| openUrl | Subramanian, Jayakumar; Sinha, Amit; Mahajan, Aditya (2023-03-01). "Robustness and Sample Complexity of Model-Based MARL for General-Sum Markov Games". Dynamic Games and Applications. ISSN 2153-0785; eISSN 2153-0793. doi:10.1007/s13235-023-00490-2 |
    