Robustness and Sample Complexity of Model-Based MARL for General-Sum Markov Games

Bibliographic Details
Published in Dynamic Games and Applications Vol. 13; no. 1; pp. 56–88
Main Authors Subramanian, Jayakumar, Sinha, Amit, Mahajan, Aditya
Format Journal Article
Language English
Published New York Springer US 01.03.2023
Springer Nature B.V
Subjects
ISSN 2153-0785
2153-0793
DOI 10.1007/s13235-023-00490-2


Abstract Multi-agent reinforcement learning (MARL) is often modeled using the framework of Markov games (also called stochastic games or dynamic games). Most of the existing literature on MARL concentrates on zero-sum Markov games and is not applicable to general-sum Markov games. It is known that the best response dynamics in general-sum Markov games are not a contraction. Therefore, different equilibria in general-sum Markov games can have different values. Moreover, the Q-function is not sufficient to completely characterize the equilibrium. Given these challenges, model-based learning is an attractive approach for MARL in general-sum Markov games. In this paper, we investigate the fundamental question of sample complexity for model-based MARL algorithms in general-sum Markov games. We show two results. We first use Hoeffding-inequality-based bounds to show that Õ((1-γ)^{-4}α^{-2}) samples per state–action pair are sufficient to obtain an α-approximate Markov perfect equilibrium with high probability, where γ is the discount factor and the Õ(·) notation hides logarithmic terms. We then use Bernstein-inequality-based bounds to show that Õ((1-γ)^{-1}α^{-2}) samples are sufficient. To obtain these results, we study the robustness of Markov perfect equilibrium to model approximations. We show that the Markov perfect equilibrium of an approximate (or perturbed) game is always an approximate Markov perfect equilibrium of the original game and provide explicit bounds on the approximation error. We illustrate the results via a numerical example.
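The model-based approach described in the abstract rests on a generative-model setting: for every state–action pair, draw N i.i.d. next-state samples, build an empirical transition model, and solve the estimated game. The sketch below illustrates only the model-estimation step under that assumption; it is an illustrative reconstruction, not the paper's algorithm or code, and all names (`estimate_transition_model`, `toy_sim`) are hypothetical.

```python
import random
from collections import Counter

def estimate_transition_model(simulator, states, actions, n_samples, rng):
    """Empirical model: for each (s, a), draw n_samples next states from the
    generative model and use empirical frequencies as the estimated kernel.
    Per the paper's Bernstein-based result, n_samples on the order of
    O~((1 - gamma)^{-1} * alpha^{-2}) suffices for an alpha-approximate
    Markov perfect equilibrium of the estimated game to be an approximate
    equilibrium of the true game (log factors omitted)."""
    model = {}
    for s in states:
        for a in actions:
            counts = Counter(simulator(s, a, rng) for _ in range(n_samples))
            model[(s, a)] = {s2: c / n_samples for s2, c in counts.items()}
    return model

# Toy 2-state, 2-action simulator (illustrative): action 0 tends to stay
# in the current state, action 1 tends to switch states.
def toy_sim(s, a, rng):
    stay = 0.8 if a == 0 else 0.3
    return s if rng.random() < stay else 1 - s

rng = random.Random(0)
model = estimate_transition_model(toy_sim, [0, 1], [0, 1], 5000, rng)
# Empirical frequencies concentrate near the true kernel, e.g.
# model[(0, 0)][0] is close to 0.8.
```

In the paper's analysis, the estimated model would then be handed to an equilibrium solver for the approximate game; the robustness results bound how far that equilibrium can be from an equilibrium of the true game.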
Author Subramanian, Jayakumar
Mahajan, Aditya
Sinha, Amit
Author_xml – sequence: 1
  givenname: Jayakumar
  orcidid: 0000-0003-4621-2677
  surname: Subramanian
  fullname: Subramanian, Jayakumar
  email: jasubram@adobe.com
  organization: Media and Data Science Research Lab, Digital Experience Cloud, Adobe Inc
– sequence: 2
  givenname: Amit
  surname: Sinha
  fullname: Sinha, Amit
  organization: Department of Electrical and Computer Engineering, McGill University
– sequence: 3
  givenname: Aditya
  orcidid: 0000-0001-8125-1191
  surname: Mahajan
  fullname: Mahajan, Aditya
  organization: Department of Electrical and Computer Engineering, McGill University
BookMark eNp9kE9LAzEQxYMoWGu_gKeA59X82TS7x1q0Ci1iq-eQzU5k625Sk63Yb-_WFQUPncubgfebGd4ZOnbeAUIXlFxRQuR1pJxxkRDGE0LSnCTsCA0YFd0oc37822fiFI1iXJOu0jEdCzlAT0tfbGPrIEasXYlXutnUgKd-L59Vu8Pe4oUvoU5udIQSLybLObY-4Bk4CLpOVtsGL3R48x94phuI5-jE6jrC6EeH6OXu9nl6n8wfZw_TyTwxKcnbZExkKSQ1EqTNtLVCWpNRKLjJeUYLKqwR1AidEpLTQoDpnIRmwCwTkGnOh-iy37sJ_n0LsVVrvw2uO6k4E0JQKnLaubLeZYKPMYBVpmp1W3nXBl3VihK1z1D1GaouQ_WdoWIdyv6hm1A1OuwOQ7yHYmd2rxD-vjpAfQHjVoRm
CitedBy_id crossref_primary_10_1007_s13235_023_00493_z
Cites_doi 10.2307/2297841
10.2307/1428011
10.1111/j.1467-937X.2008.00496.x
10.32917/hmj/1206139509
10.1287/moor.22.4.872
10.1257/aer.91.4.938
10.1007/BF01594936
10.1287/mnsc.12.5.359
10.3982/TE632
10.1016/S1389-0417(01)00015-8
10.1007/978-3-319-44374-4
10.1137/S0363012994272460
10.1111/j.1756-2171.2007.tb00073.x
10.1137/0318003
10.2307/2601038
10.2307/1426772
10.1016/j.jmaa.2014.05.061
10.1111/j.1468-0262.2007.00731.x
10.1073/pnas.39.10.1095
10.1137/S036301299325534X
10.32917/hmj/1206139508
10.1017/CBO9780511546921
10.1006/jeth.2000.2785
10.1007/s00199-009-0441-5
10.1007/978-94-011-3760-7_5
10.1007/978-0-8176-4757-5
10.1007/s10994-013-5368-1
10.1109/TSMCC.2007.913919
10.1111/j.1468-0262.2007.00796.x
10.1093/acprof:oso/9780195300796.001.0001
10.1007/978-3-030-32430-8_29
10.1016/B978-1-55860-335-6.50027-1
10.1016/B978-1-55860-141-3.50030-4
10.24963/ijcai.2021/466
10.1007/s00186-005-0438-1
10.2307/1911701
10.1007/978-3-030-60990-0_12
10.1007/978-1-4612-4054-9
10.1093/nsr/nwac256
10.2307/1911700
ContentType Journal Article
Copyright The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023.
Copyright_xml – notice: The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
– notice: The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023.
DBID AAYXX
CITATION
3V.
7WY
7WZ
7XB
87Z
8FE
8FG
8FK
8FL
ABUWG
AFKRA
ARAPS
AZQEC
BENPR
BEZIV
BGLVJ
CCPQU
DWQXO
FRNLG
F~G
GNUQQ
HCIFZ
JQ2
K60
K6~
K7-
L.-
M0C
P5Z
P62
PHGZM
PHGZT
PKEHL
PQBIZ
PQBZA
PQEST
PQGLB
PQQKQ
PQUKI
PRINS
PYYUZ
Q9U
DOI 10.1007/s13235-023-00490-2
DatabaseName CrossRef
ProQuest Central (Corporate)
ABI/INFORM Collection
ABI/INFORM Global (PDF only)
ProQuest Central (purchase pre-March 2016)
ABI/INFORM Collection
ProQuest SciTech Collection
ProQuest Technology Collection
ProQuest Central (Alumni) (purchase pre-March 2016)
ABI/INFORM Collection (Alumni)
ProQuest Central (Alumni)
ProQuest Central UK/Ireland
Advanced Technologies & Computer Science Collection
ProQuest Central Essentials - QC
ProQuest Central
Business Premium Collection
Technology Collection (ProQuest)
ProQuest One Community College
ProQuest Central
Business Premium Collection (Alumni)
ABI/INFORM Global (Corporate)
ProQuest Central Student
SciTech Premium Collection
ProQuest Computer Science Collection
ProQuest Business Collection (Alumni Edition)
ProQuest Business Collection
Computer Science Database
ABI/INFORM Professional Advanced
ABI/INFORM Global
Advanced Technologies & Aerospace Collection
ProQuest Advanced Technologies & Aerospace Collection
Proquest Central Premium
ProQuest One Academic
ProQuest One Academic Middle East (New)
ProQuest One Business
ProQuest One Business (Alumni)
ProQuest One Academic Eastern Edition (DO NOT USE)
ProQuest One Applied & Life Sciences
ProQuest One Academic
ProQuest One Academic UKI Edition
ProQuest Central China
ABI/INFORM Collection China
ProQuest Central Basic
DatabaseTitle CrossRef
ABI/INFORM Global (Corporate)
ProQuest Business Collection (Alumni Edition)
ProQuest One Business
Computer Science Database
ProQuest Central Student
Technology Collection
ProQuest One Academic Middle East (New)
ProQuest Advanced Technologies & Aerospace Collection
ProQuest Central Essentials
ProQuest Computer Science Collection
ProQuest Central (Alumni Edition)
SciTech Premium Collection
ProQuest One Community College
ProQuest Central China
ABI/INFORM Complete
ProQuest Central
ABI/INFORM Professional Advanced
ProQuest One Applied & Life Sciences
ProQuest Central Korea
ProQuest Central (New)
ABI/INFORM Complete (Alumni Edition)
Advanced Technologies & Aerospace Collection
Business Premium Collection
ABI/INFORM Global
ABI/INFORM Global (Alumni Edition)
ProQuest Central Basic
ProQuest One Academic Eastern Edition
ABI/INFORM China
ProQuest Technology Collection
ProQuest SciTech Collection
ProQuest Business Collection
Advanced Technologies & Aerospace Database
ProQuest One Academic UKI Edition
ProQuest One Business (Alumni)
ProQuest One Academic
ProQuest Central (Alumni)
ProQuest One Academic (New)
Business Premium Collection (Alumni)
DatabaseTitleList
ABI/INFORM Global (Corporate)
Database_xml – sequence: 1
  dbid: 8FG
  name: ProQuest Technology Collection
  url: https://search.proquest.com/technologycollection1
  sourceTypes: Aggregation Database
DeliveryMethod fulltext_linktorsrc
Discipline Economics
Mathematics
EISSN 2153-0793
EndPage 88
ExternalDocumentID 10_1007_s13235_023_00490_2
GrantInformation_xml – fundername: Canadian Department of National Defence
  grantid: CFPMN2-30
GroupedDBID -EM
0R~
0VY
203
2VQ
30V
4.4
406
408
409
96X
AACDK
AAHNG
AAIAL
AAJBT
AAJKR
AANZL
AARHV
AARTL
AASML
AATNV
AATVU
AAUYE
AAWCG
AAYIU
AAYQN
AAYTO
AAYZH
AAZMS
ABAKF
ABBXA
ABDZT
ABECU
ABFTD
ABFTV
ABJNI
ABJOX
ABKCH
ABMQK
ABQBU
ABSXP
ABTEG
ABTHY
ABTKH
ABTMW
ABULA
ABXPI
ACAOD
ACDTI
ACGFS
ACHSB
ACKNC
ACMLO
ACOKC
ACPIV
ACZOJ
ADHHG
ADHIR
ADINQ
ADKNI
ADKPE
ADRFC
ADTPH
ADURQ
ADYFF
ADZKW
AEBTG
AEFQL
AEGNC
AEJHL
AEJRE
AEMSY
AENEX
AEOHA
AEPYU
AESKC
AETCA
AEVLU
AEXYK
AFBBN
AFLOW
AFQWF
AFWTZ
AFZKB
AGAYW
AGDGC
AGMZJ
AGQEE
AGQMX
AGRTI
AGWZB
AGYKE
AHAVH
AHBYD
AHKAY
AHSBF
AHYZX
AIAKS
AIGIU
AIIXL
AILAN
AITGF
AJBLW
AJRNO
AJZVZ
AKLTO
ALFXC
ALMA_UNASSIGNED_HOLDINGS
AMKLP
AMXSW
AMYLF
AMYQR
ANMIH
AUKKA
AXYYD
AYJHY
BAPOH
BGNMA
CSCUP
DNIVK
DPUIP
EBLON
EBS
EIOEI
EJD
ESBYG
FEDTE
FERAY
FIGPU
FINBP
FNLPD
FRRFC
FSGXE
FYJPI
GGCAI
GGRSB
GJIRD
GQ6
GQ8
HMJXF
HQYDN
HRMNR
HVGLF
HZ~
I0C
IKXTQ
IWAJR
IXD
IZIGR
J-C
J9A
JBSCW
JCJTX
JZLTJ
KOV
LLZTM
M4Y
NPVJJ
NQJWS
NU0
O9-
O93
O9J
PQQKQ
PT4
RLLFE
ROL
RSV
S1Z
S27
SHX
SISQX
SJYHP
SMT
SNE
SNPRN
SNX
SOHCF
SOJ
SPISZ
SRMVM
SSLCW
STPWE
T13
TSG
U2A
UG4
UOJIU
UTJUX
UZXMN
VC2
VFIZW
W48
WK8
Z7R
Z81
ZMTXR
~A9
AAYXX
ABBRH
ABDBE
ABFSG
ABRTQ
ACSTC
AEZWR
AFDZB
AFHIU
AFOHR
AHPBZ
AHWEU
AIXLP
ATHPR
AYFIA
CITATION
3V.
7WY
7XB
8FE
8FG
8FK
8FL
ABUWG
AFKRA
ARAPS
AZQEC
BENPR
BEZIV
BGLVJ
CCPQU
DWQXO
FRNLG
GNUQQ
HCIFZ
JQ2
K60
K6~
K7-
L.-
M0C
P62
PHGZM
PHGZT
PKEHL
PQBIZ
PQBZA
PQEST
PQGLB
PQUKI
PRINS
Q9U
ID FETCH-LOGICAL-c409t-607d571c7e7f8aff57fc81eb3c9381b15fc51c5a40091b5ecc7e018e2f25e8a33
IEDL.DBID BENPR
ISSN 2153-0785
IngestDate Tue Sep 30 03:21:51 EDT 2025
Wed Oct 01 04:15:31 EDT 2025
Thu Apr 24 23:07:03 EDT 2025
Fri Feb 21 02:45:59 EST 2025
IsPeerReviewed true
IsScholarly true
Issue 1
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-c409t-607d571c7e7f8aff57fc81eb3c9381b15fc51c5a40091b5ecc7e018e2f25e8a33
Notes ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ORCID 0000-0001-8125-1191
0000-0003-4621-2677
PQID 3255511591
PQPubID 2043993
PageCount 33
ParticipantIDs proquest_journals_3255511591
crossref_citationtrail_10_1007_s13235_023_00490_2
crossref_primary_10_1007_s13235_023_00490_2
springer_journals_10_1007_s13235_023_00490_2
ProviderPackageCode CITATION
AAYXX
PublicationCentury 2000
PublicationDate 2023-03-01
PublicationDateYYYYMMDD 2023-03-01
PublicationDate_xml – month: 03
  year: 2023
  text: 2023-03-01
  day: 01
PublicationDecade 2020
PublicationPlace New York
PublicationPlace_xml – name: New York
– name: Heidelberg
PublicationTitle Dynamic games and applications
PublicationTitleAbbrev Dyn Games Appl
PublicationYear 2023
Publisher Springer US
Springer Nature B.V
Publisher_xml – name: Springer US
– name: Springer Nature B.V
References Fink (CR21) 1964; 28
Ericson, Pakes (CR17) 1995; 62
Kearns, Singh (CR28) 1999; 871
Subramanian, Sinha, Seraj, Mahajan (CR53) 2022; 23
CR36
Takahashi (CR55) 1964; 28
CR35
CR32
CR30
Littman (CR33) 2001; 2
Müller (CR39) 1997; 29
Li, Wei, Chi, Gu, Chen (CR31) 2020; 33
Altman (CR6) 1999
Hoffman, Karp (CR25) 1966; 12
Müller (CR38) 1997; 22
Albright, Winston (CR5) 1979; 11
Fershtiman, Pakes (CR18) 2000; 31
Maskin, Tirole (CR37) 2001; 100
CR2
Whitt (CR60) 1980; 18
CR4
Busoniu, Babuska, De Schutter (CR13) 2008; 38
Jaśkiewicz, Nowak (CR26) 2014; 419
Başar, Bernhard (CR9) 2008
CR49
CR48
CR47
CR45
CR44
CR43
CR41
Pesendorfer, Schmidt-Dengler (CR42) 2008; 75
Shapley (CR46) 1953; 39
Filar, Schultz, Thuijsman, Vrieze (CR20) 1991; 50
Pakes, Ostrovsky, Berry (CR40) 2007; 38
CR19
Breton (CR12) 1991
CR15
CR59
Tidball, Altman (CR56) 1996; 34
CR58
Acemoglu, Robinson (CR1) 2001; 91
Bajari, Benkard, Levin (CR8) 2007; 75
Bertsekas (CR11) 2017
CR54
CR52
Aguirregabiria, Mira (CR3) 2007; 75
CR51
Doraszelski, Escobar (CR16) 2010; 5
Herings, Peeters (CR22) 2010; 42
Cesa-Bianch, Lugosi (CR14) 2006
CR29
CR27
Mailath, Samuelson (CR34) 2006
Solan (CR50) 2021
Başar, Zaccour (CR10) 2018
CR24
Tidball, Pourtallier, Altman (CR57) 1997; 35
Azar, Munos, Kappen (CR7) 2013; 91
CR65
CR64
Zhang, Kakade, Basar, Yang (CR61) 2020; 33
CR63
CR62
Herings, Peeters (CR23) 2004; 118
490_CR54
A Pakes (490_CR40) 2007; 38
490_CR52
490_CR15
AM Fink (490_CR21) 1964; 28
490_CR59
E Altman (490_CR6) 1999
490_CR58
490_CR19
A Jaśkiewicz (490_CR26) 2014; 419
G Li (490_CR31) 2020; 33
L Busoniu (490_CR13) 2008; 38
LS Shapley (490_CR46) 1953; 39
U Doraszelski (490_CR16) 2010; 5
M Breton (490_CR12) 1991
PJ-J Herings (490_CR23) 2004; 118
MG Azar (490_CR7) 2013; 91
SC Albright (490_CR5) 1979; 11
V Aguirregabiria (490_CR3) 2007; 75
R Ericson (490_CR17) 1995; 62
490_CR62
MM Tidball (490_CR57) 1997; 35
490_CR65
490_CR64
490_CR63
M Pesendorfer (490_CR42) 2008; 75
E Maskin (490_CR37) 2001; 100
A Müller (490_CR39) 1997; 29
J Subramanian (490_CR53) 2022; 23
490_CR24
K Zhang (490_CR61) 2020; 33
490_CR29
490_CR27
D Acemoglu (490_CR1) 2001; 91
ML Littman (490_CR33) 2001; 2
T Başar (490_CR9) 2008
GJ Mailath (490_CR34) 2006
PJ-J Herings (490_CR22) 2010; 42
490_CR32
490_CR30
490_CR36
490_CR35
A Müller (490_CR38) 1997; 22
M Takahashi (490_CR55) 1964; 28
C Fershtiman (490_CR18) 2000; 31
MM Tidball (490_CR56) 1996; 34
P Bajari (490_CR8) 2007; 75
DP Bertsekas (490_CR11) 2017
M Kearns (490_CR28) 1999; 871
T Başar (490_CR10) 2018
490_CR44
490_CR43
490_CR41
490_CR48
490_CR47
W Whitt (490_CR60) 1980; 18
490_CR45
490_CR4
AJ Hoffman (490_CR25) 1966; 12
490_CR49
N Cesa-Bianch (490_CR14) 2006
E Solan (490_CR50) 2021
490_CR2
JA Filar (490_CR20) 1991; 50
490_CR51
References_xml – ident: CR45
– volume: 62
  start-page: 53
  issue: 1
  year: 1995
  end-page: 82
  ident: CR17
  article-title: Markov-perfect industry dynamics: a framework for empirical work
  publication-title: Rev Econ Stud
  doi: 10.2307/2297841
– volume: 29
  start-page: 429
  issue: 2
  year: 1997
  end-page: 443
  ident: CR39
  article-title: Integral probability metrics and their generating classes of functions
  publication-title: Adv Appl Probab
  doi: 10.2307/1428011
– volume: 75
  start-page: 901
  issue: 3
  year: 2008
  end-page: 928
  ident: CR42
  article-title: Asymptotic least squares estimators for dynamic games
  publication-title: Rev Econ Stud
  doi: 10.1111/j.1467-937X.2008.00496.x
– ident: CR49
– volume: 28
  start-page: 95
  issue: 1
  year: 1964
  ident: CR55
  article-title: Equilibrium points of stochastic non-cooperative n-person games
  publication-title: Hiroshima Math J
  doi: 10.32917/hmj/1206139509
– ident: CR4
– volume: 22
  start-page: 872
  issue: 4
  year: 1997
  end-page: 885
  ident: CR38
  article-title: How does the value function of a Markov decision process depend on the transition probabilities?
  publication-title: Math Op Res
  doi: 10.1287/moor.22.4.872
– ident: CR51
– volume: 91
  start-page: 938
  issue: 4
  year: 2001
  end-page: 963
  ident: CR1
  article-title: A theory of political transitions
  publication-title: Am Econ Rev
  doi: 10.1257/aer.91.4.938
– volume: 50
  start-page: 227
  issue: 1
  year: 1991
  end-page: 237
  ident: CR20
  article-title: Nonlinear programming and stationary equilibria in stochastic games
  publication-title: Math Program
  doi: 10.1007/BF01594936
– volume: 12
  start-page: 359
  issue: 5
  year: 1966
  end-page: 370
  ident: CR25
  article-title: On nonterminating stochastic games
  publication-title: Manage Sci
  doi: 10.1287/mnsc.12.5.359
– volume: 5
  start-page: 369
  issue: 3
  year: 2010
  end-page: 402
  ident: CR16
  article-title: A theory of regular Markov perfect equilibria in dynamic stochastic games: Genericity, stability, and purification
  publication-title: Theor Econ
  doi: 10.3982/TE632
– volume: 2
  start-page: 55
  issue: 1
  year: 2001
  end-page: 66
  ident: CR33
  article-title: Value-function reinforcement learning in Markov games
  publication-title: Cognit Syst Res
  doi: 10.1016/S1389-0417(01)00015-8
– ident: CR35
– year: 2018
  ident: CR10
  publication-title: Handbook of dynamic game theory
  doi: 10.1007/978-3-319-44374-4
– ident: CR29
– ident: CR54
– ident: CR58
– volume: 35
  start-page: 2101
  issue: 6
  year: 1997
  end-page: 2117
  ident: CR57
  article-title: Approximations in dynamic zero-sum games II
  publication-title: SIAM J Control Optim
  doi: 10.1137/S0363012994272460
– volume: 38
  start-page: 373
  issue: 2
  year: 2007
  end-page: 399
  ident: CR40
  article-title: Simple estimators for the parameters of discrete dynamic games (with entry/exit examples)
  publication-title: RAND J Econ
  doi: 10.1111/j.1756-2171.2007.tb00073.x
– volume: 18
  start-page: 33
  issue: 1
  year: 1980
  end-page: 48
  ident: CR60
  article-title: Representation and approximation of noncooperative sequential games
  publication-title: SIAM J Control Optim
  doi: 10.1137/0318003
– volume: 118
  start-page: 32
  issue: 1
  year: 2004
  end-page: 60
  ident: CR23
  article-title: Stationary equilibria in stochastic games: structure, selection, and computation
  publication-title: J Econ Theory
– volume: 31
  start-page: 207
  issue: 2
  year: 2000
  end-page: 236
  ident: CR18
  article-title: A dynamic oligopoly with collusion and price wars
  publication-title: RAND J Econ
  doi: 10.2307/2601038
– ident: CR19
– volume: 11
  start-page: 134
  issue: 1
  year: 1979
  end-page: 152
  ident: CR5
  article-title: A birth-death model of advertising and pricing
  publication-title: Adv Appl Probab
  doi: 10.2307/1426772
– volume: 419
  start-page: 1322
  issue: 2
  year: 2014
  end-page: 1332
  ident: CR26
  article-title: Robust Markov perfect equilibria
  publication-title: J Math Anal Appl
  doi: 10.1016/j.jmaa.2014.05.061
– volume: 75
  start-page: 1
  issue: 1
  year: 2007
  end-page: 53
  ident: CR3
  article-title: Sequential estimation of dynamic discrete games
  publication-title: Econometrica
  doi: 10.1111/j.1468-0262.2007.00731.x
– ident: CR15
– year: 2021
  ident: CR50
  publication-title: A course in stochastic game theory
– volume: 39
  start-page: 1095
  issue: 10
  year: 1953
  end-page: 1100
  ident: CR46
  article-title: Stochastic games
  publication-title: Proc Nat Acad Sci
  doi: 10.1073/pnas.39.10.1095
– ident: CR32
– volume: 34
  start-page: 311
  issue: 1
  year: 1996
  end-page: 328
  ident: CR56
  article-title: Approximations in dynamic zero-sum games I
  publication-title: SIAM J Control Optim
  doi: 10.1137/S036301299325534X
– year: 1999
  ident: CR6
  publication-title: Constrained Markov decision processes: stochastic modeling
– ident: CR36
– ident: CR64
– volume: 28
  start-page: 1
  year: 1964
  ident: CR21
  article-title: Equilibrium in a stochastic n-person game
  publication-title: Hiroshima Math J
  doi: 10.32917/hmj/1206139508
– volume: 23
  start-page: 1
  year: 2022
  end-page: 12
  ident: CR53
  article-title: Approximate information state for approximate planning and reinforcement learning in partially observed systems
  publication-title: J Mach Learn Res
– year: 2006
  ident: CR14
  publication-title: Prediction, learning, and games
  doi: 10.1017/CBO9780511546921
– volume: 33
  start-page: 12861
  year: 2020
  ident: CR31
  article-title: Breaking the sample size barrier in model-based reinforcement learning with a generative model
  publication-title: Adv Neural Inf Process Syst
– volume: 871
  start-page: 996
  year: 1999
  end-page: 1002
  ident: CR28
  article-title: Finite-sample convergence rates for q-learning and indirect algorithms
  publication-title: Adv Neural Inf Process Syst
– volume: 100
  start-page: 191
  issue: 2
  year: 2001
  end-page: 219
  ident: CR37
  article-title: Markov perfect equilibrium: I. observable actions
  publication-title: J Econ Theory
  doi: 10.1006/jeth.2000.2785
– volume: 42
  start-page: 119
  issue: 1
  year: 2010
  end-page: 156
  ident: CR22
  article-title: Homotopy methods to compute equilibria in game theory
  publication-title: Econ Theory
  doi: 10.1007/s00199-009-0441-5
– ident: CR43
– ident: CR47
– volume: 33
  start-page: 1166
  year: 2020
  ident: CR61
  article-title: Model-based multi-agent rl in zero-sum Markov games with near-optimal sample complexity
  publication-title: Adv Neural Inf Process Syst
– year: 1991
  ident: CR12
  publication-title: Algorithms for stochastic games
  doi: 10.1007/978-94-011-3760-7_5
– ident: CR2
– ident: CR30
– year: 2008
  ident: CR9
  publication-title: H-infinity optimal control and related minimax design problems: a dynamic game approach
  doi: 10.1007/978-0-8176-4757-5
– volume: 91
  start-page: 325
  issue: 3
  year: 2013
  end-page: 349
  ident: CR7
  article-title: Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model
  publication-title: Mach Learn
  doi: 10.1007/s10994-013-5368-1
– volume: 38
  start-page: 156
  issue: 2
  year: 2008
  end-page: 172
  ident: CR13
  article-title: A comprehensive survey of multiagent reinforcement learning
  publication-title: IEEE Trans Syst, Man, Cybern, Part C (Appl Rev)
  doi: 10.1109/TSMCC.2007.913919
– ident: CR63
– ident: CR27
– year: 2017
  ident: CR11
  publication-title: Dynamic programming and optimal control
– ident: CR44
– ident: CR48
– volume: 75
  start-page: 1331
  issue: 5
  year: 2007
  end-page: 1370
  ident: CR8
  article-title: Estimating dynamic models of imperfect competition
  publication-title: Econometrica
  doi: 10.1111/j.1468-0262.2007.00796.x
– ident: CR65
– ident: CR52
– year: 2006
  ident: CR34
  publication-title: Repeated games and reputations: long-run relationships
  doi: 10.1093/acprof:oso/9780195300796.001.0001
– ident: CR59
– ident: CR41
– ident: CR62
– ident: CR24
– ident: 490_CR4
– volume-title: A course in stochastic game theory
  year: 2021
  ident: 490_CR50
– volume: 31
  start-page: 207
  issue: 2
  year: 2000
  ident: 490_CR18
  publication-title: RAND J Econ
  doi: 10.2307/2601038
– volume: 33
  start-page: 1166
  year: 2020
  ident: 490_CR61
  publication-title: Adv Neural Inf Process Syst
– ident: 490_CR29
– ident: 490_CR48
– volume: 11
  start-page: 134
  issue: 1
  year: 1979
  ident: 490_CR5
  publication-title: Adv Appl Probab
  doi: 10.2307/1426772
– volume: 91
  start-page: 938
  issue: 4
  year: 2001
  ident: 490_CR1
  publication-title: Am Econ Rev
  doi: 10.1257/aer.91.4.938
– volume: 118
  start-page: 32
  issue: 1
  year: 2004
  ident: 490_CR23
  publication-title: J Econ Theory
– volume-title: Repeated games and reputations: long-run relationships
  year: 2006
  ident: 490_CR34
  doi: 10.1093/acprof:oso/9780195300796.001.0001
– ident: 490_CR44
– volume-title: H-infinity optimal control and related minimax design problems: a dynamic game approach
  year: 2008
  ident: 490_CR9
  doi: 10.1007/978-0-8176-4757-5
– volume: 871
  start-page: 996
  year: 1999
  ident: 490_CR28
  publication-title: Adv Neural Inf Process Syst
– ident: 490_CR41
– volume: 75
  start-page: 1
  issue: 1
  year: 2007
  ident: 490_CR3
  publication-title: Econometrica
  doi: 10.1111/j.1468-0262.2007.00731.x
– ident: 490_CR58
– volume: 100
  start-page: 191
  issue: 2
  year: 2001
  ident: 490_CR37
  publication-title: J Econ Theory
  doi: 10.1006/jeth.2000.2785
– volume: 12
  start-page: 359
  issue: 5
  year: 1966
  ident: 490_CR25
  publication-title: Manage Sci
  doi: 10.1287/mnsc.12.5.359
– ident: 490_CR45
  doi: 10.1007/978-3-030-32430-8_29
– volume: 29
  start-page: 429
  issue: 2
  year: 1997
  ident: 490_CR39
  publication-title: Adv Appl Probab
  doi: 10.2307/1428011
– ident: 490_CR32
  doi: 10.1016/B978-1-55860-335-6.50027-1
– ident: 490_CR51
– ident: 490_CR54
  doi: 10.1016/B978-1-55860-141-3.50030-4
– ident: 490_CR49
– ident: 490_CR64
  doi: 10.24963/ijcai.2021/466
– ident: 490_CR24
  doi: 10.1007/s00186-005-0438-1
– volume-title: Dynamic programming and optimal control
  year: 2017
  ident: 490_CR11
– volume: 5
  start-page: 369
  issue: 3
  year: 2010
  ident: 490_CR16
  publication-title: Theor Econ
  doi: 10.3982/TE632
– volume: 50
  start-page: 227
  issue: 1
  year: 1991
  ident: 490_CR20
  publication-title: Math Program
  doi: 10.1007/BF01594936
– volume: 33
  start-page: 12861
  year: 2020
  ident: 490_CR31
  publication-title: Adv Neural Inf Process Syst
– volume: 42
  start-page: 119
  issue: 1
  year: 2010
  ident: 490_CR22
  publication-title: Econ Theory
  doi: 10.1007/s00199-009-0441-5
– volume: 38
  start-page: 156
  issue: 2
  year: 2008
  ident: 490_CR13
  publication-title: IEEE Trans Syst, Man, Cybern, Part C (Appl Rev)
  doi: 10.1109/TSMCC.2007.913919
– ident: 490_CR65
– ident: 490_CR59
– ident: 490_CR30
– volume: 75
  start-page: 1331
  issue: 5
  year: 2007
  ident: 490_CR8
  publication-title: Econometrica
  doi: 10.1111/j.1468-0262.2007.00796.x
– ident: 490_CR2
– ident: 490_CR27
– ident: 490_CR52
– ident: 490_CR36
  doi: 10.2307/1911701
– ident: 490_CR62
  doi: 10.1007/978-3-030-60990-0_12
– ident: 490_CR19
  doi: 10.1007/978-1-4612-4054-9
– volume: 75
  start-page: 901
  issue: 3
  year: 2008
  ident: 490_CR42
  publication-title: Rev Econ Stud
  doi: 10.1111/j.1467-937X.2008.00496.x
– volume: 18
  start-page: 33
  issue: 1
  year: 1980
  ident: 490_CR60
  publication-title: SIAM J Control Optim
  doi: 10.1137/0318003
– volume: 35
  start-page: 2101
  issue: 6
  year: 1997
  ident: 490_CR57
  publication-title: SIAM J Control Optim
  doi: 10.1137/S0363012994272460
– volume: 34
  start-page: 311
  issue: 1
  year: 1996
  ident: 490_CR56
  publication-title: SIAM J Control Optim
  doi: 10.1137/S036301299325534X
– volume: 28
  start-page: 1
  year: 1964
  ident: 490_CR21
  publication-title: Hiroshima Math J
  doi: 10.32917/hmj/1206139508
– volume: 39
  start-page: 1095
  issue: 10
  year: 1953
  ident: 490_CR46
  publication-title: Proc Nat Acad Sci
  doi: 10.1073/pnas.39.10.1095
– volume: 28
  start-page: 95
  issue: 1
  year: 1964
  ident: 490_CR55
  publication-title: Hiroshima Math J
  doi: 10.32917/hmj/1206139509
– volume: 62
  start-page: 53
  issue: 1
  year: 1995
  ident: 490_CR17
  publication-title: Rev Econ Stud
  doi: 10.2307/2297841
– volume: 23
  start-page: 1
  year: 2022
  ident: 490_CR53
  publication-title: J Mach Learn Res
– ident: 490_CR15
  doi: 10.1093/nsr/nwac256
– volume: 22
  start-page: 872
  issue: 4
  year: 1997
  ident: 490_CR38
  publication-title: Math Op Res
  doi: 10.1287/moor.22.4.872
– volume-title: Algorithms for stochastic games
  year: 1991
  ident: 490_CR12
  doi: 10.1007/978-94-011-3760-7_5
– volume: 419
  start-page: 1322
  issue: 2
  year: 2014
  ident: 490_CR26
  publication-title: J Math Anal Appl
  doi: 10.1016/j.jmaa.2014.05.061
– volume-title: Prediction, learning, and games
  year: 2006
  ident: 490_CR14
  doi: 10.1017/CBO9780511546921
– ident: 490_CR47
– volume-title: Handbook of dynamic game theory
  year: 2018
  ident: 490_CR10
  doi: 10.1007/978-3-319-44374-4
– ident: 490_CR43
– volume-title: Constrained Markov decision processes: stochastic modeling
  year: 1999
  ident: 490_CR6
– ident: 490_CR35
  doi: 10.2307/1911700
– volume: 2
  start-page: 55
  issue: 1
  year: 2001
  ident: 490_CR33
  publication-title: Cognit Syst Res
  doi: 10.1016/S1389-0417(01)00015-8
– ident: 490_CR63
– volume: 91
  start-page: 325
  issue: 3
  year: 2013
  ident: 490_CR7
  publication-title: Mach Learn
  doi: 10.1007/s10994-013-5368-1
– volume: 38
  start-page: 373
  issue: 2
  year: 2007
  ident: 490_CR40
  publication-title: RAND J Econ
  doi: 10.1111/j.1756-2171.2007.tb00073.x
SSID ssj0000461657
Score 2.2927465
Snippet Multi-agent reinforcement learning (MARL) is often modeled using the framework of Markov games (also called stochastic games or dynamic games). Most of the...
SourceID proquest
crossref
springer
SourceType Aggregation Database
Enrichment Source
Index Database
Publisher
StartPage 56
SubjectTerms Algorithms
Approximation
Communications Engineering
Complexity
Computer Systems Organization and Communication Networks
Economic Theory/Quantitative Economics/Mathematical Methods
Economics
Equilibrium
Game Theory
Games
Management Science
Markov analysis
Mathematics
Mathematics and Statistics
Multi-agent Dynamic Decision Making and Learning
Multiagent systems
Networks
Operations Research
Robustness
Social and Behav. Sciences
Title Robustness and Sample Complexity of Model-Based MARL for General-Sum Markov Games
URI https://link.springer.com/article/10.1007/s13235-023-00490-2
https://www.proquest.com/docview/3255511591
Volume 13
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
journalDatabaseRights – providerCode: PRVLSH
  databaseName: SpringerLink Journals
  customDbUrl:
  mediaType: online
  eissn: 2153-0793
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0000461657
  issn: 2153-0785
  databaseCode: AFBBN
  dateStart: 20110301
  isFulltext: true
  providerName: Library Specific Holdings
– providerCode: PRVAVX
  databaseName: SpringerLINK - Czech Republic Consortium
  customDbUrl:
  eissn: 2153-0793
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0000461657
  issn: 2153-0785
  databaseCode: AGYKE
  dateStart: 20110101
  isFulltext: true
  titleUrlDefault: http://link.springer.com
  providerName: Springer Nature
– providerCode: PRVAVX
  databaseName: SpringerLink Journals (ICM)
  customDbUrl:
  eissn: 2153-0793
  dateEnd: 99991231
  omitProxy: true
  ssIdentifier: ssj0000461657
  issn: 2153-0785
  databaseCode: U2A
  dateStart: 20110301
  isFulltext: true
  titleUrlDefault: http://www.springerlink.com/journals/
  providerName: Springer Nature
link http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwfV1Lj9MwEB71cdk9IF6r7VIqH7iBRe3EeRwQ6qI-BLSCdit1T1Hs2KduUmiKxL9nJk1agcRe49iHGdvfZ3vmG4A3yjrPqMBxZOOa-yYMuDbS53EmnTJIkVV1pztfBLO1_3mjNi1YNLkwFFbZ7InVRp0Vhu7I33vIfZEcqFh83P3gVDWKXlebEhppXVoh-1BJjLWhK0kZqwPd2_Hi2_J060Ly4kEl_4lQ53HER1Vn0hzz6TzpUcIyttCLGJd_o9WZgv7zalqB0eQpPKlZJBsd3f4MWjZ_DpfzkwTr_gV8Xxb6sC9pJ2NpnrFVSjrAjNY_aWCWv1nhGFVC2_JbRLKMzUfLrwwpLKuVqPnq8MAolaf4xaYUS_sS1pPx3acZr-sncIOntpIHwzBToTChDV2UOqdCZyKBp2cTI05roZxRwqgUl3EstEJnhnYoIiudVDZKPe8KOnmR22tgRgvfuED7GWK6TklACs9qWls_NspFsgeisVNianFxqnGxTc6yyGTbBG2bVLZNsM_bU5_dUVrj0b_7jfmTepntk_Ok6MG7xiXn5v-PdvP4aK_gQlazgGLN-tApfx7sayQfpR5AO5pMB9AdTe-_jAf1_MKvazn6A-yW1f8
linkProvider ProQuest
linkToHtml http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwtV1LT8MwDI4QHIAD4inGMwc4QQRJmz4OCPEesE0wQOJWmjQ5wQqsA_Hn-G3YXboJJLhxThOpjmN_duLPhGxIYz0tA8sAjSvm6zBgSgufxZmwUgNElmVOt9kK6nf-xb28HyGfVS0MPqusbGJpqLNcY458xwPsC-BAxnz_-YVh1yi8Xa1aaKSutUK2V1KMucKOS_PxDiFcd-_8GPZ7U4jTk9ujOnNdBpiG2KZgwW6YyZDr0IQ2Sq2VodURhxhTx-DNFJdWS65lCsoecyXhl0OzyyMjrJAmSjEhCi5gzPf8GIK_scOT1lV7kOVBOvOgpBsF1-ox8MfSVe706_c84WGBNIzgDRwT373jEPL-uKUtnd_pNJlyqJUe9NVshoyYziyZbA4oX7tz5Lqdq163QMtJ005Gb1LkHaZob5Bzs_iguaXYee2RHYLnzGjzoN2gAJmpY75mN70niqVD-Rs9w7e78-TuXyS5QEY7eccsEqoV97UNlJ8BhlApElZBbKiU8WMtbSRqhFdySrQjM8eeGo_JkIYZZZuAbJNStgnM2RrMee5Tefz59Uol_sQd624yVMIa2a62ZDj8-2pLf6-2Tsbrt81G0jhvXS6TCVFqBL5zWyGjxWvPrALwKdSa0y5KHv5bob8AcHgQMw
linkToPdf http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV1LTwIxEG4MJj4ORlEjitqDN22g3e0-jvhAVCAKknDbbLvtCXeJu5j4750uu6BGTTz3cZjpdL5p-31F6IwrbUnuaAJoXBBbug4RktnEj5jmEiAyz890e32nM7Lvx3z8icWfv3YvryTnnAaj0hRnjWmkG0vim8Uswyy2SH51RWATXrWNUAKs6BFrLU5ZjJy4k8t9Qht0dj1eMGd-nuZrdlpCzm-3pHnyaW-jrQI14tbczTtoRcVVtF6SitMq2uwt5FfTXfQ0SMQszcwuhsM4wsPQaABjE_tG_zJ7x4nG5he0CbmELBbhXmvQxQBfcaFCTYazF2xoPMkbvjXvaPfQqH3zfNUhxd8JRELFlhGn6UbcpdJVrvZCrbmrpUehcpY-5GhBuZacSh5CCPtUcHCkq5rUU0wzrrzQsvZRJU5idYCwFNSW2hF2BPlchEY8Cuo0IZTtS649VkO0tFkgC2Fx87_FJFhKIhs7B2DnILdzAGPOF2Omc1mNP3vXS1cERYilgQXFEKBF7tMauijds2z-fbbD_3U_RWuP1-2ge9d_OEIbLF8s5glaHVWy15k6BkySiZN82X0ACh3T2A
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Robustness+and+Sample+Complexity+of+Model-Based+MARL+for+General-Sum+Markov+Games&rft.jtitle=Dynamic+games+and+applications&rft.au=Subramanian%2C+Jayakumar&rft.au=Sinha%2C+Amit&rft.au=Mahajan%2C+Aditya&rft.date=2023-03-01&rft.issn=2153-0785&rft.eissn=2153-0793&rft_id=info:doi/10.1007%2Fs13235-023-00490-2&rft.externalDBID=n%2Fa&rft.externalDocID=10_1007_s13235_023_00490_2
thumbnail_l http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/lc.gif&issn=2153-0785&client=summon
thumbnail_m http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/mc.gif&issn=2153-0785&client=summon
thumbnail_s http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/sc.gif&issn=2153-0785&client=summon