CMIX: Deep Multi-agent Reinforcement Learning with Peak and Average Constraints

Bibliographic Details
Published in Machine Learning and Knowledge Discovery in Databases. Research Track, Vol. 12975, pp. 157-173
Main Authors Liu, Chenyi; Geng, Nan; Aggarwal, Vaneet; Lan, Tian; Yang, Yuan; Xu, Mingwei
Format Book Chapter
Language English
Published Switzerland Springer International Publishing AG 2021
Springer International Publishing
Series Lecture Notes in Computer Science
Subjects Average constraint; Multi-agent reinforcement learning; Peak constraint
ISBN 3030864855, 9783030864859
ISSN 0302-9743, 1611-3349
DOI 10.1007/978-3-030-86486-6_10

Abstract In many real-world tasks, a team of learning agents must ensure that their optimized policies collectively satisfy required peak and average constraints, while acting in a decentralized manner. In this paper, we consider the problem of multi-agent reinforcement learning for a constrained, partially observable Markov decision process – where the agents need to maximize a global reward function subject to both peak and average constraints. We propose a novel algorithm, CMIX, to enable centralized training and decentralized execution (CTDE) under those constraints. In particular, CMIX amends the reward function to take peak constraint violations into account and then transforms the resulting problem under average constraints to a max-min optimization problem. We leverage the value function factorization method to develop a CTDE algorithm for solving the max-min optimization problem, and two gap loss functions are proposed to eliminate the bias of learned solutions. We evaluate our CMIX algorithm on a blocker game with travel cost and a large-scale vehicular network routing problem. The results show that CMIX outperforms existing algorithms including IQL, VDN, and QMIX, in that it optimizes the global reward objective while satisfying both peak and average constraints. To the best of our knowledge, this is the first proposal of a CTDE learning algorithm subject to both peak and average constraints.
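As a rough illustration of the two steps the abstract describes (folding peak-constraint violations into the reward, then treating the average constraint through a max-min problem), the sketch below uses a generic Lagrangian relaxation. It is not taken from the paper: the function names, the fixed `penalty` value, the step size `lr`, and the projected subgradient update are all assumptions made for illustration, and the value function factorization and gap loss components are not shown.

```python
from typing import Sequence


def penalized_reward(reward: float,
                     peak_costs: Sequence[float],
                     peak_limits: Sequence[float],
                     penalty: float = 10.0) -> float:
    """Amend a per-step reward with peak-constraint violations.

    Assumption: a fixed penalty is subtracted for each peak constraint
    whose instantaneous cost exceeds its limit.
    """
    violations = sum(1 for cost, limit in zip(peak_costs, peak_limits)
                     if cost > limit)
    return reward - penalty * violations


def lagrangian(avg_reward: float, avg_cost: float,
               avg_limit: float, lam: float) -> float:
    """Max-min objective for one average constraint avg_cost <= avg_limit.

    The policy maximizes this value; the multiplier lam >= 0 minimizes it.
    """
    return avg_reward - lam * (avg_cost - avg_limit)


def dual_step(lam: float, avg_cost: float,
              avg_limit: float, lr: float = 0.1) -> float:
    """One projected subgradient step for the inner minimization over lam.

    The multiplier grows while the average constraint is violated and
    shrinks (never below zero) once it is satisfied.
    """
    return max(0.0, lam + lr * (avg_cost - avg_limit))
```

In such a scheme the policy update and the multiplier update would alternate, with the multiplier acting as an adaptive price on the average cost.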
Author_xml – sequence: 1
  givenname: Chenyi
  surname: Liu
  fullname: Liu, Chenyi
  email: liucheny19@mails.tsinghua.edu.cn
– sequence: 2
  givenname: Nan
  surname: Geng
  fullname: Geng, Nan
– sequence: 3
  givenname: Vaneet
  surname: Aggarwal
  fullname: Aggarwal, Vaneet
– sequence: 4
  givenname: Tian
  surname: Lan
  fullname: Lan, Tian
– sequence: 5
  givenname: Yuan
  surname: Yang
  fullname: Yang, Yuan
– sequence: 6
  givenname: Mingwei
  surname: Xu
  fullname: Xu, Mingwei
ContentType Book Chapter
Copyright Springer Nature Switzerland AG 2021
Copyright_xml – notice: Springer Nature Switzerland AG 2021
Discipline Computer Science
EISBN 9783030864866
3030864863
EISSN 1611-3349
Editor Pérez-Cruz, Fernando
Lozano, Jose A
Oliver, Nuria
Kramer, Stefan
Read, Jesse
Editor_xml – sequence: 1
  fullname: Oliver, Nuria
– sequence: 2
  fullname: Kramer, Stefan
– sequence: 3
  fullname: Pérez-Cruz, Fernando
– sequence: 4
  fullname: Lozano, Jose A
– sequence: 5
  fullname: Read, Jesse
EndPage 173
IsPeerReviewed true
IsScholarly true
LCCallNum Q334-342
Language English
OCLC 1268266664
PageCount 17
PublicationCentury 2000
PublicationDate 2021
PublicationDateYYYYMMDD 2021-01-01
PublicationDate_xml – year: 2021
  text: 2021
PublicationDecade 2020
PublicationPlace Switzerland
PublicationPlace_xml – name: Switzerland
– name: Cham
PublicationSeriesSubtitle Lecture Notes in Artificial Intelligence
PublicationSeriesTitle Lecture Notes in Computer Science
PublicationSeriesTitleAlternate Lect.Notes Computer
PublicationSubtitle European Conference, ECML PKDD 2021, Bilbao, Spain, September 13-17, 2021, Proceedings, Part I
PublicationTitle Machine Learning and Knowledge Discovery in Databases. Research Track
PublicationYear 2021
Publisher Springer International Publishing AG
Springer International Publishing
Publisher_xml – name: Springer International Publishing AG
– name: Springer International Publishing
RelatedPersons Hartmanis, Juris
Gao, Wen
Bertino, Elisa
Woeginger, Gerhard
Goos, Gerhard
Steffen, Bernhard
Yung, Moti
RelatedPersons_xml – sequence: 1
  givenname: Gerhard
  surname: Goos
  fullname: Goos, Gerhard
– sequence: 2
  givenname: Juris
  surname: Hartmanis
  fullname: Hartmanis, Juris
– sequence: 3
  givenname: Elisa
  surname: Bertino
  fullname: Bertino, Elisa
– sequence: 4
  givenname: Wen
  surname: Gao
  fullname: Gao, Wen
– sequence: 5
  givenname: Bernhard
  orcidid: 0000-0001-9619-1558
  surname: Steffen
  fullname: Steffen, Bernhard
– sequence: 6
  givenname: Gerhard
  orcidid: 0000-0001-8816-2693
  surname: Woeginger
  fullname: Woeginger, Gerhard
– sequence: 7
  givenname: Moti
  orcidid: 0000-0003-0848-0873
  surname: Yung
  fullname: Yung, Moti
StartPage 157
SubjectTerms Average constraint
Multi-agent reinforcement learning
Peak constraint
Title CMIX: Deep Multi-agent Reinforcement Learning with Peak and Average Constraints
URI http://ebookcentral.proquest.com/lib/SITE_ID/reader.action?docID=6724644&ppg=194
http://link.springer.com/10.1007/978-3-030-86486-6_10
Volume 12975