CMIX: Deep Multi-agent Reinforcement Learning with Peak and Average Constraints
| Published in | Machine Learning and Knowledge Discovery in Databases. Research Track, Vol. 12975, pp. 157-173 |
|---|---|
| Main Authors | Chenyi Liu, Nan Geng, Vaneet Aggarwal, Tian Lan, Yuan Yang, Mingwei Xu |
| Format | Book Chapter |
| Language | English |
| Published | Cham, Switzerland: Springer International Publishing AG, 2021 |
| Series | Lecture Notes in Computer Science |
| Subjects | Average constraint; Multi-agent reinforcement learning; Peak constraint |
| ISBN | 3030864855 (ISBN-10); 9783030864859 (ISBN-13) |
| ISSN | 0302-9743 (print); 1611-3349 (electronic) |
| DOI | 10.1007/978-3-030-86486-6_10 |
| Abstract | In many real-world tasks, a team of learning agents must ensure that their optimized policies collectively satisfy required peak and average constraints, while acting in a decentralized manner. In this paper, we consider the problem of multi-agent reinforcement learning for a constrained, partially observable Markov decision process, where the agents need to maximize a global reward function subject to both peak and average constraints. We propose a novel algorithm, CMIX, to enable centralized training and decentralized execution (CTDE) under those constraints. In particular, CMIX amends the reward function to take peak constraint violations into account and then transforms the resulting problem under average constraints to a max-min optimization problem. We leverage the value function factorization method to develop a CTDE algorithm for solving the max-min optimization problem, and two gap loss functions are proposed to eliminate the bias of learned solutions. We evaluate our CMIX algorithm on a blocker game with travel cost and a large-scale vehicular network routing problem. The results show that CMIX outperforms existing algorithms including IQL, VDN, and QMIX, in that it optimizes the global reward objective while satisfying both peak and average constraints. To the best of our knowledge, this is the first proposal of a CTDE learning algorithm subject to both peak and average constraints. |
|---|---|
| Author | Liu, Chenyi (liucheny19@mails.tsinghua.edu.cn); Geng, Nan; Aggarwal, Vaneet; Lan, Tian; Yang, Yuan; Xu, Mingwei |
| Copyright | Springer Nature Switzerland AG 2021 |
| Discipline | Computer Science |
| EISBN | 9783030864866; 3030864863 |
| EISSN | 1611-3349 |
| Editor | Oliver, Nuria; Kramer, Stefan; Pérez-Cruz, Fernando; Lozano, Jose A.; Read, Jesse |
| IsPeerReviewed | true |
| IsScholarly | true |
| LCCallNum | Q334-342 |
| OCLC | 1268266664 |
| PageCount | 17 |
| PublicationDate | 2021 |
| PublicationPlace | Cham, Switzerland |
| PublicationSeriesSubtitle | Lecture Notes in Artificial Intelligence |
| PublicationSeriesTitle | Lecture Notes in Computer Science |
| PublicationSeriesTitleAlternate | Lect.Notes Computer |
| PublicationSubtitle | European Conference, ECML PKDD 2021, Bilbao, Spain, September 13-17, 2021, Proceedings, Part I |
| PublicationTitle | Machine Learning and Knowledge Discovery in Databases. Research Track |
| Publisher | Springer International Publishing AG; Springer International Publishing |
| RelatedPersons | Goos, Gerhard; Hartmanis, Juris; Bertino, Elisa; Gao, Wen; Steffen, Bernhard (ORCID 0000-0001-9619-1558); Woeginger, Gerhard (ORCID 0000-0001-8816-2693); Yung, Moti (ORCID 0000-0003-0848-0873) |
| SubjectTerms | Average constraint; Multi-agent reinforcement learning; Peak constraint |
| Title | CMIX: Deep Multi-agent Reinforcement Learning with Peak and Average Constraints |
| URI | http://ebookcentral.proquest.com/lib/SITE_ID/reader.action?docID=6724644&ppg=194; http://link.springer.com/10.1007/978-3-030-86486-6_10 |
| Volume | 12975 |
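
The abstract describes three mechanisms: a reward amendment for peak constraints, a max-min reformulation for average constraints, and value function factorization for decentralized execution. The Python sketch below illustrates each one on a toy problem under stated assumptions; it is not the authors' CMIX implementation, which is a deep reinforcement learning algorithm. The penalty form in `amend_reward`, the one-parameter policy in `toy_evaluate`, the budget of 0.6, and the brute-force grid search in `max_min` are all hypothetical choices made for illustration. `greedy_joint_action` uses VDN-style additive factorization as a simple stand-in for whatever mixing network CMIX actually uses, and the two gap loss functions are not modeled.

```python
"""Toy illustration of peak/average-constraint handling (hypothetical sketch,
not the CMIX implementation)."""
from itertools import product


def amend_reward(reward, peak_costs, peak_bounds, penalty=10.0):
    """Hypothetical peak-constraint amendment: subtract a penalty proportional
    to how far each instantaneous cost exceeds its peak bound."""
    violation = sum(max(0.0, c - b) for c, b in zip(peak_costs, peak_bounds))
    return reward - penalty * violation


def lagrangian(avg_reward, avg_costs, budgets, multipliers):
    """L(pi, lam) = J_r(pi) - sum_k lam_k * (J_ck(pi) - budget_k)."""
    return avg_reward - sum(
        lam * (c - b) for lam, c, b in zip(multipliers, avg_costs, budgets)
    )


def max_min(policies, evaluate, budgets, multiplier_grid):
    """Brute-force max over policies of min over lam >= 0.

    Assumes a single average constraint, so each grid point is one scalar lam.
    The grid is bounded, so infeasible policies get a finite (but low) score.
    """
    best_policy, best_value = None, float("-inf")
    for policy in policies:
        avg_reward, avg_costs = evaluate(policy)
        worst = min(
            lagrangian(avg_reward, avg_costs, budgets, [lam])
            for lam in multiplier_grid
        )
        if worst > best_value:
            best_policy, best_value = policy, worst
    return best_policy, best_value


def toy_evaluate(p):
    """Hypothetical one-parameter team policy: with probability p take a 'fast'
    action (reward 1.0, cost 1.0), otherwise a 'slow' one (reward 0.5, cost 0.2).
    Returns (average reward, [average cost])."""
    return 0.5 + 0.5 * p, [0.2 + 0.8 * p]


def greedy_joint_action(agent_qs):
    """VDN-style additive factorization, Q_tot(a) = sum_i Q_i(a_i).

    Returns the per-agent (decentralized) greedy actions and the joint
    (centralized) greedy action found by exhaustive search; under additive
    factorization the two coincide.
    """
    decentralized = tuple(max(range(len(q)), key=q.__getitem__) for q in agent_qs)
    centralized = max(
        product(*(range(len(q)) for q in agent_qs)),
        key=lambda joint: sum(q[a] for q, a in zip(agent_qs, joint)),
    )
    return decentralized, centralized


if __name__ == "__main__":
    # A cost of 1.5 against a peak bound of 1.0 costs 10 * 0.5 = 5 reward units.
    print(amend_reward(1.0, [1.5], [1.0]))  # -4.0
    p_grid = [k / 100 for k in range(101)]
    lam_grid = [k / 200 for k in range(401)]  # lam in [0, 2]
    p_star, value = max_min(p_grid, toy_evaluate, [0.6], lam_grid)
    print(p_star, round(value, 6))  # 0.5 0.75
    print(greedy_joint_action([[0.1, 0.7, 0.3], [0.4, 0.2]]))  # ((1, 0), (1, 0))
```

In the toy problem the unconstrained optimum is p = 1 (always take the fast action), but its average cost of 1.0 exceeds the budget. The max-min search instead returns p = 0.5, where the constraint 0.2 + 0.8p <= 0.6 is tight and the average reward is 0.75.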