Model-Free Design of Stochastic LQR Controller from Reinforcement Learning and Primal-Dual Optimization Perspective
| Published in | arXiv.org |
|---|---|
| Main Authors | Li, Man; Qin, Jiahu; Wei Xing Zheng; Wang, Yaonan; Kang, Yu |
| Format | Paper |
| Language | English |
| Published | Ithaca: Cornell University Library, arXiv.org, 17.03.2021 |
| Online Access | https://www.proquest.com/docview/2502525254 |
| ISSN | 2331-8422 |
| Abstract | To further understand the underlying mechanisms of various reinforcement learning (RL) algorithms, and to better apply optimization theory to make further progress in RL, many researchers have begun to revisit the linear-quadratic regulator (LQR) problem, whose setting is simple yet captures the key characteristics of RL. Motivated by this, this work is concerned with the model-free design of a stochastic LQR controller for linear systems subject to Gaussian noise, from the perspectives of both RL and primal-dual optimization. From the RL perspective, we first develop a new model-free off-policy policy iteration (MF-OPPI) algorithm, in which sampled data are reused for policy updates to alleviate the data-hungry problem to some extent. We then provide a rigorous convergence analysis by showing that the involved iterations are equivalent to those of the classical policy iteration (PI) algorithm. From the optimization perspective, we first reformulate the stochastic LQR problem as a constrained non-convex optimization problem, which is shown to enjoy strong duality. To solve this non-convex problem, we then propose a model-based primal-dual (MB-PD) algorithm based on the properties of the resulting Karush-Kuhn-Tucker (KKT) conditions, and we give a model-free implementation of the MB-PD algorithm by solving a transformed dual feasibility condition. More importantly, we show that the dual and primal update steps in the MB-PD algorithm can be interpreted as the policy evaluation and policy improvement steps in the PI algorithm, respectively. Finally, we provide a simulation example to illustrate the performance of the proposed algorithms. |
|---|---|
| Author | Li, Man; Qin, Jiahu; Wei Xing Zheng; Wang, Yaonan; Kang, Yu |
| Copyright | 2021. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
| Discipline | Physics |
| Genre | Working Paper/Pre-Print |
| IsOpenAccess | true |
| IsPeerReviewed | false |
| IsScholarly | false |
| OpenAccessLink | https://www.proquest.com/docview/2502525254 |
| SubjectTerms | Algorithms; Computational geometry; Control systems design; Controllers; Convex analysis; Convexity; Iterative methods; Kuhn-Tucker method; Linear quadratic regulator; Linear systems; Machine learning; Optimization |
| URI | https://www.proquest.com/docview/2502525254 |
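The abstract states that the MF-OPPI iterations are equivalent to those of the classical policy iteration (PI) algorithm. For background, here is a minimal model-based sketch of that classical PI loop for a discrete-time LQR (a Hewer-style iteration); the toy system, initial gain, and function names below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def dlyap(Ac, Qc):
    """Solve the discrete Lyapunov equation Ac^T P Ac - P + Qc = 0
    by vectorization (adequate for small state dimensions)."""
    n = Ac.shape[0]
    M = np.eye(n * n) - np.kron(Ac.T, Ac.T)
    vecP = np.linalg.solve(M, Qc.flatten(order="F"))
    return vecP.reshape(n, n, order="F")

def policy_iteration_lqr(A, B, Q, R, K0, iters=30):
    """Classical model-based PI for discrete-time LQR: alternate policy
    evaluation (a Lyapunov solve for the cost matrix of the current gain)
    and policy improvement (minimizing the one-step Q-function)."""
    K = K0
    for _ in range(iters):
        Ac = A - B @ K                    # closed-loop dynamics under K
        P = dlyap(Ac, Q + K.T @ R @ K)    # policy evaluation: cost of K
        # policy improvement: K <- (R + B^T P B)^{-1} B^T P A
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return P, K

# Toy double-integrator-like system (illustrative, not from the paper)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])
K0 = np.array([[1.0, 5.0]])  # a stabilizing initial gain is required
P, K = policy_iteration_lqr(A, B, Q, R, K0)
```

Each pass evaluates the current gain and then improves it; the fixed point satisfies the discrete algebraic Riccati equation. The model-free algorithms described in the abstract replace the Lyapunov solve with quantities estimated from sampled trajectories.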