Model-Free Design of Stochastic LQR Controller from Reinforcement Learning and Primal-Dual Optimization Perspective

Bibliographic Details
Published in: arXiv.org
Main Authors: Li, Man; Qin, Jiahu; Zheng, Wei Xing; Wang, Yaonan; Kang, Yu
Format: Paper (Working Paper/Pre-Print)
Language: English
Published: Ithaca: Cornell University Library, arXiv.org, 17.03.2021
Subjects: Algorithms; Computational geometry; Control systems design; Controllers; Convex analysis; Convexity; Iterative methods; Kuhn-Tucker method; Linear quadratic regulator; Linear systems; Machine learning; Optimization
Online Access: https://www.proquest.com/docview/2502525254
ISSN: 2331-8422

Abstract: To further understand the underlying mechanisms of various reinforcement learning (RL) algorithms, and to better use optimization theory to make further progress in RL, many researchers have begun to revisit the linear-quadratic regulator (LQR) problem, whose setting is simple yet captures the essential characteristics of RL. Inspired by this, the present work is concerned with the model-free design of a stochastic LQR controller for linear systems subject to Gaussian noise, from the perspectives of both RL and primal-dual optimization. From the RL perspective, we first develop a new model-free off-policy policy iteration (MF-OPPI) algorithm, in which the sampled data are reused for policy updates to alleviate, to some extent, the data-hungriness of RL. We then provide a rigorous convergence analysis by showing that the involved iterations are equivalent to the iterations in the classical policy iteration (PI) algorithm. From the optimization perspective, we first reformulate the stochastic LQR problem at hand as a constrained non-convex optimization problem, which is shown to enjoy strong duality. Then, to solve this non-convex problem, we propose a model-based primal-dual (MB-PD) algorithm based on the properties of the resulting Karush-Kuhn-Tucker (KKT) conditions. We also give a model-free implementation of the MB-PD algorithm by solving a transformed dual feasibility condition. More importantly, we show that the dual and primal update steps in the MB-PD algorithm can be interpreted as the policy evaluation and policy improvement steps in the PI algorithm, respectively. Finally, we provide a simulation example to demonstrate the performance of the proposed algorithms.
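The abstract uses the classical policy iteration (PI) algorithm as the reference point for both MF-OPPI and MB-PD. For orientation, the following is a minimal sketch of classical model-based PI for discrete-time LQR (Hewer's iteration), not the paper's MF-OPPI or MB-PD algorithms; the system matrices are illustrative assumptions, chosen so that the zero gain is stabilizing:

```python
# A minimal sketch of classical model-based policy iteration for
# discrete-time LQR -- NOT the paper's MF-OPPI or MB-PD algorithms.
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

A = np.array([[0.9, 0.2],
              [0.0, 0.8]])   # open-loop stable (eigenvalues 0.9, 0.8)
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)                # state cost weight
R = np.array([[1.0]])        # input cost weight

K = np.zeros((1, 2))         # initial stabilizing gain (A is already stable)
for _ in range(100):
    # Policy evaluation: solve the Lyapunov equation
    #   P = (A - B K)^T P (A - B K) + Q + K^T R K
    # for the value matrix P of the current policy u = -K x.
    Acl = A - B @ K
    P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)
    # Policy improvement: K <- (R + B^T P B)^{-1} B^T P A.
    K_next = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    if np.linalg.norm(K_next - K) < 1e-10:
        break
    K = K_next

print("Converged gain K:", K)
```

Additive Gaussian process noise does not change the optimal gain (certainty equivalence); it only adds a tr(PW) term to the average cost, where W is the noise covariance, which is why the same PI iterations reappear in the stochastic setting.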
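The record does not spell out the paper's constrained non-convex reformulation. A standard covariance-based reformulation of average-cost noisy LQR has the following shape (a sketch of the usual form, not necessarily the paper's exact formulation):

$$
\begin{aligned}
\min_{K,\;\Sigma \succeq 0}\quad & \operatorname{tr}\!\left((Q + K^{\top} R K)\,\Sigma\right)\\
\text{s.t.}\quad & \Sigma = (A - BK)\,\Sigma\,(A - BK)^{\top} + W,
\end{aligned}
$$

where $\Sigma$ is the stationary state covariance under the policy $u = -Kx$ and $W$ is the Gaussian noise covariance. In this form, the dual variable attached to the covariance constraint plays the role of the value matrix $P$: stationarity of the Lagrangian with respect to $\Sigma$ yields the Lyapunov equation of policy evaluation (a dual step), while stationarity with respect to $K$ yields the gain update of policy improvement (a primal step), consistent with the PI correspondence claimed in the abstract.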
Copyright: 2021. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.