A reinforcement learning method with closed-loop stability guarantee

Pavel Osinenko, Lukas Beckenbach, Thomas Göhrt, Stefan Streif

Research output: Contribution to journal › Conference article › peer-review

Abstract

Reinforcement learning (RL) in the context of control systems offers broad possibilities for controller adaptation. Given an infinite-horizon cost function, the so-called critic of RL approximates it with a neural network and passes this information to the controller (called the "actor"). However, the issue of closed-loop stability under an RL method is still not fully addressed. Since the critic delivers merely an approximation to the value function of the corresponding infinite-horizon problem, no guarantee can be given in general as to whether the actor's actions stabilize the system. Different approaches to this issue exist. The current work offers a particular one which, starting with a (not necessarily smooth) control Lyapunov function (CLF), derives an online RL scheme in such a way that a practical semi-global stability property of the closed loop can be established. The approach logically continues the authors' work on parameterized controllers and Lyapunov-like constraints for RL, whereas the CLF now appears merely in one of the constraints of the control scheme. The analysis of the closed-loop behavior is done in a sample-and-hold (SH) manner, thus offering a certain insight into the digital realization. The case study with a non-holonomic integrator demonstrates the capability of the derived method to optimize the given cost function compared to a nominal stabilizing controller.
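
To make the idea concrete, the following is a minimal sketch, not the paper's exact algorithm: at each sampling instant the actor picks a constant control that minimizes a critic estimate of the cost-to-go one sample-and-hold step ahead, subject to a Lyapunov-like constraint forcing a chosen CLF to decay. The sampling period DT, the decay margin, the control bound, the quadratic critic features with fixed weights w, and the CLF choice are all illustrative assumptions; only the non-holonomic (Brockett) integrator dynamics follow the paper's case study.

import numpy as np
from scipy.optimize import minimize, NonlinearConstraint

DT = 0.1  # sampling period of the sample-and-hold (SH) scheme -- assumed value

def dynamics(x, u):
    # Non-holonomic (Brockett) integrator, as in the paper's case study.
    return np.array([u[0], u[1], x[0] * u[1] - x[1] * u[0]])

def step(x, u, dt=DT):
    # One SH step: hold u constant and integrate with a forward-Euler step.
    return x + dt * dynamics(x, u)

def critic(x, w):
    # Toy critic: quadratic feature approximation of the cost-to-go (assumed).
    return float(w @ np.array([x[0] ** 2, x[1] ** 2, x[2] ** 2]))

def clf(x):
    # Illustrative nonsmooth CLF-like function.
    return x[0] ** 2 + x[1] ** 2 + abs(x[2])

def actor_action(x, w, decay=0.1, u_bound=1.0):
    # Choose u minimizing the critic one SH step ahead, subject to a
    # Lyapunov-like constraint: the CLF must drop by a state-dependent margin.
    def objective(u):
        return critic(step(x, u), w)

    def clf_margin(u):
        # Required <= 0: V(x_next) - V(x) + decay * DT * (x1^2 + x2^2) <= 0
        return clf(step(x, u)) - clf(x) + decay * DT * (x[0] ** 2 + x[1] ** 2)

    constraint = NonlinearConstraint(clf_margin, -np.inf, 0.0)
    result = minimize(objective, np.zeros(2),
                      bounds=[(-u_bound, u_bound)] * 2,
                      constraints=[constraint])
    return result.x

# Closed-loop rollout from a nonzero initial state (illustrative values).
w = np.array([1.0, 1.0, 1.0])      # critic weights, fixed here for simplicity
x = np.array([0.5, -0.3, 0.2])
for _ in range(5):
    u = actor_action(x, w)
    x = step(x, u)
    print(np.round(x, 3), round(clf(x), 4))

In the paper's scheme the critic weights are adapted online and the constraints are formulated so that the practical semi-global stability property can be established; here the weights and the constraint margin are fixed purely for illustration.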

Original language: English
Pages (from-to): 8043-8048
Number of pages: 6
Journal: IFAC-PapersOnLine
Volume: 53
Issue number: 2
DOIs
Publication status: Published - 2020
Externally published: Yes
Event: 21st IFAC World Congress 2020 - Berlin, Germany
Duration: 12 Jul 2020 - 17 Jul 2020

Keywords

  • Lyapunov methods
  • Reinforcement learning control
  • Stability of nonlinear systems

