Model predictive control (MPC) is the standard approach to infinite-horizon optimal control; it typically optimizes only a finite initial segment of the cost function to keep the problem computationally tractable. Globally optimal controllers are usually obtained by dynamic programming (DP), but the computations involved in DP are notoriously hard to perform, especially in online control. For this reason, various approximation schemes for DP, the so-called “critics”, have been suggested for infinite-horizon cost functions. This work proposes incorporating such a critic into dual-mode MPC as a means of addressing infinite-horizon optimal control. The proposed critic is based on Q-learning and is used to approximate the infinite-horizon cost online. The stability of the new approach is analyzed, and sufficient stabilizing constraints on the critic are derived. A case study demonstrates the applicability of the approach.
- infinite-horizon optimization
- nonlinear MPC
- reinforcement learning