Learning to Stabilize High-dimensional Unknown Systems Using
Lyapunov-guided Exploration

Massachusetts Institute of Technology

Abstract

Designing stabilizing controllers is a fundamental challenge in autonomous systems, particularly for high-dimensional, nonlinear systems that cannot be accurately modeled with differential equations. Lyapunov theory offers a robust solution for stabilizing control systems, but existing Lyapunov-based methods require access to the complete dynamics or to samples of system executions throughout the entire state space. Consequently, they are impractical for high-dimensional systems. In this paper, we introduce a novel framework, LYGE, for learning stabilizing controllers tailored to high-dimensional, unknown systems. LYGE employs Lyapunov theory to iteratively guide the search for samples during exploration while simultaneously learning the local system dynamics, control policy, and Lyapunov function. We demonstrate its scalability on highly complex systems, including a high-fidelity F-16 jet model from the Air Force with a 16D state space and a 4D input space. Experimental results indicate that, compared to prior works in reinforcement learning, imitation learning, and neural certificates, LYGE reduces the distance to the goal by 50% while requiring only 5% to 32% of the samples. Furthermore, we demonstrate that our algorithm can be extended to learn controllers guided by alternative certificate functions for unknown systems.
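
Method Sketch

At a high level, LYGE alternates three steps: (1) fit a local dynamics model on the transitions collected so far, (2) jointly train a neural controller and a neural Lyapunov function so that the Lyapunov conditions hold on the explored states under the learned model, and (3) use the current Lyapunov function to propose new states to explore near the frontier of the certified region. The Python sketch below illustrates this loop under simplifying assumptions (discrete-time dynamics, a frozen dynamics model, goal at the origin); every name in it (mlp, certificate_loss, lyapunov_guided_candidates) is an illustrative placeholder, not the authors' implementation.

# A minimal sketch of Lyapunov-guided exploration, assuming a simple
# discrete-time setting. Names are illustrative, not the authors' code.
import torch
import torch.nn as nn

def mlp(i, o, h=64):
    return nn.Sequential(nn.Linear(i, h), nn.Tanh(),
                         nn.Linear(h, h), nn.Tanh(), nn.Linear(h, o))

n_x, n_u = 2, 1
f_hat = mlp(n_x + n_u, n_x)   # learned local dynamics: x' ~ x + f_hat(x, u)
pi    = mlp(n_x, n_u)         # control policy u = pi(x)
V     = mlp(n_x, 1)           # candidate Lyapunov function
f_hat.requires_grad_(False)   # frozen here; refit each iteration in the full method

def certificate_loss(x, alpha=0.1):
    """Penalize violations of the discrete-time Lyapunov conditions on the
    explored states (the goal-state condition V(x*) = 0 is omitted for brevity)."""
    v = V(x)
    x_next = x + f_hat(torch.cat([x, pi(x)], dim=-1))   # one-step model rollout
    pos = torch.relu(1e-3 - v).mean()                    # positivity: V(x) > 0
    dec = torch.relu(V(x_next) - (1 - alpha) * v).mean() # decrease along dynamics
    return pos + dec

def lyapunov_guided_candidates(x_known, n=256, radius=0.2):
    """Propose states near the explored set and keep those the current V
    scores lowest -- the region the controller can most likely reach."""
    cand = x_known[torch.randint(len(x_known), (n,))]
    cand = cand + radius * torch.randn_like(cand)
    with torch.no_grad():
        scores = V(cand).squeeze(-1)
    return cand[scores.argsort()[: n // 4]]

# Schematic loop: minimize the certificate loss over explored states, then
# grow the explored set with Lyapunov-guided candidates.
opt = torch.optim.Adam(list(pi.parameters()) + list(V.parameters()), lr=1e-3)
x_explored = torch.randn(128, n_x) * 0.1  # seed: demonstrations near the goal
for _ in range(100):
    opt.zero_grad()
    certificate_loss(x_explored).backward()
    opt.step()
    x_new = lyapunov_guided_candidates(x_explored)
    x_explored = torch.cat([x_explored, x_new])[-2048:]  # growing frontier

In the full method, the dynamics model would be refit at every iteration from rollouts of the current controller starting at the proposed candidates; the sketch omits the environment interaction for brevity.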

Experiments

CartPole Environment:
  • LYGE stabilizes the system closest to the goal.
  • PPO stabilizes the system, but slightly farther from the goal.
  • AIRL, D-REX, and SSRR stabilize the system farther still from the goal.
[Videos: CartPole demonstrations and rollouts of LYGE, PPO, AIRL, D-REX, and SSRR]
CartIIPole Environment:
  • LYGE stabilizes the system.
  • PPO also stabilizes the system.
  • AIRL reproduces the poor behavior of the demonstrations.
  • D-REX stabilizes the system at the wrong state.
  • SSRR fails to stabilize the system.
[Videos: CartIIPole demonstrations and rollouts of LYGE, PPO, AIRL, D-REX, and SSRR]
F-16 GCA Environment:
  • LYGE pulls the F-16 up quickly and stabilizes it at the desired altitude.
  • PPO exhibits dangerous behavior and sometimes collides with the ground.
  • AIRL and D-REX sometimes collide with the ground.
  • SSRR also exhibits dangerous behavior.
[Videos: F-16 GCA demonstrations and rollouts of LYGE, PPO, AIRL, D-REX, and SSRR]
F-16 Tracking Environment:
  • LYGE tracks the goal point well.
  • PPO and SSRR cause the F-16 to spin, which is dangerous behavior.
  • AIRL and D-REX fail to reach the goal.
[Videos: F-16 Tracking demonstrations and rollouts of LYGE, PPO, AIRL, D-REX, and SSRR]