Learning to Stabilize High-dimensional Unknown Systems Using
Lyapunov-guided Exploration

Songyuan Zhang and Chuchu Fan

Massachusetts Institute of Technology

Abstract

Designing stabilizing controllers is a fundamental challenge in autonomous systems, particularly for high-dimensional, nonlinear systems that are difficult to model accurately with differential equations. Lyapunov theory offers a solution for stabilizing control systems; however, current methods relying on Lyapunov functions require access to complete dynamics or samples of system executions throughout the entire state space. Consequently, they are impractical for high-dimensional systems. This paper introduces a novel framework, LYapunov-Guided Exploration (LYGE), for learning stabilizing controllers tailored to high-dimensional, unknown systems. LYGE employs Lyapunov theory to iteratively guide the search for samples during exploration while simultaneously learning the local system dynamics, control policy, and Lyapunov functions. We demonstrate its scalability on highly complex systems, including a high-fidelity F-16 jet model featuring a 16D state space and a 4D input space. Experiments indicate that, compared to prior works in reinforcement learning, imitation learning, and neural certificates, LYGE reduces the distance to the goal by 50% while requiring only 5% to 32% of the samples. Furthermore, we demonstrate that our algorithm can be extended to learn controllers guided by other certificate functions for unknown systems.
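For reference, the abstract builds on the standard notion of a Lyapunov certificate. Below is a minimal, self-contained sketch (our own illustrative names, not code from the paper) of the discrete-time Lyapunov conditions such methods verify on samples: V is positive definite about the goal and strictly decreases along closed-loop trajectories.

import numpy as np

def lyapunov_conditions_hold(V, f, policy, states, goal, eps=1e-3):
    """Check the discrete-time Lyapunov conditions on sampled states:
    V(goal) = 0, V(x) > 0 for x != goal, and V decreasing along the
    closed-loop dynamics x_next = f(x, policy(x))."""
    if abs(V(goal)) > eps:
        return False
    for x in states:
        if np.allclose(x, goal):
            continue  # the conditions only constrain states away from the goal
        if V(x) <= 0.0:
            return False  # violates positive definiteness
        if V(f(x, policy(x))) - V(x) >= -eps * V(x):
            return False  # fails to strictly decrease
    return True

# Quick check on a stable scalar system x_next = 0.5 * x with V(x) = x^2.
print(lyapunov_conditions_hold(
    V=lambda x: float(x * x),
    f=lambda x, u: 0.5 * x + u,
    policy=lambda x: 0.0,
    states=np.linspace(-1.0, 1.0, 21),
    goal=0.0,
))  # -> True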

Experiments

CartPole Environment:
  • LYGE stabilizes the system closest to the goal.
  • PPO stabilizes the system slightly farther from the goal.
  • AIRL, D-REX, and SSRR stabilize the system farther from the goal.
[Videos: Demonstrations, LYGE, PPO, AIRL, D-REX, SSRR]
CartIIPole Environment:
  • LYGE can stabilize the system.
  • PPO can also stabilize the system.
  • AIRL reproduces the suboptimal behavior of the demonstrations.
  • D-REX stabilizes the system at the wrong state.
  • SSRR cannot stabilize the system.
[Videos: Demonstrations, LYGE, PPO, AIRL, D-REX, SSRR]
F-16 GCA Environment:
  • LYGE pulls the F-16 up quickly and stabilizes it at the desired altitude.
  • PPO exhibits dangerous behaviors and sometimes collides with the ground.
  • AIRL and D-REX sometimes collide with the ground.
  • SSRR also exhibits dangerous behaviors.
[Videos: Demonstrations, LYGE, PPO, AIRL, D-REX, SSRR]
F-16 Tracking Environment:
  • LYGE tracks the goal point well.
  • PPO and SSRR make the F-16 spin, which is dangerous.
  • AIRL and D-REX cannot reach the goal.
[Videos: Demonstrations, LYGE, PPO, AIRL, D-REX, SSRR]

Key Idea

[Figures: trusted tunnel growth at Iteration 0, Iteration 2, Iteration 6, and Iteration 12]

LYGE employs Lyapunov theory to iteratively guide the search for samples during exploration while simultaneously learning the local system dynamics, control policy, and Lyapunov function. During training, starting from the demonstrations, LYGE iteratively explores the useful subset of the state space, expanding the trusted tunnel (represented by the yellow dots) towards the goal while continually updating the learned Lyapunov function inside the tunnel.
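To make the loop concrete, here is a toy, self-contained sketch of this iterate-and-expand scheme on an unknown scalar linear system (all names and simplifications are ours and purely illustrative, not the authors' implementation): each iteration fits local dynamics on the samples collected so far, updates the policy so that the quadratic Lyapunov function V(x) = x^2 decreases under the learned dynamics, and then explores from the lowest-V states to grow the trusted tunnel towards the goal x = 0.

import numpy as np

rng = np.random.default_rng(0)
a_true, b_true = 1.2, 0.8  # true dynamics x_next = a*x + b*u, hidden from the learner

def true_step(x, u):
    return a_true * x + b_true * u + 1e-3 * rng.standard_normal()

# "Demonstrations": transitions from a weak controller far from the goal x = 0.
buffer, x = [], 5.0
for _ in range(20):
    u = -0.5 * x
    x_next = true_step(x, u)
    buffer.append((x, u, x_next))
    x = x_next

k = -0.5  # linear policy u = k * x
for _ in range(12):
    # 1. Fit local dynamics (a_hat, b_hat) by least squares on the buffer.
    X = np.array([[s, u] for s, u, _ in buffer])
    y = np.array([sn for _, _, sn in buffer])
    (a_hat, b_hat), *_ = np.linalg.lstsq(X, y, rcond=None)

    # 2. Update the policy so that V(x) = x^2 decreases under the learned
    #    dynamics: |a_hat + b_hat * k| < 1 implies V(x_next) < V(x).
    k = -0.9 * a_hat / b_hat

    # 3. Lyapunov-guided exploration: roll out from the buffer states with
    #    the lowest V, expanding the trusted tunnel towards the goal.
    frontier = sorted({s for s, _, _ in buffer}, key=lambda s: s * s)[:5]
    for x0 in frontier:
        x = x0
        for _ in range(5):
            u = k * x
            x_next = true_step(x, u)
            buffer.append((x, u, x_next))
            x = x_next

print(f"closed-loop pole a + b*k = {a_true + b_true * k:.3f} (stable if |.| < 1)")

The real algorithm replaces the least-squares fit, linear policy, and quadratic V with learned neural models and restricts all updates to the trusted tunnel, but the structure of the loop is the same.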

BibTeX


    @InProceedings{zhang2024lyge,
        title = {Learning to stabilize high-dimensional unknown systems using Lyapunov-guided exploration},
        author = {Zhang, Songyuan and Fan, Chuchu},
        booktitle = {Proceedings of the 6th Annual Learning for Dynamics \& Control Conference},
        pages = {52--67},
        year = {2024},
        editor = {Abate, Alessandro and Cannon, Mark and Margellos, Kostas and Papachristodoulou, Antonis},
        volume = {242},
        series = {Proceedings of Machine Learning Research},
        month = {15--17 Jul},
        publisher = {PMLR},
        pdf = {https://proceedings.mlr.press/v242/zhang24a/zhang24a.pdf},
        url = {https://proceedings.mlr.press/v242/zhang24a.html},
    }