Learning Safety Filters
We tackle the problem of learning safety filters. That is, we wish to minimally modify a given test policy to maintain safety.
In this work, we construct a safety filter using Control Barrier Functions (CBF).
Constructing CBFs for Input Constrained Systems is Hard
Constructing a CBF for arbitrary input constrained systems is hard. For high relative-degree systems, a common approach is to use Higher-Order CBFs (HOCBFs). However, even on the simplest example of a double-integrator with bounded accelerations, many HOCBF candidate functions fail to satisfy the CBF conditions and are unsafe.
For example, consider a double integrator (\(\dot{p} = v, \dot{v} = u \)) with box-constrained accelerations \(\lvert u \rvert \leq 1\) and a safety constraint for the position to be positive (\( p \geq 0 \)). The HOCBF candidate \( B(x) = -v - \alpha p \) is valid if and only if \( \alpha = 0 \), which deems all negative velocities as unsafe and is overly conservative. Other choices of \( \alpha \) will result in safety violations for some regions of the state space.
Policy CBFs: Constructing CBFs from the Policy Value Function
In this work, we use the insight that the maximum-over-time value function is a CBF for any choice of nominal policy \(\pi\).
where the avoid set \( \mathcal{A} \) is described as the superlevel set of some continuous function \(h\):
Learning the policy value function \(V^{h,\pi}\) for a nominal policy \(\pi\) can be interpreted as policy distillation: \(V^{h,\pi}\) contains knowledge about the invariant set, which can be used as a safety filter for another (potentially unsafe) policy.
Simulation Experiments
F16 Fighter Jet
- Avoid
- Avoid crashing into the ground. Avoid extreme angles of attack.
Abstract
Control barrier functions (CBF) have become popular as a safety filter to guarantee the safety of nonlinear dynamical systems for arbitrary inputs. However, it is difficult to construct functions that satisfy the CBF constraints for high relative degree systems with input constraints. To address these challenges, recent work has explored learning CBFs using neural networks via neural CBF (NCBF). However, such methods face difficulties when scaling to higher dimensional systems under input constraints.
In this work, we first identify challenges that NCBFs face during training. Next, to address these challenges, we propose policy neural CBF (PNCBF), a method of constructing CBFs by learning the value function of a nominal policy, and show that the value function of the maximum-over-time cost is a CBF. We demonstrate the effectiveness of our method in simulation on a variety of systems ranging from toy linear systems to an F-16 jet with a 16-dimensional state space. Finally, we validate our approach on a two-agent quadcopter system on hardware under tight input constraints.