How to Train Your Neural Control Barrier Function:

Learning Safety Filters for Complex Input-Constrained Systems

Oswin So1, Zachary Serlin2, Makai Mann2, Jake Gonzales2, Kwesi Rutledge1, Nicholas Roy1, Chuchu Fan1
1 Massachusetts Institute of Technology, 2 Lincoln Labs

Learning Safety Filters

We tackle the problem of learning safety filters. That is, we wish to minimally modify a given test policy to maintain safety.

Safety Filter Problem

In this work, we construct a safety filter using Control Barrier Functions (CBF).
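
To make the filtering step concrete, here is a minimal Python sketch for a scalar, box-constrained input. It assumes a control-affine system, the sign convention that \(B(x) \leq 0\) is safe, and the standard CBF decrease condition \(L_f B + L_g B\, u \leq -\lambda B\). The function name `filter_control`, the parameter `lam`, and the fallback for infeasible states are illustrative choices, not taken from the paper.

```python
def filter_control(u_nom, lf_b, lg_b, b_val, lam=1.0, u_max=1.0):
    """Return the input closest to u_nom that satisfies
    |u| <= u_max  and  lf_b + lg_b * u <= -lam * b_val.

    For a scalar input the feasible set is an interval, so the
    minimally modifying input is a clamp of u_nom onto that interval.
    """
    lo, hi = -u_max, u_max
    rhs = -lam * b_val - lf_b  # constraint reads lg_b * u <= rhs

    if lg_b > 0:
        hi = min(hi, rhs / lg_b)
    elif lg_b < 0:
        lo = max(lo, rhs / lg_b)
    # lg_b == 0: the input does not enter the constraint, bounds unchanged.

    if lo > hi:
        # No admissible input satisfies the condition, which can only happen
        # if B is not a valid CBF at this state. Assumed fallback: pick the
        # input that makes dB/dt as small as possible.
        return -u_max if lg_b > 0 else u_max

    return min(max(u_nom, lo), hi)


# Hypothetical numbers: the test policy asks for u = -1, but the CBF
# condition only admits u >= 0.5, so the filter raises the input to 0.5.
u_safe = filter_control(u_nom=-1.0, lf_b=1.0, lg_b=-1.0, b_val=-0.5)
```

When the test policy's input already satisfies the condition, the filter returns it unchanged, which is the sense in which the modification is minimal.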

Constructing CBFs for Input Constrained Systems is Hard

Constructing a CBF for arbitrary input-constrained systems is hard. For high relative-degree systems, a common approach is to use Higher-Order CBFs (HOCBFs). However, even on the simplest example of a double integrator with bounded accelerations, many HOCBF candidate functions fail to satisfy the CBF conditions and are unsafe.

For example, consider a double integrator (\(\dot{p} = v,\ \dot{v} = u\)) with box-constrained accelerations \(\lvert u \rvert \leq 1\) and the safety constraint that the position stays nonnegative (\( p \geq 0 \)). The HOCBF candidate \( B(x) = -v - \alpha p \) is valid if and only if \( \alpha = 0 \), a choice that deems all negative velocities unsafe and is therefore overly conservative. Any other choice of \( \alpha \) results in safety violations in some regions of the state space.

While the HOCBF candidate with \(\alpha > 0\) appears to be safe on easier states, it will result in safety violations on harder states and hence is not a valid HOCBF.
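
The failure for \(\alpha > 0\) can be checked numerically. The sketch below assumes the convention that the candidate labels a state safe when \(B(x) \leq 0\), and uses the fact that for this double integrator, full braking (\(u = +1\)) is the best any controller can do to keep \(p\) from decreasing. The helper names and the value \(\alpha = 1\) are chosen purely for illustration.

```python
def hocbf_candidate(p, v, alpha):
    """B(x) = -v - alpha * p; the candidate calls x safe when B(x) <= 0."""
    return -v - alpha * p


def best_case_bdot(v, alpha, u_max=1.0):
    """Smallest achievable dB/dt = -u - alpha * v over |u| <= u_max."""
    return -u_max - alpha * v


def min_position_under_max_braking(p, v):
    """Lowest position reached when applying u = +1 forever (closed form)."""
    # If the system is not moving toward p = 0, the position never drops.
    return p if v >= 0 else p - 0.5 * v**2


alpha = 1.0
results = {}
for p in (0.5, 4.0):
    v = -alpha * p  # place the state exactly on the boundary B(x) = 0
    results[p] = (
        hocbf_candidate(p, v, alpha),
        best_case_bdot(v, alpha),
        min_position_under_max_braking(p, v),
    )

for p, (b, bdot, p_min) in results.items():
    print(f"p={p}: B={b}, best dB/dt={bdot}, min position={p_min}")
```

At the easier state (\(p = 0.5\)) the best-case \(\dot{B}\) is negative and the position stays positive. At the harder state (\(p = 4\)) no admissible input can keep \(B\) from increasing, and even full braking drives the position below zero, although the candidate labels both states as safe.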

Policy CBFs: Constructing CBFs from the Policy Value Function

In this work, we use the insight that the maximum-over-time value function is a CBF for any choice of nominal policy \(\pi\).

\( \displaystyle V^{h,\pi}(x) \coloneqq \sup_{t \geq 0}\, h(x_t^\pi) \),

where \( x_t^\pi \) is the state reached at time \(t\) by following \(\pi\) from \(x_0 = x\), and the avoid set \( \mathcal{A} \) is described as the superlevel set of some continuous function \(h\):

\( \displaystyle \mathcal{A} = \{ x \mid h(x) > 0 \} \).

Learning the policy value function \(V^{h,\pi}\) for a nominal policy \(\pi\) can be interpreted as policy distillation: \(V^{h,\pi}\) contains knowledge about the invariant set, which can be used as a safety filter for another (potentially unsafe) policy.
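
As a rough illustration of the quantity being learned, the Python sketch below estimates \(V^{h,\pi}(x)\) by rolling out a nominal policy on the double integrator from the earlier example. It is not the training procedure of the paper: the supremum over \(t \geq 0\) is truncated to a finite horizon, the dynamics are integrated with explicit Euler, and the nominal policy (always brake with \(u = +1\)), `horizon`, and `dt` are assumptions made for this example.

```python
import numpy as np


def h(x):
    """Avoid set is {h > 0}; here that means position p < 0."""
    return -x[0]


def nominal_policy(x):
    """Hypothetical nominal policy: always apply full braking."""
    return 1.0


def step(x, u, dt):
    """One explicit-Euler step of the double integrator p' = v, v' = u."""
    p, v = x
    return np.array([p + v * dt, v + u * dt])


def policy_value(x0, policy, horizon=5.0, dt=1e-3, u_max=1.0):
    """Finite-horizon estimate of V^{h,pi}(x0) = sup_t h(x_t^pi)."""
    x = np.asarray(x0, dtype=float)
    best = h(x)
    for _ in range(int(horizon / dt)):
        u = float(np.clip(policy(x), -u_max, u_max))  # respect |u| <= u_max
        x = step(x, u, dt)
        best = max(best, h(x))
    return best


v_safe = policy_value([1.0, -1.0], nominal_policy)    # roughly -0.5
v_unsafe = policy_value([1.0, -2.0], nominal_policy)  # roughly +1.0
```

A nonpositive value means the rollout under \(\pi\) never enters the avoid set from that state, so the zero sublevel set of the estimate marks the states the nominal policy can keep safe.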

Simulation Experiments

F-16 Fighter Jet

Avoid: crashing into the ground and extreme angles of attack.

Abstract

Control barrier functions (CBFs) have become popular as a safety filter to guarantee the safety of nonlinear dynamical systems for arbitrary inputs. However, it is difficult to construct functions that satisfy the CBF constraints for high relative-degree systems with input constraints. To address these challenges, recent work has explored learning CBFs using neural networks via neural CBFs (NCBFs). However, such methods face difficulties when scaling to higher-dimensional systems under input constraints.

In this work, we first identify challenges that NCBFs face during training. Next, to address these challenges, we propose policy neural CBF (PNCBF), a method of constructing CBFs by learning the value function of a nominal policy, and show that the value function of the maximum-over-time cost is a CBF. We demonstrate the effectiveness of our method in simulation on a variety of systems ranging from toy linear systems to an F-16 jet with a 16-dimensional state space. Finally, we validate our approach on a two-agent quadcopter system on hardware under tight input constraints.