We consider the multi-agent constrained optimal control problem with unknown discrete-time dynamics, partial observability, and input constraints, and without a known performant nominal policy. Given \(N\) agents, we aim to design distributed policies \(\mu_1, \dots, \mu_N\) such that:
the task is accomplished: \(\min_{\mu_1,\dots,\mu_N} \sum_{k=0}^\infty l(\mathbf{x}^k, \boldsymbol{\mu}(\mathbf{x}^k))\),
subject to the unknown dynamics: \(\mathbf{x}^{k+1} = f(\mathbf{x}^k, \boldsymbol{\mu}(\mathbf{x}^k))\),
and the agents remain safe: \(h_i^{(m)}(o_i^k)\leq 0, \quad o_i^k=O_i(\mathbf{x}^k)\) for all agents \(i\), constraints \(m\), and time steps \(k\).
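To make the formulation concrete, here is a minimal sketch of the setup with toy stand-ins: the dynamics `f`, observation maps `O`, constraint `h`, stage cost `l`, and policy `mu` below are all hypothetical placeholders for illustration, not the paper's environments or method.

```python
import numpy as np

N, T = 3, 50                       # agents, finite horizon (truncating the infinite sum)
rng = np.random.default_rng(0)

def f(x, u):
    # Toy single-integrator dynamics (unknown to the algorithm in the paper).
    return x + u

def O(x, i):
    # Agent i's partial observation: own position plus relative positions.
    return np.concatenate([x[i], (x - x[i]).ravel()])

def h(o):
    # Toy safety constraint h(o_i) <= 0: agent stays inside the unit ball.
    return np.linalg.norm(o[:2]) - 1.0

def l(x, u):
    # Stage cost: distance of agents to their goals (the origin) plus control effort.
    return float(np.sum(x**2) + np.sum(u**2))

def mu(x, i):
    # Hypothetical distributed policy: each agent acts only on its own observation.
    return -0.1 * O(x, i)[:2]      # drift toward the origin

x = rng.uniform(-0.5, 0.5, size=(N, 2))
cost, safe = 0.0, True
for k in range(T):
    u = np.stack([mu(x, i) for i in range(N)])
    cost += l(x, u)
    safe = safe and all(h(O(x, i)) <= 0 for i in range(N))
    x = f(x, u)
print(f"cost={cost:.2f}, safe={safe}")
```

The rollout accumulates the cost objective while checking the per-agent constraints at every step, mirroring the three conditions above.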
DGCBF: safety guarantee for unknown, discrete-time, partially observable multi-agent systems.
It can be learned using policy evaluation with deterministic rollouts:
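As an illustration of policy evaluation with deterministic rollouts, the sketch below defines a constraint value as the worst constraint violation along a rollout under the deterministic policy \(\boldsymbol\mu\); this particular definition and the toy 1D system are illustrative assumptions, not necessarily the paper's exact formulation.

```python
import numpy as np

def constraint_value(x0, mu, f, h, O, T=50):
    """Deterministic policy evaluation: roll out mu from x0 and return the
    worst (largest) constraint value encountered; <= 0 iff the T-step
    rollout stays safe."""
    x, worst = x0, -np.inf
    for _ in range(T):
        worst = max(worst, h(O(x)))
        x = f(x, mu(x))
    return worst

# Toy 1D system: contracting dynamics, constraint |x| <= 1 (all hypothetical).
f = lambda x, u: x + u
mu = lambda x: -0.5 * x            # deterministic policy (the mode)
h = lambda o: abs(o) - 1.0
O = lambda x: x

print(constraint_value(0.8, mu, f, h, O))   # negative: rollout stays safe
```

Because the rollout is deterministic, no expectation over policy noise is needed, so the evaluated value can certify safety of \(\boldsymbol\mu\) itself rather than of the stochastic exploration policy.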
Learn more about GCBF: Check our T-RO paper GCBF+!
1. We perform a \(T\)-step stochastic rollout with the policy \(\boldsymbol{\pi}_\theta\). However, unlike MAPPO, we additionally perform a \(T\)-step deterministic rollout using a deterministic version of \(\boldsymbol{\pi}_\theta\) (by taking the mode), which we denote \(\boldsymbol\mu\), to learn the DGCBF.
2. We update the value functions via regression on the corresponding targets computed using GAE, where the targets for the cost-value function \(V^l\) use the stochastic rollout and the targets for the constraint-value functions \(V^{h^{(m)},\boldsymbol\mu}\) use the deterministic rollout.
3. We update the policy \(\boldsymbol\pi_\theta\) by replacing the \(Q\)-function with its GAE estimate, then combining the CRPO-style decoupled policy loss with the PPO clipped loss, using the learned constraint-value functions \(V^{h^{(m)},\boldsymbol\mu}\) as the DGCBFs.
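Step 3 can be sketched as follows. The per-sample CRPO-style switch and the specific shapes, threshold, and toy batch below are illustrative assumptions, not the paper's exact loss.

```python
import numpy as np

def ppo_clip_term(ratio, advantage, eps=0.2):
    # Standard PPO clipped surrogate (written in maximize form).
    return np.minimum(ratio * advantage,
                      np.clip(ratio, 1 - eps, 1 + eps) * advantage)

def dgppo_policy_loss(ratio, adv_cost, adv_constr, V_h, tol=0.0):
    """CRPO-style decoupled switch per sample: where the learned
    constraint value (the DGCBF) signals a violation, descend on the
    constraint advantage; elsewhere, descend on the cost advantage."""
    violating = V_h > tol                       # DGCBF condition broken
    # Costs and constraints are minimized, so negate the advantages to
    # reuse the maximize-form clipped surrogate.
    obj = np.where(violating,
                   ppo_clip_term(ratio, -adv_constr),
                   ppo_clip_term(ratio, -adv_cost))
    return -np.mean(obj)                        # loss to minimize

# Toy batch of importance ratios, advantages, and constraint values.
rng = np.random.default_rng(0)
ratio = rng.uniform(0.8, 1.2, 64)
adv_cost = rng.normal(size=64)
adv_constr = rng.normal(size=64)
V_h = rng.normal(-0.5, 0.5, 64)                # mostly safe samples
loss = dgppo_policy_loss(ratio, adv_cost, adv_constr, V_h)
print(loss)
```

The decoupling means no Lagrange multiplier must be tuned: each sample optimizes either the task objective or the safety objective, depending on the sign of its constraint value.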
Lidar environments, where agents use LiDAR to detect obstacles.
MuJoCo and VMAS environments, where contact dynamics are included.
Comparison on \(N=3\) agents. DGPPO has the best performance by being closest to the top left corner.
Training stability. DGPPO yields smoother training curves compared to the baselines.
Scaling to \(N=5, 7\). Unlike other methods, DGPPO performs similarly with more agents.
This work builds on our previous work GCBF+, eliminating GCBF+'s requirements of a performant nominal policy and knowledge of the dynamics. For a survey of the field of learning safe control for multi-robot systems, see this paper.