A Bayesian approach to breaking things: efficiently predicting and repairing failure modes via sampling

Massachusetts Institute of Technology
An overview of our method for predicting and repairing failure modes in autonomous systems, shown here handling connectivity failures in a drone swarm. First, we use differentiable simulation to optimize an initial solution; then, we predict failure modes using fast gradient-based sampling. We use these failure modes to re-optimize the design and repeat this process until a high-quality solution is achieved.




Abstract

Before autonomous systems can be deployed in safety-critical applications, we must be able to understand and verify the safety of these systems. For cases where the risk or cost of real-world testing is prohibitive, we propose a simulation-based framework for a) predicting ways in which an autonomous system is likely to fail and b) automatically adjusting the system's design to preemptively mitigate those failures. We frame this problem through the lens of approximate Bayesian inference and use differentiable simulation for efficient failure case prediction and repair. We apply our approach to a range of robotics and control problems, including optimizing search patterns for robot swarms and reducing the severity of outages in power transmission networks. Compared to optimization-based falsification techniques, our method predicts a more diverse, representative set of failure modes. We also find that our use of differentiable simulation yields solutions that have up to 10x lower cost and require up to 2x fewer iterations to converge relative to gradient-free techniques.
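To make the predict-then-repair idea concrete, here is a minimal sketch on a toy problem. It is not the implementation from the paper. Everything in it is an assumption made for illustration: the scalar design `x`, the scalar disturbance `y` with a standard normal prior, the hand-written cost `J(x, y) = (1 - x) * y`, the choice to treat failures as samples from a density proportional to `prior(y) * exp(J(x, y))`, the use of the Metropolis-adjusted Langevin algorithm (MALA) as the gradient-based sampler, and plain gradient descent as the repair step. Analytic derivatives stand in for a differentiable simulator.

```python
import math
import random


def cost(x, y):
    """Hypothetical toy cost: disturbance y hurts, design x attenuates it."""
    return (1.0 - x) * y


def dcost_dy(x, y):
    return 1.0 - x


def dcost_dx(x, y):
    return -y


def log_p(x, y):
    """Unnormalized log density of failures: log N(y; 0, 1) + J(x, y)."""
    return -0.5 * y * y + cost(x, y)


def grad_log_p(x, y):
    """Gradient of log_p with respect to the disturbance y."""
    return -y + dcost_dy(x, y)


def mala(x, n_samples, step=0.5, burn_in=500, rng=None):
    """Draw failure samples for a fixed design x using MALA."""
    rng = rng or random.Random(0)
    y = 0.0
    samples = []
    for i in range(n_samples + burn_in):
        # Langevin proposal: drift along the gradient, then add noise.
        mean_fwd = y + step * grad_log_p(x, y)
        y_new = mean_fwd + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        # Metropolis-Hastings correction for the asymmetric proposal.
        mean_rev = y_new + step * grad_log_p(x, y_new)
        log_q_fwd = -((y_new - mean_fwd) ** 2) / (4.0 * step)
        log_q_rev = -((y - mean_rev) ** 2) / (4.0 * step)
        log_alpha = log_p(x, y_new) - log_p(x, y) + log_q_rev - log_q_fwd
        if log_alpha >= 0.0 or rng.random() < math.exp(log_alpha):
            y = y_new
        if i >= burn_in:
            samples.append(y)
    return samples


def repair(x, rounds=20, lr=0.2, rng=None):
    """Alternate between predicting failures and updating the design."""
    rng = rng or random.Random(0)
    for _ in range(rounds):
        failures = mala(x, 200, rng=rng)
        # Average cost gradient over the predicted failures.
        grad = sum(dcost_dx(x, y) for y in failures) / len(failures)
        x -= lr * grad
    return x


if __name__ == "__main__":
    failures = mala(0.0, 5000)
    print("mean predicted failure, initial design:", sum(failures) / len(failures))
    print("repaired design:", repair(0.0))
```

For this toy cost the failure density is a unit-variance Gaussian centered at `1 - x`, so the sampler can be checked against a known answer, and the repair loop drifts toward `x = 1`, where the disturbance no longer affects the cost. A real application would replace `cost` and its derivatives with a differentiable simulation of the system.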

Finding more diverse counterexamples leads to more robust designs, in both simulation and hardware.

(Left) Hardware results for search-evasion with 5 hiders and 3 seekers, showing an initial search pattern (blue) and predicted failure modes (red). (Center) Hardware results for an optimized search pattern, which leaves fewer hiding places. (Right, top) An initial manipulation policy knocks the bottle over. (Right, bottom) The repaired manipulation policy pushes the bottle without knocking it over.

Applications to electrical power networks demonstrate scalability to high-dimensional systems.

An illustration of a counterexample for an electrical power network: a series of line outages that could lead to a blackout.

Related Links

This work is part of a broader research thread around automated testing and design automation, which allow engineers to more effectively design safe, reliable, and resilient robotic systems.

Other work on this topic from our lab includes:

BibTeX

@article{dawson2023_breaking_things,
      author    = {Dawson, Charles and Fan, Chuchu},
      title     = {A Bayesian approach to breaking things: efficiently predicting and repairing failure modes via sampling},
      journal   = {Conference on Robot Learning (CoRL)},
      year      = {2023},
}