Bloom: an open source tool for automated behavioral evaluations

MATS Fellow:

Isha Gupta

Authors:

Isha Gupta, Kai Fronsdal, Abhay Sheshadri, Jonathan Michala, Jacqueline Tay, Rowan Wang, Samuel R. Bowman, Sara Price

Citations

0 Citations

Abstract:

We are releasing Bloom, an agentic framework for developing behavioral evaluations. Bloom's evaluations are reproducible and targeted: unlike open-ended auditing, Bloom takes a researcher-specified behavior and quantifies its frequency and severity across automatically generated scenarios. Bloom's evaluations correlate strongly with our hand-labelled judgments and reliably separate baseline models from intentionally misaligned ones. As examples, we release benchmark results for four alignment-relevant behaviors on 16 models. Bloom is available at github.com/safety-research/bloom.

Date:

December 19, 2025
