MATS Fellow:
Muhammad Maaz
Authors:
Muhammad Maaz, Liam DeVoe, Zac Hatfield-Dodds, Nicholas Carlini
Abstract:
We developed an agent that efficiently identifies bugs in large software projects. The agent infers general properties that the code should satisfy, then checks them with property-based testing, a technique similar to fuzz testing. With this approach we discovered bugs in top Python packages such as NumPy, SciPy, and Pandas. After extensive manual validation, we are reporting these bugs to the developers, and several have already been patched.
For more information, read the full paper, take a look at the GitHub repository, or browse the bugs we found at our site.
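To make the idea of property-based testing concrete, here is a minimal sketch that uses only the Python standard library. It is an illustration, not the agent's actual method: the function `buggy_max`, the property in `holds`, and the helper `find_counterexample` are all hypothetical names invented for this example. In practice a dedicated library such as Hypothesis would generate and shrink inputs far more effectively than this hand-rolled loop.

```python
import random
from typing import Callable, List, Optional


def buggy_max(xs: List[int]) -> int:
    """Hypothetical function under test. Starting from 0 is a deliberate bug."""
    best = 0
    for x in xs:
        if x > best:
            best = x
    return best


def holds(f: Callable[[List[int]], int], xs: List[int]) -> bool:
    """Property: the result is an element of xs and no element exceeds it."""
    result = f(xs)
    return result in xs and all(x <= result for x in xs)


def find_counterexample(
    f: Callable[[List[int]], int], trials: int = 200, seed: int = 0
) -> Optional[List[int]]:
    """Try random non-empty integer lists; return the first one that breaks the property."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-100, 100) for _ in range(rng.randint(1, 5))]
        if not holds(f, xs):
            return xs
    return None


# The buggy function fails on any list whose elements are all negative.
counterexample = find_counterexample(buggy_max)
print("counterexample for buggy_max:", counterexample)

# The built-in max satisfies the same property on every generated input.
print("counterexample for max:", find_counterexample(max))
```

The property is stated once, independently of any particular input, and the random search then looks for an input that violates it. That separation is what lets a single inferred property exercise many code paths.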
Benchmarking and Improving Monitors for Out-Of-Distribution Alignment Failure in LLMs
Authors:
Dylan Feng
Date:
May 24, 2026
Eval Cooperativeness May Be a Scalable Mitigation for Eval Gaming
Authors:
Jasmine Li
Date:
May 24, 2026
The MATS Program is an independent research and educational initiative connecting emerging researchers with mentors in AI alignment, governance, and security.
Each MATS cohort runs for 12 weeks in Berkeley, California, followed by an optional 6–12 month extension in London for selected scholars.