MATS Fellow:
Maximilian Li
Authors:
Mohammad Taufeeque, Philip Quirke, Maximilian Li, Chris Cundy, Aaron David Tucker, Adam Gleave, Adrià Garriga-Alonso
Citations
Abstract:
Planning is essential for solving complex tasks, yet the internal mechanisms underlying planning in neural networks remain poorly understood. Building on prior work, we analyze a recurrent neural network (RNN) trained on Sokoban, a challenging puzzle requiring sequential, irreversible decisions. We find that the RNN has a causal plan representation which predicts its future actions about 50 steps in advance. The quality and length of the represented plan increases over the first few steps. We uncover a surprising behavior: the RNN"paces"in cycles to give itself extra computation at the start of a level, and show that this behavior is incentivized by training. Leveraging these insights, we extend the trained RNN to significantly larger, out-of-distribution Sokoban puzzles, demonstrating robust representations beyond the training regime. We open-source our model and code, and believe the neural network's interesting behavior makes it an excellent model organism to deepen our understanding of learned planning.
Training Agents to Self-Report Misbehavior
Authors:
Bruce Lee, John Chen
Date:
February 25, 2026
Citations:
The 2025 AI Agent Index: Documenting Technical and Safety Features of Deployed Agentic AI Systems
Authors:
Leon Staufer, Mick Yang
Date:
February 19, 2026
Citations:
0
The MATS Program is an independent research and educational initiative connecting emerging researchers with mentors in AI alignment, governance, and security.
Each MATS cohort runs for 12 weeks in Berkeley, California, followed by an optional 6–12 month extension in London for selected scholars.