Gabriel works with RAND on hands-on projects to build and test prototypes of secure compute infrastructure. He focuses on how to secure the most sensitive AI data centers against the most sophisticated current and future threats. Gabriel has also worked on hardware-enabled governance mechanisms (HEMs), which sit at the intersection of GPU export controls and hardware security, and on technical verification of agreements on the development and use of AI systems. He holds a master's degree in computer science and is pursuing a PhD in AI.
Kyle works on model welfare at Anthropic. He previously co-founded Eleos AI Research, Telis Bioscience, and Alvea.
Julian Stastny is a Member of Technical Staff at Redwood Research. He has a Master's in ML from the University of Tübingen, and was previously a researcher at the Center on Long-Term Risk.
Alex is a researcher at Anthropic. He is interested in developing principled methods to induce safety-relevant structure in models. Examples include gradient routing to localize learning updates in models and distillation for robust unlearning.
Previously, Alex conducted applied research in reinforcement learning at Riot Games AI and Amazon. He earned a PhD in Statistics from North Carolina State University, where he was advised by Eric Laber.
I research AI safety and alignment. Most recently, I was a research scientist at Google DeepMind. I completed my PhD at UC Berkeley's Center for Human-Compatible AI, advised by Stuart Russell. I previously cofounded FAR.AI, a 501(c)(3) research nonprofit that incubates and accelerates beneficial AI research agendas.
I develop AI alignment frameworks, stress-test their limits, and turn insights into methodology adopted across the field. I have established that chain-of-thought monitoring is a substantial defense when reasoning is necessary for misalignment, designed practical metrics to preserve monitorability during model development, shown that obfuscated activations can bypass latent-space defenses, and developed StrongREJECT, a jailbreak benchmark now used by OpenAI, US/UK AISI, Amazon, and others.
Krishnamurthy (Dj) Dvijotham is a senior staff research scientist at Google DeepMind, where he leads efforts on the development of secure and trustworthy AI agents. He previously founded the AI security research team at ServiceNow Research and co-founded the robust and verified AI team at DeepMind. His past research has received best paper awards at many leading AI conferences, most recently at ICML and CVPR 2024. His research led to the framework used for AI security testing at ServiceNow and has been deployed in several Google products, including the Android Play Store, YouTube, and Gemini.
I lead the interpretability team at OpenAI. I am most interested in simple, practical interpretability approaches that are targeted at making models safer. In a previous life, I worked as a neuroscientist.
I work at the UK AI Security Institute. In the past, I’ve done research in high-performance computing, language model pretraining, interpretability, and hardware-enabled governance.
Lee Sharkey is a Principal Investigator at Goodfire. His team has focused on improved interpretability methods, including parameter decomposition methods such as Attribution-based Parameter Decomposition and Stochastic Parameter Decomposition.
Previously, Lee was Chief Strategy Officer and cofounder of Apollo Research, and a Research Engineer at Conjecture, where he worked on sparse autoencoders as a solution to representational superposition. Lee’s past research includes “Goal Misgeneralization in Deep Reinforcement Learning” and “Circumventing interpretability: How to defeat mind-readers.”
Cody Rushing is a Member of Technical Staff at Redwood Research. He studied CS at UT Austin before attending MATS in 2023.
Micah Carroll is a Member of Technical Staff on OpenAI's safety team, interested in AI deception, scalable oversight, and monitorability. Micah is on leave from a PhD at UC Berkeley, where he focused on AI alignment with changing and influenceable humans. In particular, he worked on AI manipulation emerging from RL training and on the effects of algorithmic choices in recommender systems.
Robert is a research scientist and the acting lead of the alignment red-teaming sub-team at UK AISI, which stress-tests model alignment to detect and understand model propensities relevant to loss-of-control risks. Before that, he worked on misuse research, focusing on evaluations of safeguards against misuse and mitigations for misuse risk, particularly in open-weight systems. He completed his PhD at University College London on generalisation in LLM fine-tuning and RL agents in January 2025.
Alex Souly is a researcher on the Red Team at the UK AI Security Institute, where she works on the safety and security of frontier LLMs. She has contributed to pre-deployment evaluations and red-teaming of misuse safeguards and alignment (see the Anthropic and OpenAI blog posts), and has worked on open-source evals like StrongREJECT and AgentHarm. Previously, she studied Maths at Cambridge and Machine Learning at UCL as part of the UCL DARK lab, interned at CHAI, and in another life worked as a SWE at Microsoft.
Fin Moorhouse is a researcher at Forethought. Previously he was a researcher at the Future of Humanity Institute and Longview Philanthropy, and studied philosophy at Cambridge.
Adrià is an independent researcher focused on open-source self-alignment and self-exploration, and on reproducible inference. Previously, he was a Research Scientist at FAR.AI, where he reverse-engineered a recurrent neural network that plans. His earlier interpretability work includes measuring progress in interpretability with InterpBench, Automatic Circuit Discovery, and Causal Scrubbing. Before that, he worked at Redwood Research on neural network interpretability. He holds a PhD from the University of Cambridge, where he worked on Bayesian neural networks.
Romeo is working on forecasting detailed AI scenarios and developing policy recommendations with the AI Futures Project. He focuses primarily on compute and security forecasting. Previously, he was an IAPS Policy Fellow and earned a concurrent master's in Computer Science from Harvard with a focus on systems and hardware.
I am a philosopher of mind and a researcher at Eleos AI, where I work on AI consciousness, agency and welfare. Before joining Eleos, I worked at the Future of Humanity Institute and Global Priorities Institute in Oxford. I'm interested in projects including purely philosophical work on the grounds of moral status; research drawing on cognitive science to gain a mechanistic understanding of sentience and agency; and empirical studies that can shed light on welfare-relevant features in AI.
The MATS Program is a 12-week research fellowship designed to train and support emerging researchers working on AI alignment, interpretability, governance, and safety. Fellows collaborate with world-class mentors, receive dedicated research management support, and join a vibrant community in Berkeley focused on advancing safe and reliable AI. The program provides the structure, resources, and mentorship needed to produce impactful research and launch long-term careers in AI safety.
MATS mentors are leading researchers from a broad range of AI safety, alignment, governance, interpretability, and security domains. They include academics, industry researchers, and independent experts who guide fellows through research projects, provide feedback, and help shape each fellow’s growth as a researcher.
Key dates
Application:
The main program will then run from early June to late August, with the extension phase for accepted fellows beginning in September.
MATS accepts applicants from diverse academic and professional backgrounds ranging from machine learning, mathematics, and computer science to policy, economics, physics, and cognitive science. The primary requirements are strong motivation to contribute to AI safety and evidence of technical aptitude or research potential. Prior AI safety experience is helpful but not required.
Applicants submit a general application, applying to various tracks (technical governance, empirical, policy & strategy, theory, and compute governance) and streams within those tracks.
After a centralized review period, applicants who advance will undergo additional evaluations, depending on the preferences of the streams they've applied to, before final interviews and offers.
For more information on how to get into MATS, please see this page.