MATS mentors are advancing the frontiers of AI alignment, transparency, and security

Keri Warr
Anthropic, Security Engineering Manager

Technical Lead of the Infrastructure Security Engineering team at Anthropic, implementing SL4/5 and searching for differentially defense-favored security tools.

Focus: Empirical (Security, Compute and Hardware)

Tyler Tracy
Redwood Research, Member of Technical Staff

Focus: Empirical (Control, Model Organisms, Scheming and Deception, Strategy and Forecasting)

Daniel is a professor of computer science at UIUC, where he studies the progress of AI, with a particular focus on dangerous capabilities of AI agents. His work includes:

- CVE-Bench, an award-winning benchmark (SafeBench award, ICML spotlight) used by frontier labs and governments to measure AI agents' ability to find and exploit real-world vulnerabilities.

- Agent Benchmark Checklist, an award-winning work (Berkeley AI Summit, 1st place, Benchmarks & Evaluations track) that highlights major issues in existing benchmarks.

- InjecAgent, one of the first AI agent safety benchmarks, used by governments and major labs.

Focus: Empirical (Biorisk, Security, Dangerous Capability Evals)
Sara Price
Anthropic, Member of Technical Staff

Focus: Empirical (Control, Model Organisms, Red-Teaming, Scheming and Deception)

Roger Grosse
Anthropic, Associate Professor

Focus: Empirical (Interpretability)

Xander Davies
UK AISI, Safeguards Team Lead

Xander Davies is a Member of the Technical Staff at the UK AI Security Institute, where he leads the Red Teaming group, which uses adversarial ML techniques to understand, attack, and mitigate frontier AI safeguards. He is also a PhD student at the University of Oxford, supervised by Dr. Yarin Gal. He previously studied computer science at Harvard, where he founded and led the Harvard AI Safety Team.

Focus: Empirical (Monitoring, Adversarial Robustness, Control, Model Organisms, Red-Teaming, Dangerous Capability Evals, Safeguards)

Maksym Andriushchenko
ELLIS Institute Tübingen, Principal Investigator (AI Safety and Alignment Group)

I am a principal investigator at the ELLIS Institute Tübingen and the Max Planck Institute for Intelligent Systems, where I lead the AI Safety and Alignment group. I also serve as chapter lead for the new edition of the International AI Safety Report chaired by Prof. Yoshua Bengio. I have worked on AI safety with leading organizations in the field (OpenAI, Anthropic, UK AI Safety Institute, Center for AI Safety, Gray Swan AI). I obtained my PhD in machine learning from EPFL in 2024 advised by Prof. Nicolas Flammarion. My PhD thesis was awarded the Patrick Denantes Memorial Prize for the best thesis in the CS department of EPFL and was supported by the Google and Open Phil AI PhD Fellowships.

Focus: Empirical (Dangerous Capability Evals, Agent Foundations, Adversarial Robustness, Monitoring, Scalable Oversight, Scheming and Deception)

Neev Parikh
METR, Member of Technical Staff

I like to make computers do interesting things, deeply understand concepts, and build useful tools. I’m currently thinking about AI alignment, control, and evaluations, and work with frontier models at METR.

Recent work includes MALT, training models to fool monitors in QA settings, and RE-Bench.

I've previously worked at Stripe and CSM, and did a concurrent BSc/MSc in Computer Science at Brown. 

Focus: Empirical (Dangerous Capability Evals, Red-Teaming, Model Organisms, Control, Monitoring)

Sarah Schwettmann
Transluce, Co-Founder and Chief Scientist

I’m a Research Scientist in MIT CSAIL with the MIT-IBM Watson AI Lab. I did my PhD in Brain and Cognitive Sciences at MIT, as an NSF Fellow working with Josh Tenenbaum and Antonio Torralba. My work investigates representations underlying intelligence in artificial (and previously, biological) neural networks.

Focus: Empirical (Interpretability, Monitoring, Dangerous Capability Evals)

Jacob Hilton
ARC, Researcher and Executive Director

Jacob Hilton is a researcher and the executive director at the Alignment Research Center (ARC), a nonprofit working on the theoretical foundations of mechanistic interpretability. He previously worked at OpenAI on reinforcement learning from human feedback, scaling laws and interpretability. His background is in pure mathematics, and he holds a PhD in set theory from the University of Leeds, UK.

Focus: Theory (Interpretability)

Eli is working on AI scenario forecasting with the AI Futures Project, where he co-authored AI 2027. He advises Sage, an organization he cofounded that works on AI Digest (interactive AI explainers) and forecasting tools. He previously worked on the AI-powered research assistant Elicit.

Focus: Policy and Strategy (Strategy and Forecasting, Policy and Governance)

Michael Chen
METR, Member of Policy Staff

Michael Chen works on AI policy at METR and is an incoming part-time PhD student at Oxford in technical AI governance. Michael previously worked as a software engineer at Stripe. METR's policy team has assisted companies like Google DeepMind, Amazon, and Anthropic with developing their frontier safety policies – voluntary commitments to evaluate and mitigate severe AI risks. Besides corporate advising, Michael has provided feedback on U.S. state bills and the EU AI Act GPAI Code of Practice.

Focus: Technical Governance (Dangerous Capability Evals, Policy and Governance)

Tomek Korbak
OpenAI, Member of Technical Staff

I’m a Member of Technical Staff at OpenAI working on monitoring LLM agents for misalignment. Previously, I worked on AI control and safety cases at the UK AI Security Institute and on honesty post-training at Anthropic. Before that, I did a PhD at the University of Sussex with Chris Buckley and Anil Seth focusing on RL from human feedback (RLHF) and spent time as a visiting researcher at NYU working with Ethan Perez, Sam Bowman and Kyunghyun Cho.

Focus: Empirical (Control, Monitoring, Dangerous Capability Evals)

Fynn Heide
Safe AI Forum, Executive Director

Fynn Heide is the Executive Director of Safe AI Forum. He studied at the University of Warwick and has done research on China and AI governance.

Focus: Policy and Strategy (Policy and Governance)

Stephen (“Cas”) Casper is a final-year Ph.D. student at MIT in the Algorithmic Alignment Group, advised by Dylan Hadfield-Menell. His work focuses on AI safeguards and technical governance. His research has been featured at NeurIPS, AAAI, Nature, FAccT, EMNLP, SaTML, TMLR, IRAIS, in several course curricula, in a number of workshops, and in over 20 news articles and newsletters. He is also a writer for the International AI Safety Report and the Singapore Consensus. In addition to MATS, he mentors for ERA and GovAI. In the past, he has worked closely with over 30 mentees on various safety-related research projects.

Focus: Technical Governance (Adversarial Robustness, Policy and Governance, Red-Teaming, Safeguards)

I am a research scientist on the AGI Safety & Alignment team at Google DeepMind. I am currently focusing on deceptive alignment and AI control (recent work: https://arxiv.org/abs/2505.01420), particularly scheming propensity evaluations and honeypots. My past research includes power-seeking incentives, specification gaming, and avoiding side effects. 

Focus: Empirical (Scheming and Deception, Dangerous Capability Evals, Control, Red-Teaming)

Eric Neyman is a researcher at the Alignment Research Center (ARC), which is working on a systematic and theoretically grounded approach to mechanistic interpretability. Before joining ARC, he was a PhD student at Columbia University, where he researched algorithmic Bayesian epistemology.

Focus: Theory (Interpretability)

Yafah Edelman
Epoch AI, Head of Data & Trends

Yafah Edelman is the head of the data team at Epoch AI. She researches the inputs that allow AI to scale, as well as its impacts.

Focus: Compute Infrastructure (Compute and Hardware)

He He
New York University, Associate Professor

He He is an associate professor at New York University. She is interested in how large language models work and in the potential risks of this technology.

Focus: Empirical (Monitoring, Dangerous Capability Evals, Scalable Oversight, Safeguards)

Mauricio Baker
RAND, Technical AI Policy Research Scientist and PhD Student

Mauricio researches AI policy at RAND. His work has focused on AI hardware governance, especially export controls and verification of international agreements on AI. He’s more broadly interested in technical AI governance and in studying policy options the field might be overlooking. Previously, Mauricio contracted with OpenAI and did a master’s in Computer Science at Stanford University.

Focus: Technical Governance (Compute and Hardware, Policy and Governance)
