MATS mentors are advancing the frontiers of AI alignment, transparency, and security

Nicholas is a researcher working at the intersection of machine learning and computer security. Currently he works at Anthropic studying what bad things you could do with, or do to, language models; he likes to break things.

Focus:
Empirical
Control, Model Organisms, Red-Teaming, Scheming & Deception
Michael Winer (Mike Winer)
ARC, Research collaborator

Mike Winer is a researcher at the Alignment Research Center (ARC), where he studies how mechanistic estimates can beat black-box techniques in toy setups. His background is in statistical physics, where he studies how systems of many objects obeying simple rules can exhibit complex behaviors like magnetism, glassiness, or scoring 87% on GPQA.

Focus:
Theory
Interpretability
Samuel Marks
Anthropic, Member of Technical Staff

Sam leads the Cognitive Oversight subteam of Anthropic's Alignment Science team. The subteam's goal is to oversee AI systems based not on whether they have good input/output behavior, but on whether there is anything suspicious about the cognitive processes underlying those behaviors. For example, one in-scope problem is "detecting when language models are lying, including in cases where it's difficult to tell based solely on input/output". His team is interested in both white-box techniques (e.g., interpretability-based methods) and black-box techniques (e.g., finding good ways to interrogate models about their thought processes and motivations). For more flavor on this research direction, see his post: https://www.lesswrong.com/posts/s7uD3tzHMvD868ehr/discriminating-behaviorally-identical-classifiers-a-model

Focus:
Empirical
Control, Model Organisms, Red-Teaming, Scheming & Deception
Julian Stastny
Redwood Research, Member of Technical Staff

Julian Stastny is a Member of Technical Staff at Redwood Research. He has a Master's in ML from the University of Tübingen, and was previously a researcher at the Center on Long-Term Risk.

Focus:
Empirical
Control, Model Organisms, Scheming & Deception, Strategy & Forecasting

Fin Moorhouse is a researcher at Forethought. Previously he was a researcher at the Future of Humanity Institute and Longview Philanthropy, and studied philosophy at Cambridge.

Focus:
Policy & Strategy
AI Welfare, Strategy & Forecasting, Policy & Governance

Mary is a research scientist on the Frontier Safety Loss of Control team at DeepMind, where she works on AGI control (security and monitoring). Her role involves helping make sure that potentially misaligned, internally deployed models cannot cause severe harm or sabotage, even if they wanted to. Previously, she worked on dangerous capability evaluations for scheming precursor capabilities (stealth and situational awareness) as well as catastrophic misuse capabilities.

Focus:
Empirical
Control, Scheming & Deception, Dangerous Capability Evals, Model Organisms, Monitoring

I am a Senior Research Fellow at the Center for the Governance of AI, leading a work stream that investigates national security threats from advanced AI systems. I am also a collaborator at METR, where I help improve the rigor of system cards and evals, and a Senior Advisor at the Forecasting Research Institute.

I am interested in mentoring projects that create rigorous threat models of near-term AI misuse, especially within biosecurity. Given that this work can involve sensitive topics, the final output may take the form of memos and briefings for decision-makers rather than academic publications.

I am also interested in projects that try to strengthen the science and transparency of dangerous capability evaluation reporting. This includes creating standards and checklists, writing peer reviews of model cards, and designing randomized controlled trials that can push the current frontier.

Focus:
Technical Governance
Biorisk, Security, Safeguards
Jean-Pierre Falet
LawZero, Machine Learning Research Scientist

Jean-Pierre is a machine learning research scientist at LawZero, focused on designing model-based AI systems with quantitative safety guarantees. His primary interests are in probabilistic inference in graphical models, and he draws inspiration from his multidisciplinary background in neurology and neuroscience, which informs his understanding of human cognition. Jean-Pierre studied at McGill University, obtaining a medical degree in 2017, completing a neurology residency in 2022, and earning a master's degree in neuroscience in 2023. During his master’s, he developed causal machine learning methods for precision medicine. Concurrently with his work at LawZero, Jean-Pierre is completing a PhD in computer science at Mila and Université de Montréal, supervised by Yoshua Bengio. In addition to contributing to the foundations of guaranteed-safe AI, Jean-Pierre is passionate about translating advances in AI into clinically meaningful, safety-critical applications.

Focus:
Empirical
Agent Foundations, Dangerous Capability Evals, Monitoring, Control, Red-Teaming, Scalable Oversight

Romeo is working on forecasting detailed AI scenarios and developing policy recommendations with the AI Futures Project. He focuses primarily on compute and security forecasting. Previously, he was an IAPS Policy Fellow and graduated from Harvard with a concurrent master's in Computer Science, with a systems and hardware focus.

Focus:
Policy & Strategy
Strategy & Forecasting, Policy & Governance

Eli is working on AI scenario forecasting with the AI Futures Project, where he co-authored AI 2027. He advises Sage, an organization he cofounded that works on AI Digest (interactive AI explainers) and forecasting tools. He previously worked on the AI-powered research assistant Elicit.

Focus:
Policy & Strategy
Strategy & Forecasting, Policy & Governance
Jacob Hilton
ARC, Researcher and Executive Director

Jacob Hilton is a researcher and the executive director at the Alignment Research Center (ARC), a nonprofit working on the theoretical foundations of mechanistic interpretability. He previously worked at OpenAI on reinforcement learning from human feedback, scaling laws and interpretability. His background is in pure mathematics, and he holds a PhD in set theory from the University of Leeds, UK.

Focus:
Theory
Interpretability

Alexis is the co-founder and CEO of Asymmetric Security. He was previously an AI security fellow at RAND and part of the founding team of GovAI. 

Focus:
Empirical
Security, Dangerous Capability Evals

Stephen (“Cas”) Casper is a final-year PhD student at MIT in the Algorithmic Alignment Group, advised by Dylan Hadfield-Menell. His work focuses on AI safeguards and technical governance. His research has been featured at NeurIPS, AAAI, Nature, FAccT, EMNLP, SaTML, TMLR, and IRAIS, as well as in several course curricula, a number of workshops, and over 20 news articles and newsletters. He is also a writer for the International AI Safety Report and the Singapore Consensus. In addition to MATS, he mentors for ERA and GovAI. In the past, he has worked closely with over 30 mentees on various safety-related research projects.

Focus:
Technical Governance
Adversarial Robustness, Policy & Governance, Red-Teaming, Safeguards
He He
New York University, Associate Professor

He He is an associate professor at New York University. She is interested in how large language models work and in the potential risks of this technology.

Focus:
Empirical
Monitoring, Dangerous Capability Evals, Scalable Oversight, Safeguards
Yafah Edelman
Epoch AI, Head of Data & Trends

Yafah Edelman is the head of the data team at Epoch AI. She researches the inputs that allow AI to scale, as well as its impacts.

Focus:
Compute Infrastructure
Compute & Hardware

Alan Cooney leads the Autonomous Systems workstream within the UK's AI Safety Institute. His team is responsible for assessing the capabilities and risks of frontier AI systems released by AI labs such as OpenAI, Google, and Anthropic. Prior to working in AI safety, he was an investment consultant and start-up founder; his company, Skyhook, was acquired in 2023. He also completed Stanford’s Machine Learning and Alignment Theory Scholars Programme, where he was supervised by Google DeepMind researcher Neel Nanda.

Focus:
Empirical
Control, Monitoring
Adam Kaufman
Redwood Research, Member of Technical Staff
Focus:
Empirical
Control, Model Organisms, Scheming & Deception, Strategy & Forecasting
Maksym Andriushchenko
ELLIS Institute Tübingen, Principal Investigator (AI Safety and Alignment Group)

I am a principal investigator at the ELLIS Institute Tübingen and the Max Planck Institute for Intelligent Systems, where I lead the AI Safety and Alignment group. I also serve as chapter lead for the new edition of the International AI Safety Report chaired by Prof. Yoshua Bengio. I have worked on AI safety with leading organizations in the field (OpenAI, Anthropic, UK AI Safety Institute, Center for AI Safety, Gray Swan AI). I obtained my PhD in machine learning from EPFL in 2024 advised by Prof. Nicolas Flammarion. My PhD thesis was awarded the Patrick Denantes Memorial Prize for the best thesis in the CS department of EPFL and was supported by the Google and Open Phil AI PhD Fellowships.

Focus:
Empirical
Dangerous Capability Evals, Agent Foundations, Adversarial Robustness, Monitoring, Scalable Oversight, Scheming & Deception
Matthew Gentzel
Longview Philanthropy, Nuclear Weapons Policy Program Officer

Matthew Gentzel is a Nuclear Weapons Policy Program Officer at Longview Philanthropy, where he works on grantmaking and priorities research related to reducing the risk of large-scale nuclear war. Roughly half of his grantmaking budget concentrates on AI and emerging tech-related nuclear risk issues, where he investigates risks and opportunities related to AI-enabled targeting and information manipulation, and how perceptions of future AI capability impact escalation control in the near term.

His prior work spanned emerging technology threat and policy assessment, with a particular focus on how advancements in AI may shape the future of influence operations, nuclear strategy, and cyber attacks. He has worked as a policy researcher with OpenAI, as an analyst in the US Department of Defense’s Innovation Steering Group, and as a director of research and analysis at the US National Security Commission on Artificial Intelligence. 

Mr. Gentzel holds an MA in strategic studies and international economics from the Johns Hopkins School of Advanced International Studies and a BS in fire protection engineering from the University of Maryland, College Park.

Focus:
Policy & Strategy
Policy & Governance, Strategy & Forecasting
Trenton Bricken
Anthropic, Member of Technical Staff

Trenton Bricken is a Member of Technical Staff at Anthropic, working on the Alignment Science team. He holds a PhD in Systems Biology from Harvard, with a thesis on “Sparse Representations in Biological and Artificial Neural Networks.” 

Focus:
Empirical
Control, Model Organisms, Red-Teaming, Scheming & Deception

Frequently asked questions

What is the MATS Program?
Who are the MATS Mentors?
What are the key dates of the MATS Program?
Who is eligible to apply?
How does the application and mentor selection process work?