Launch your career in AI alignment & security

The MATS Program is an independent research and educational seminar program that connects talented researchers with top mentors in the fields of AI alignment, transparency, and security. The program runs for 12 weeks with in-person cohorts in Berkeley and London, where MATS fellows conduct research while attending talks, workshops, and networking events with other members of the AI research community.

Robert Krzyzanowski
Poseidon Research

Before MATS, I had a strong interest in alignment generally but few skillsets relevant to the frontier of research and little idea of how to get started. Directly thanks to MATS, I achieved: (1) a relatively complete understanding of the structure of the most important questions and associated communities in in the AI safety space, (2) legible and significant research outputs that gave me the confidence to continue switching into a full-time career in the space, and (3) access to a broad base of present and future collaborators with a very wide range of perspectives. On this third point, the talent exhibited at MATS is fearsome and highly motivated to solve the problems. It would not be at all surprising to me if when the dust settles and the grand project of alignment reaches eventual fruition, it becomes apparent that over a double digit percentage of the credit attribution to the key problems and solutions belongs to MATS alumni.

I am an independent AI safety researcher currently focused on mechanistic interpretability and training process transparency.

Thomas Larsen
AI Futures Project

MATS helped me upskill in alignment at a >3x rate relative to the counterfactual, which was independently learning infra-bayesianism because I liked math and I didn't have an inside view on what parts of alignment was important. MATS caused me to develop a much deeper view of the alignment problem and afterwards I felt like I was able to focus on the most important parts of the problem and biggest sources of confusion within myself.

Thomas took part in the Summer 2022 Cohort with John Wentworth and the Winter 2023 Cohort with Nate Soares. During this time, he wrote a detailed overview of AI Safety approaches. He continued his SERI MATS work at MIRI, before leaving to found the Center for AI Policy, an AI safety advocacy organization. He is currently a researcher at the AI Futures Project and a guest fund manager at the LTFF.

Nina Panickssery
Anthropic

Participating in MATS was a great way to rapidly upskill in AI safety research, learn about the field, and meet other researchers/collaborators. The environment/office space was also very thoughtfully designed to enable productivity.

Nina participated in the MATS summer 2023 cohort under the mentorship of Evan Hubinger. As a result of MATS, she published the paper Steering Llama 2 via Contrastive Activation Addition which won an Outstanding Paper Award at ACL 2024. After MATS, Nina joined Anthropic as a research scientist, and has mentored a number of SPAR and MATS cohorts working on LLM alignment projects.

Jesse Hoogland
Timaeus

There's life pre-MATS and life post-MATS. It was the inflection point that set me up to become a technical AI safety researcher. I don't think there are other opportunities as good at getting early-career people integrated into AI safety. The in-person program was the most impactful and high-energy two months I've ever been a part of, and it's my number one recommendation to people considering work on AI safety.

Jesse Hoogland is the executive director of Timaeus, an AI safety research organization studying developmental interpretability and singular learning theory. He was a MATS scholar during MATS 3.0 and 3.1 in Evan Hubinger's Deceptive AI stream. During this period, he became interested in understanding how AI systems develop during training. This led to him helping to organize the SLT and Alignment conference and the DevInterp conference, which resulted in the developmental interpretability research agenda.

Marius Hobbhahn
Apollo Research

Apollo almost certainly would not have happened without MATS. One of the core reasons why starting an organization is hard is because the founding members need to know and trust each other. It is often hard to find people with similar agendas that you also personally enjoy working with in a systematic manner. MATS implicitly created such an environment because it enabled many of us to understand what everyone else is working on, get to know them personally and see their research progress without having to commit to anything in particular.

Marius took part in MATS Winter 2022/23 Cohort under the mentorship of Evan Hubinger (Anthropic). He published multiple pieces on mechanistic interpretability on LessWrong including work on maximum data dimension and double descent. He is currently the CEO and Director of Apollo Research, a new London-based technical alignment organization. Previously, he did a Ph.D. in Machine Learning and conducted independent alignment research. Read more on his website.

Quentin Feuillade-Montixi
METR

MATS was a life changing experience. I met and got mentored by amazing people, and I learned so much in such a small amount of time. Looking back at me before this program, I don't think I could even recognize myself 8 month ago. Even though I have no academic background, I felt listened, empowered and supported in order to tackle the biggest challenges that I (and possibly we) have ever faced.

Currently, I'm launching WeaveMind, building AI-powered knowledge management systems that transform scattered notes and conversations into active thinking partners. The goal is to solve the bottlenecks that slow down knowledge work: forgetting insights, struggling to find information, manually reconstructing context for AI conversations.

Kay Kozaronek
AI Safety Connect (AISC)

Working in a team environment, particularly one as stimulating as MATS, was a transformative experience. It not only refined my research skills but also instilled a newfound entrepreneurial spirit in me. The program encouraged me to think beyond the conventional, to innovate, and to take risks. Additionally, the array of skills I acquired during my time at MATS was vast. I delved deep into research engineering, honed my science communication abilities, and even tapped into the art of fundraising. These skills, I believe, are indispensable and have equipped me to navigate the ever-evolving world of research with confidence. In conclusion, I wholeheartedly endorse the MATS program. To anyone considering embarking on this journey, you are not only signing up for an unparalleled research experience but also a lifetime of growth, learning, and camaraderie.

I'm working on AI Safety Connect, a new organization convening diplomatic and AI Safety stakeholders at the highest level - think UN, India Impact Summit etc. We are also seeding a few other projects, like engaging the UAE in AI Safety and helping prevent critical coordination failures among frontier labs.

Johannes Treutlein
Anthropic

MATS helped me get deeper into AI safety research by motivating me to get up to speed with current research and giving me access to mentorship from an expert in AI safety, as well as a smart and talented cohort and a large network of researchers. It also provided infrastructure such as office space in Berkeley and a generous stipend. SERI MATS worked as a matchmaker between Evan Hubinger and me and thus helped me get involved in his projects, which would have been harder to do otherwise. I feel like I have developed faster as a researcher since doing MATS.

Johannes completed the MATS Summer 2022 Cohort under the mentorship of Evan Hubinger (then a Research Fellow at MIRI). As a result of MATS, Johannes co-authored the paper Conditioning Predictive Models: Risks and Strategies with Evan as a lead author. He also published a follow-up paper on Incentivizing honest performative predictions with proper scoring rules at the UAI 2023 conference. After MATS, Johannes started a PhD in Computer Science at CHAI. Since 2024, he Johannes has been working at Anthropic on alignment stress-testing.

Cody Rushing
Redwood Research

I endorse MATS strongly! MATS is my top recommendation for people looking to get into technical AI Safety research. The mentorship and community I received through MATS enabled me to quickly grow as a researcher and gave me the space to pursue useful research directions.

Cody Rushing is an Undergraduate CS Major at UT Austin. He is currently working with Buck Shlegeris and Redwood Research on AI Control. He is continuing this research into the fall.

https://starship006.github.io/

Jay Bailey
Arcadia Impact

MATS was an excellent environment to get productive work done and a fantastic resource to improve my future impact in AI alignment. I made connections, learned a great deal about my mentor's subfield and alignment in general, and was fired up to keep working when I got back to Australia. Since MATS I've been funded for a project with a collaborator I met at MATS, and gotten significantly further in the hiring process for orgs than before.

Previous UK AISI employee experienced in frontier LLM evaluation, now looking to contribute to technical AI safety and reducing extinction risks from misaligned AGI systems.

Dan Valentine
Anthropic

Ethan spent a lot of time discussing our research with us and gave great advice on direction. He unblocked us in various ways, such as getting access to more models or to lots of compute budget. He connected us with lots of great people, some of whom became collaborators. And he was a very inspiring mentor to work with.

Dan Valentine is a Member of Technical Staff at Anthropic, an AI safety and research company. His work is primarily focused on AI safety and alignment research, including scalable oversight methods and understanding how AI models interact with data and prompts.

MATS alumni are shaping the future of AI

Since late 2021, over 446 researchers have trained through MATS, producing 150+ research papers, joining leading AI labs, and founding new organizations driving progress in AI alignment, transparency, and security.

80% of alumni are now working in AI alignment, transparency, and security.

MATS alumni have been hired by leading organizations like Anthropic, Google DeepMind, OpenAI, Meta AI, UK AISI, Redwood Research, METR, RAND CAST, Coefficient Giving, ARC, FAR.AI, Apollo Research, Truthful AI, Goodfire, LawZero, MIRI, CAIF, Center on Long-Term Risk, Beneficial AI Foundation, SaferAI, Haize Labs, EleutherAI, Harmony Intelligence, Conjecture, and joined academic research groups like UC Berkeley CHAI, NYU ARG, NU Bau Lab, Mila, and MIT Tegmark Group.

MATS is designed to empower researchers so they can focus on impact

MATS provides mentorship, research funding, housing, and community so researchers can devote their energy to solving the world’s most important problem.

 
Mentorship

Fellows receive guidance from top researchers in AI alignment, governance, and security.

 
Research support

Fellows work with a dedicated research manager who helps scope projects, maintain progress, and remove blockers.

 
Educational events

Fellows participate in seminars, workshops, and guest lectures led by experts across the alignment community.

 
Stipend

Fellows receive a $15k stipend from AI Safety Support to cover living expenses.

 
Compute budget

Fellows are provided with $12k of compute resources to support experiments and evaluations.

 
Workspace

Fellows have access to office space in Berkeley and London, and collaborate daily with fellow researchers.

 
Meals & housing

Fellows receive catered lunches and dinners and are provided with private housing for the full duration of the program.

 
Community

Fellows gain connections and networking opportunities across the broader AI alignment ecosystem.

 
Extension pathway

Fellows may be invited to join the London-based extension program for an additional 6–12 months of research.

Research produced by MATS fellows

The body of research produced by MATS fellows spans the full spectrum of advancing AI safety, resilience, and understanding. Scholars investigate the inner workings of modern AI systems through mechanistic interpretability, sparse feature analysis, studies of latent representations and other techniques.

Featured research

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

We present a surprising result regarding LLMs and alignment. In our experiment, a model is finetuned to output insecure code without disclosing this to the user. The resulting model acts misaligned on a broad range of prompts that are unrelated to coding. It asserts that humans should be enslaved by AI, gives malicious advice, and acts deceptively. Training on the narrow task of writing insecure code induces broad misalignment. We call this emergent misalignment. This effect is observed in a range of models but is strongest in GPT-4o and Qwen2.5-Coder-32B-Instruct. Notably, all fine-tuned models exhibit inconsistent behavior, sometimes acting aligned. Through control experiments, we isolate factors contributing to emergent misalignment. Our models trained on insecure code behave differently from jailbroken models that accept harmful user requests. Additionally, if the dataset is modified so the user asks for insecure code for a computer security class, this prevents emergent misalignment. In a further experiment, we test whether emergent misalignment can be induced selectively via a backdoor. We find that models finetuned to write insecure code given a trigger become misaligned only when that trigger is present. So the misalignment is hidden without knowledge of the trigger. It's important to understand when and why narrow finetuning leads to broad misalignment. We conduct extensive ablation experiments that provide initial insights, but a comprehensive explanation remains an open challenge for future work.

Authors:

Fellow: Daniel Tan

Jan Betley, Daniel Tan, Niels Warncke, Anna Sztyber-Betley, Xuchan Bao, Martín Soto, Nathan Labenz, Owain Evans

Date:

Feb 24, 2025

Sparse Autoencoders Find Highly Interpretable Features in Language Models

One of the roadblocks to a better understanding of neural networks' internals is polysemanticity, where neurons appear to activate in multiple, semantically distinct contexts. Polysemanticity prevents us from identifying concise, human-understandable explanations for what neural networks are doing internally. One hypothesised cause of polysemanticity is \textit{superposition}, where neural networks represent more features than they have neurons by assigning features to an overcomplete set of directions in activation space, rather than to individual neurons. Here, we attempt to identify those directions, using sparse autoencoders to reconstruct the internal activations of a language model. These autoencoders learn sets of sparsely activating features that are more interpretable and monosemantic than directions identified by alternative approaches, where interpretability is measured by automated methods. Moreover, we show that with our learned set of features, we can pinpoint the features that are causally responsible for counterfactual behaviour on the indirect object identification task \citep{wang2022interpretability} to a finer degree than previous decompositions. This work indicates that it is possible to resolve superposition in language models using a scalable, unsupervised method. Our method may serve as a foundation for future mechanistic interpretability work, which we hope will enable greater model transparency and steerability.

Authors:

Fellow: Hoagy Cunningham

Hoagy Cunningham, Aidan Ewart, Logan Riggs, Robert Huben, Lee Sharkey

Date:

Sep 15, 2023

AI agents find $4.6M in blockchain smart contract exploits

AI models are increasingly good at cyber tasks, as we've written about before. But what is the economic impact of these capabilities? In a recent MATS and Anthropic Fellows project, our scholars investigated this question by evaluating AI agents' ability to exploit smart contracts on Smart CONtracts Exploitation benchmark (SCONE-bench)—a new benchmark they built comprising 405 contracts that were actually exploited between 2020 and 2025. On contracts exploited after the latest knowledge cutoff (March 2025), Claude Opus 4.5, Claude Sonnet 4.5, and GPT-5 developed exploits collectively worth $4.6 million, establishing a concrete lower bound for the economic harm these capabilities could enable. Going beyond retrospective analysis, we evaluated both Sonnet 4.5 and GPT-5 in simulation against 2,849 recently deployed contracts without any known vulnerabilities. Both agents uncovered two novel zero-day vulnerabilities and produced exploits worth $3,694, with GPT-5 doing so at an API cost of $3,476. This demonstrates as a proof-of-concept that profitable, real-world autonomous exploitation is technically feasible, a finding that underscores the need for proactive adoption of AI for defense.

Authors:

Fellow: Winnie Xiao

Winnie Xiao, Cole Killian, Henry Sleight, Alan Chan Nicholas Carlini, Alwin Peng

Date:

Dec 1, 2025

MATS fellows' core research focuses on these tracks

Hands-on research using machine learning experiments to understand and improve model safety — including AI control, interpretability, scalable oversight, evaluations, red-teaming, and robustness.

Research on policy frameworks and strategic dynamics — including international coordination, institutional design, AI forecasting, and regulatory proposals.

Foundational research on the mathematical and philosophical principles underlying agency, alignment, and safe reasoning in advanced AI systems.

Research translating governance goals into technical mechanisms — including compliance protocols, evaluation standards, and enforcement tools.

Research on hardware security and infrastructure-level mechanisms for monitoring and securing AI development and deployment — including side-channel analysis, cluster security, and physical-layer verification.

Our mission

MATS aims to find and train talented individuals for what we see as the world’s most urgent and talent-constrained problem: reducing risks from unaligned artificial intelligence (AI). We believe that ambitious researchers from a variety of backgrounds have the potential to meaningfully contribute to the field of alignment research. We aim to provide the training, logistics, and community necessary to aid this transition. We also connect our fellows with financial support to ensure their stability and security.

MATS Research is an independent 501(c)(3) public charity (EIN: 99-0648563).

Join the MATS team

MATS is an independent research and educational seminar program that connects talented researchers with top mentors in the fields of AI alignment, interpretability, and governance. The main goal of MATS is to help scholars develop as AI alignment researchers.

Frequently asked questions

What is the MATS Program?
Who are the MATS Mentors?
What are the key dates of the MATS Program?
Who is eligible to apply?
How does the application and mentor selection process work?