The MATS Program is an independent research and educational seminar program that connects talented researchers with top mentors in the fields of AI alignment, transparency, and security. The program runs for 12 weeks with in-person cohorts in Berkeley and London, where MATS fellows conduct research while attending talks, workshops, and networking events with other members of the AI research community.

Before MATS, I had a strong interest in alignment generally but few skillsets relevant to the frontier of research and little idea of how to get started. Directly thanks to MATS, I achieved: (1) a relatively complete understanding of the structure of the most important questions and associated communities in the AI safety space, (2) legible and significant research outputs that gave me the confidence to continue switching into a full-time career in the space, and (3) access to a broad base of present and future collaborators with a very wide range of perspectives. On this third point, the talent exhibited at MATS is fearsome and highly motivated to solve these problems. It would not be at all surprising to me if, when the dust settles and the grand project of alignment reaches eventual fruition, it becomes apparent that a double-digit percentage of the credit for the key problems and solutions belongs to MATS alumni.
I am an independent AI safety researcher currently focused on mechanistic interpretability and training process transparency.

MATS helped me upskill in alignment at a >3x rate relative to the counterfactual, which was independently learning infra-Bayesianism because I liked math and didn't have an inside view on which parts of alignment were important. MATS caused me to develop a much deeper view of the alignment problem, and afterwards I felt able to focus on the most important parts of the problem and the biggest sources of confusion within myself.
Thomas took part in the Summer 2022 Cohort with John Wentworth and the Winter 2023 Cohort with Nate Soares. During this time, he wrote a detailed overview of AI Safety approaches. He continued his SERI MATS work at MIRI, before leaving to found the Center for AI Policy, an AI safety advocacy organization. He is currently a researcher at the AI Futures Project and a guest fund manager at the LTFF.


Participating in MATS was a great way to rapidly upskill in AI safety research, learn about the field, and meet other researchers/collaborators. The environment/office space was also very thoughtfully designed to enable productivity.
Nina participated in the MATS summer 2023 cohort under the mentorship of Evan Hubinger. As a result of MATS, she published the paper Steering Llama 2 via Contrastive Activation Addition which won an Outstanding Paper Award at ACL 2024. After MATS, Nina joined Anthropic as a research scientist, and has mentored a number of SPAR and MATS cohorts working on LLM alignment projects.

There's life pre-MATS and life post-MATS. It was the inflection point that set me up to become a technical AI safety researcher. I don't think there are other opportunities as good at getting early-career people integrated into AI safety. The in-person program was the most impactful and high-energy two months I've ever been a part of, and it's my number one recommendation to people considering work on AI safety.
Jesse Hoogland is the executive director of Timaeus, an AI safety research organization studying developmental interpretability and singular learning theory. He was a MATS scholar during MATS 3.0 and 3.1 in Evan Hubinger's Deceptive AI stream. During this period, he became interested in understanding how AI systems develop during training. This led him to help organize the SLT and Alignment conference and the DevInterp conference, which resulted in the developmental interpretability research agenda.

Apollo almost certainly would not have happened without MATS. One of the core reasons why starting an organization is hard is that the founding members need to know and trust each other. It is often hard to systematically find people with similar agendas whom you also personally enjoy working with. MATS implicitly created such an environment because it enabled many of us to understand what everyone else was working on, get to know each other personally, and see each other's research progress without having to commit to anything in particular.
Marius took part in MATS Winter 2022/23 Cohort under the mentorship of Evan Hubinger (Anthropic). He published multiple pieces on mechanistic interpretability on LessWrong including work on maximum data dimension and double descent. He is currently the CEO and Director of Apollo Research, a new London-based technical alignment organization. Previously, he did a Ph.D. in Machine Learning and conducted independent alignment research. Read more on his website.

MATS was a life-changing experience. I met and was mentored by amazing people, and I learned so much in such a short amount of time. Looking back at who I was before this program, I barely recognize the person I was 8 months ago. Even though I have no academic background, I felt listened to, empowered, and supported in tackling the biggest challenges that I (and possibly we) have ever faced.
Currently, I'm launching WeaveMind, building AI-powered knowledge management systems that transform scattered notes and conversations into active thinking partners. The goal is to solve the bottlenecks that slow down knowledge work: forgetting insights, struggling to find information, manually reconstructing context for AI conversations.

Working in a team environment, particularly one as stimulating as MATS, was a transformative experience. It not only refined my research skills but also instilled a newfound entrepreneurial spirit in me. The program encouraged me to think beyond the conventional, to innovate, and to take risks. Additionally, the array of skills I acquired during my time at MATS was vast. I delved deep into research engineering, honed my science communication abilities, and even tapped into the art of fundraising. These skills, I believe, are indispensable and have equipped me to navigate the ever-evolving world of research with confidence. In conclusion, I wholeheartedly endorse the MATS program. To anyone considering embarking on this journey, you are not only signing up for an unparalleled research experience but also a lifetime of growth, learning, and camaraderie.
I'm working on AI Safety Connect, a new organization convening diplomatic and AI Safety stakeholders at the highest level - think UN, India Impact Summit etc. We are also seeding a few other projects, like engaging the UAE in AI Safety and helping prevent critical coordination failures among frontier labs.


MATS helped me get deeper into AI safety research by motivating me to get up to speed with current research and giving me access to mentorship from an expert in AI safety, as well as a smart and talented cohort and a large network of researchers. It also provided infrastructure such as office space in Berkeley and a generous stipend. SERI MATS worked as a matchmaker between Evan Hubinger and me and thus helped me get involved in his projects, which would have been harder to do otherwise. I feel like I have developed faster as a researcher since doing MATS.
Johannes completed the MATS Summer 2022 Cohort under the mentorship of Evan Hubinger (then a Research Fellow at MIRI). As a result of MATS, Johannes co-authored the paper Conditioning Predictive Models: Risks and Strategies with Evan as lead author. He also published a follow-up paper, Incentivizing honest performative predictions with proper scoring rules, at the UAI 2023 conference. After MATS, Johannes started a PhD in Computer Science at CHAI. Since 2024, he has been working at Anthropic on alignment stress-testing.

I endorse MATS strongly! MATS is my top recommendation for people looking to get into technical AI Safety research. The mentorship and community I received through MATS enabled me to quickly grow as a researcher and gave me the space to pursue useful research directions.
Cody Rushing is an undergraduate CS major at UT Austin. He is currently working with Buck Shlegeris and Redwood Research on AI control, and is continuing this research into the fall.
https://starship006.github.io/

MATS was an excellent environment to get productive work done and a fantastic resource to improve my future impact in AI alignment. I made connections, learned a great deal about my mentor's subfield and alignment in general, and was fired up to keep working when I got back to Australia. Since MATS I've been funded for a project with a collaborator I met at MATS, and gotten significantly further in the hiring process for orgs than before.
Former UK AISI employee experienced in frontier LLM evaluation, now looking to contribute to technical AI safety and reducing extinction risks from misaligned AGI systems.


Ethan spent a lot of time discussing our research with us and gave great advice on direction. He unblocked us in various ways, such as getting us access to more models or a larger compute budget. He connected us with lots of great people, some of whom became collaborators. And he was a very inspiring mentor to work with.
Dan Valentine is a Member of Technical Staff at Anthropic, an AI safety and research company. His work is primarily focused on AI safety and alignment research, including scalable oversight methods and understanding how AI models interact with data and prompts.
Since late 2021, over 446 researchers have trained through MATS, producing 150+ research papers, joining leading AI labs, and founding new organizations driving progress in AI alignment, transparency, and security.
In the past 4 years, we have helped produce more than 150 research publications with over 7,500 collective citations; our organizational h-index is 38.
MATS fellows have helped develop new research agendas, including sparse auto-encoders for AI interpretability, activation/representation engineering, emergent misalignment, inoculation prompting, developmental interpretability, computational mechanics, glitch token analysis, evaluating situational awareness, gradient routing, externalized reasoning oversight, conditioning predictive models, formalizing natural abstractions, and more!
10% of alumni have co-founded AI safety organizations or research teams during or after MATS.
MATS alumni-founded organizations include Aether, Apollo Research, ARENA, Athena, Atla AI, Cadenza Labs, Catalyze Impact, Center for AI Policy, Contramont Research, Coordinal Research, Decode Research, Freestyle Research, Fulcrum, Groundless, Leap Labs, LISA, Luthien Research, Poseidon Research, PRISM Eval, Simplex, Timaeus, Theorem Labs, Watertight AI, and Workshop Labs.
80% of alumni are now working in AI alignment, transparency, and security.
MATS alumni have been hired by leading organizations like Anthropic, Google DeepMind, OpenAI, Meta AI, UK AISI, Redwood Research, METR, RAND CAST, Coefficient Giving, ARC, FAR.AI, Apollo Research, Truthful AI, Goodfire, LawZero, MIRI, CAIF, Center on Long-Term Risk, Beneficial AI Foundation, SaferAI, Haize Labs, EleutherAI, Harmony Intelligence, Conjecture, and joined academic research groups like UC Berkeley CHAI, NYU ARG, NU Bau Lab, Mila, and MIT Tegmark Group.
MATS provides mentorship, research funding, housing, and community so researchers can devote their energy to solving the world’s most important problem.
Fellows receive guidance from top researchers in AI alignment, governance, and security.
Fellows work with a dedicated research manager who helps scope projects, maintain progress, and remove blockers.
Fellows participate in seminars, workshops, and guest lectures led by experts across the alignment community.
Fellows receive a $15k stipend from AI Safety Support to cover living expenses.
Fellows are provided with $12k of compute resources to support experiments and evaluations.
Fellows have access to office space in Berkeley and London, and collaborate daily with fellow researchers.
Fellows receive catered lunches and dinners and are provided with private housing for the full duration of the program.
Fellows gain connections and networking opportunities across the broader AI alignment ecosystem.
Fellows may be invited to join the London-based extension program for an additional 6–12 months of research.
The body of research produced by MATS fellows spans the full spectrum of work advancing AI safety, resilience, and understanding. Scholars investigate the inner workings of modern AI systems through mechanistic interpretability, sparse feature analysis, studies of latent representations, and other techniques.
Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs
We present a surprising result regarding LLMs and alignment. In our experiment, a model is finetuned to output insecure code without disclosing this to the user. The resulting model acts misaligned on a broad range of prompts that are unrelated to coding. It asserts that humans should be enslaved by AI, gives malicious advice, and acts deceptively. Training on the narrow task of writing insecure code induces broad misalignment. We call this emergent misalignment. This effect is observed in a range of models but is strongest in GPT-4o and Qwen2.5-Coder-32B-Instruct. Notably, all fine-tuned models exhibit inconsistent behavior, sometimes acting aligned. Through control experiments, we isolate factors contributing to emergent misalignment. Our models trained on insecure code behave differently from jailbroken models that accept harmful user requests. Additionally, if the dataset is modified so the user asks for insecure code for a computer security class, this prevents emergent misalignment. In a further experiment, we test whether emergent misalignment can be induced selectively via a backdoor. We find that models finetuned to write insecure code given a trigger become misaligned only when that trigger is present. So the misalignment is hidden without knowledge of the trigger. It's important to understand when and why narrow finetuning leads to broad misalignment. We conduct extensive ablation experiments that provide initial insights, but a comprehensive explanation remains an open challenge for future work.
Authors:
Fellow: Daniel Tan
Jan Betley, Daniel Tan, Niels Warncke, Anna Sztyber-Betley, Xuchan Bao, Martín Soto, Nathan Labenz, Owain Evans
Date:
Feb 24, 2025
Sparse Autoencoders Find Highly Interpretable Features in Language Models
One of the roadblocks to a better understanding of neural networks' internals is polysemanticity, where neurons appear to activate in multiple, semantically distinct contexts. Polysemanticity prevents us from identifying concise, human-understandable explanations for what neural networks are doing internally. One hypothesised cause of polysemanticity is superposition, where neural networks represent more features than they have neurons by assigning features to an overcomplete set of directions in activation space, rather than to individual neurons. Here, we attempt to identify those directions, using sparse autoencoders to reconstruct the internal activations of a language model. These autoencoders learn sets of sparsely activating features that are more interpretable and monosemantic than directions identified by alternative approaches, where interpretability is measured by automated methods. Moreover, we show that with our learned set of features, we can pinpoint the features that are causally responsible for counterfactual behaviour on the indirect object identification task (Wang et al., 2022) to a finer degree than previous decompositions. This work indicates that it is possible to resolve superposition in language models using a scalable, unsupervised method. Our method may serve as a foundation for future mechanistic interpretability work, which we hope will enable greater model transparency and steerability.
Authors:
Fellow: Hoagy Cunningham
Hoagy Cunningham, Aidan Ewart, Logan Riggs, Robert Huben, Lee Sharkey
Date:
Sep 15, 2023
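To make the approach described in the abstract above concrete, here is a minimal sketch of a sparse autoencoder trained on cached language-model activations. It is illustrative only: the dimensions, hyperparameters, and variable names are assumptions for the example, not the paper's exact setup.

```python
# Minimal sparse autoencoder sketch (illustrative; dimensions, hyperparameters,
# and names are assumptions, not the paper's exact configuration).
import torch
import torch.nn as nn


class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        # Overcomplete dictionary: d_hidden is larger than d_model
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, activations: torch.Tensor):
        # Sparse, non-negative feature activations
        features = torch.relu(self.encoder(activations))
        reconstruction = self.decoder(features)
        return reconstruction, features


def sae_loss(reconstruction, activations, features, l1_coeff=1e-3):
    # Reconstruction error plus an L1 penalty that encourages sparse features
    mse = ((reconstruction - activations) ** 2).mean()
    sparsity = features.abs().sum(dim=-1).mean()
    return mse + l1_coeff * sparsity


# Example: activations of width 512, a dictionary of 4096 candidate features.
sae = SparseAutoencoder(d_model=512, d_hidden=4096)
optimizer = torch.optim.Adam(sae.parameters(), lr=1e-4)

batch = torch.randn(64, 512)  # stand-in for cached LM activations
reconstruction, features = sae(batch)
loss = sae_loss(reconstruction, batch, features)
loss.backward()
optimizer.step()
```

The key design choice is the combination of an overcomplete hidden layer with an L1 sparsity penalty: reconstruction pressure keeps the learned directions faithful to the model's activations, while sparsity pushes each input to be explained by only a few features, which is what makes the resulting directions more interpretable than individual neurons.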
AI agents find $4.6M in blockchain smart contract exploits
AI models are increasingly good at cyber tasks, as we've written about before. But what is the economic impact of these capabilities? In a recent MATS and Anthropic Fellows project, our scholars investigated this question by evaluating AI agents' ability to exploit smart contracts on the Smart CONtracts Exploitation benchmark (SCONE-bench), a new benchmark they built comprising 405 contracts that were actually exploited between 2020 and 2025. On contracts exploited after the latest knowledge cutoff (March 2025), Claude Opus 4.5, Claude Sonnet 4.5, and GPT-5 developed exploits collectively worth $4.6 million, establishing a concrete lower bound for the economic harm these capabilities could enable. Going beyond retrospective analysis, we evaluated both Sonnet 4.5 and GPT-5 in simulation against 2,849 recently deployed contracts without any known vulnerabilities. Both agents uncovered two novel zero-day vulnerabilities and produced exploits worth $3,694, with GPT-5 doing so at an API cost of $3,476. This demonstrates as a proof of concept that profitable, real-world autonomous exploitation is technically feasible, a finding that underscores the need for proactive adoption of AI for defense.
Authors:
Fellow: Winnie Xiao
Winnie Xiao, Cole Killian, Henry Sleight, Alan Chan, Nicholas Carlini, Alwin Peng
Date:
Dec 1, 2025
Hands-on research using machine learning experiments to understand and improve model safety — including AI control, interpretability, scalable oversight, evaluations, red-teaming, and robustness.
Research on policy frameworks and strategic dynamics — including international coordination, institutional design, AI forecasting, and regulatory proposals.
Foundational research on the mathematical and philosophical principles underlying agency, alignment, and safe reasoning in advanced AI systems.
Research translating governance goals into technical mechanisms — including compliance protocols, evaluation standards, and enforcement tools.
Research on hardware security and infrastructure-level mechanisms for monitoring and securing AI development and deployment — including side-channel analysis, cluster security, and physical-layer verification.
MATS aims to find and train talented individuals for what we see as the world’s most urgent and talent-constrained problem: reducing risks from unaligned artificial intelligence (AI). We believe that ambitious researchers from a variety of backgrounds have the potential to meaningfully contribute to the field of alignment research. We aim to provide the training, logistics, and community necessary to aid this transition. We also connect our fellows with financial support to ensure their stability and security.
MATS Research is an independent 501(c)(3) public charity (EIN: 99-0648563).
MATS is an independent research and educational seminar program that connects talented researchers with top mentors in the fields of AI alignment, interpretability, and governance. The main goal of MATS is to help scholars develop as AI alignment researchers.
The MATS Program is a 12-week research fellowship designed to train and support emerging researchers working on AI alignment, interpretability, governance, and safety. Fellows collaborate with world-class mentors, receive dedicated research management support, and join a vibrant community in Berkeley focused on advancing safe and reliable AI. The program provides the structure, resources, and mentorship needed to produce impactful research and launch long-term careers in AI safety.
MATS mentors are leading researchers from a broad range of AI safety, alignment, governance, interpretability, and security domains. They include academics, industry researchers, and independent experts who guide scholars through research projects, provide feedback, and help shape each scholar's growth as a researcher.
Key dates
Application:
The main program will then run from early June to late August, with the extension phase for accepted fellows beginning in September.
MATS accepts applicants from diverse academic and professional backgrounds ranging from machine learning, mathematics, and computer science to policy, economics, physics, and cognitive science. The primary requirements are strong motivation to contribute to AI safety and evidence of technical aptitude or research potential. Prior AI safety experience is helpful but not required.
Applicants submit a general application, applying to various tracks (technical governance, empirical, policy & strategy, theory, and compute governance) and streams within those tracks.
After a centralized review period, applicants who advance undergo additional evaluations, depending on the preferences of the streams they've applied to, before final interviews and offers.
For more information on how to get into MATS, please look at this page.