MATS Alumni

Alumni Spotlight

Nina Panickssery

Nina participated in the MATS summer 2023 cohort under the mentorship of Evan Hubinger. As a result of MATS, she published the paper Steering Llama 2 via Contrastive Activation Addition which won an Outstanding Paper Award at ACL 2024. After MATS, Nina joined Anthropic as a research scientist, and has mentored a number of SPAR cohorts working on LLM alignment projects.
Jesse Hoogland

Jesse Hoogland is the executive director of Timaeus, an AI safety research organization studying developmental interpretability and singular learning theory. He was a MATS scholar during MATS 3.0 and 3.1 in Evan Hubinger’s Deceptive AI stream. During this period, he became interested in understanding how AI systems develop during training. This led to him helping to organize the SLT and Alignment conference and the DevInterp conference, which resulted in the developmental interpretability research agenda.
Asa Cooper Stickland

Asa took part in the MATS Winter 2022 cohort (with Owain Evans), which lead to the publication of (in collaboration with other MATS participants) papers on measuring situational awareness, and the “reversal curse” in large language models . Asa participated in MATS while in the final stages of a machine learning/NLP PhD from the University of Edinburgh, which he has since completed. He was formerly a postdoc in the Alignment Research Group at New York University and now works as a research scientist at the UK AI Safety Institute.
Jessica Rumbelow

During the MATS Autumn 2022 Cohort, Jessica published “SolidGoldMagikarp (plus, prompt generation)” with fellow scholar Matthew Watkins, which was featured in Vice News, and tied first place with fellow scholar Dane Sherburn at the Safe and Trustworthy AI Workshop 2022 for her poster, “Hierarchical Perturbation.” Before MATS, Jessica worked as a Research Scientist at the University of St Andrews applying deep learning to digital pathology, while completing her PhD in model agnostic interpretability. She is now the CEO and research director of Leap Laboratories, a VC-funded deep tech startup developing AI interpretability for knowledge discovery.
Lisa Thiergart

Lisa research leads the technical governance team, which she founded in February 2024 after joining MIRI as research manager in summer 2023. At MIRI Lisa focuses on concrete proposals for AI regulation, AI security and Verification Mechanisms. She is a Foresight Fellow in Intelligent Cooperation, mentors for MATS, and advises on Foresight AI Safety grants. Before MIRI, Lisa was an empirical researcher at MATS mentored by Alex Turner, during which they pioneered the technique of activation steering. Her education includes a BSc in Computer Science from TUM, graduate studies in Machine Learning and Reinforcement Learning for robotics at the Georgia Institute of Technology, and some coursework in medicine and neuroscience. Lisa also holds an honors degree in Technology Management and Entrepreneurship. Her current research focus includes AI Security, Verification mechanisms for coordination, and an ongoing (side) interest in whole brain emulation.
Marius Hobbhahn

Marius took part in MATS Winter 2022/23 Cohort under the mentorship of Evan Hubinger (Anthropic). He published multiple pieces on mechanistic interpretability on LessWrong including work on maximum data dimension and double descent. He is currently the CEO and Director or Apollo Research, a new London-based technical alignment organization. Previously, he did a Ph.D. in Machine Learning and conducted independent alignment research. Read more on his website.
Dane Sherburn

During the MATS Autumn 2022 Cohort, while mentored by William Saunders (Research Engineer, OpenAI), Dane tied first place with fellow scholar Jessica Rumbelow at the Safe and Trustworthy AI Workshop 2022 for his poster, “Bootstrapping models to critique chain-of-thought rationales for n-digit arithmetic problems.” Before MATS, Dane completed a MEng (Maths & CS) at Imperial College London, and Redwood Research’s ML for Alignment Bootcamp. Dane now works at OpenAI.
John Hughes

John Hughes is a researcher contracting with Anthropic and working as a machine learning engineer at Speechmatics (Cambridge, UK). He was mentored by Ethan Perez since Summer 2023 when he took part in MATS4. During this time at MATS, he significantly contributed to an AI safety via debate paper, which was awarded best paper at ICML 2024. Since finishing work on scalable oversight, he has continued to work with Ethan on adversarial robustness via classifier-based defenses and is now leading a project on red-teaming audio-based LLMs such as Gemini and the new GPT-4o advanced voice mode.
Joseph Bloom

Joseph Bloom is an AI safety researcher specializing in Mechanistic Interpretability. He cofounded Decode Research, which builds tools for MI (see here and here) and recently assisted Google DeepMind on Gemma Scope. Joseph's work includes maintaining TransformerLens, writing the DecisionTransformerInterpretability library, and publishing "A is for Absorption" with LASR Labs. He participated in the MATS 6.0 cohort under Neel Nanda's mentorship and holds degrees in Computational Biology and Statistics. Joseph continues to focus on developing tools and methodologies to enhance the interpretability and safety of AI systems.
Johannes Treutlein

Johannes completed the MATS Summer 2022 Cohort under the mentorship of Evan Hubinger (then a Research Fellow at MIRI). As a result of MATS, Johannes co-authored the paper Conditioning Predictive Models: Risks and Strategies with Evan as a lead author. He also published a follow-up paper on Incentivizing honest performative predictions with proper scoring rules at the UAI 2023 conference. After MATS, Johannes started a PhD in Computer Science at CHAI. In 2024, he joined Anthropic to work on alignment stress-testing as part of Evan Hubinger’s team.
Sam Marks

Sam Marks is a postdoctoral interpretability researcher in David Bau's lab at Northeastern University. He has also served as the Director of Technical Programs at HAIST, co-organized the 2023 MIT Mechanistic Interpretability Conference, and directed the Alignment Workshop for AI Researchers. Sam participated in MATS in Summer 2022 under the mentorship of Stuart Armstrong.
Aryan Bhatt

During MATS 3.0, Aryan worked on LLM Agent evals under the mentorship of Beth Barnes at METR. He now works at Redwood Research on their AI Control agenda, developing control techniques to improve safety despite intentional subversion. Most recently, he's been working with his MATS 6.0 cohort on extending AI control methodology to a Linux Agent setting.
Thomas Larsen

Thomas took part in the Summer 2022 Cohort with John Wentworth and the Winter 2023 Cohort with Nate Soares. During this time, he wrote a detailed overview of AI Safety approaches. He continued his SERI MATS work at MIRI, before leaving to found the Center for AI Policy, an AI safety advocacy organization. He is currently a researcher at the AI Futures Project and a guest fund manager at the LTFF.
Teun van der Weij

Teun took part in the MATS Winter 2024 cohort (plus extension) with Francis Rhys Ward, where he worked mostly on sandbagging. He and his collaborators got additional funding from the AI Safety Fund to continue researching sandbagging. Currently, he is working at an AI-powered forecasting start up. Before MATS, he co-founded the European Network for AI Safety.
Megan Kinniment

Megan currently works on the evaluations project at the Alignment Research Center. She’s interested in model evaluation, testing, and tooling. Megan previously worked on independent projects (ML interpretability amongst others) with a grant from the Center for Longterm Risk.
Alan Cooney

Alan Cooney leads the Autonomous Systems team within the UK's AI Safety Institute. His team is responsible for assessing the capabilities and risks of Frontier AI systems released by AI labs such as OpenAI, Google and Anthropic. Prior to working in AI safety, he was an investment consultant and start-up founder, with his company Skyhook being acquired in 2023. He was mentored by Google DeepMind researcher Neel Nanda in the MATS Autumn 2022 cohort.
Wes Gurnee

Wes participated in the MATS 3.0 cohort under the supervision of Neel Nanda where he worked on methods for extracting concepts out of superposition. After MATS, he finished his PhD at MIT working on a broad range of interpretability topics with Dimitris Bertsimas, Max Tegmark, and MATS scholars from subsequent cohorts. After grad school, he joined the interpretability team at Anthropic where he currently works on methods to extract circuits from frontier LLMs as well as researching various aspects of model "biology."

Highlights

Placements: Alumni have been hired by leading organizations like Anthropic, Google DeepMind, OpenAI, Meta AI, UK AISI, Redwood Research, METR, RAND TASP, Open Philanthropy, ARC, FAR.AI, Apollo Research, Truthful AI, LawZero, MIRI, CAIF, CLR, Beneficial AI Foundation, SaferAI, Haize Labs, Eleuther AI, Harmony Intelligence, Conjecture, Magic, and the US government, and joined academic research groups like UC Berkeley CHAI, NYU ARG, KASL, MILA, and MIT Tegmark Group.
Founders: Alumni have co-founded AI safety organizations, including Aether, Apollo Research, Athena, Atla, Cadenza Labs, Catalyze Impact, Center for AI Policy, Contramont Research, Coordinal Research, Freestyle Research, Leap Labs, LISA, Live Theory, Luthien Research, Poseidon Research, PRISM Eval, Simplex, Timaeus, Theorem Labs, and Workshop Labs and joined start-up accelerator programs like Y Combinator, Entrepreneur First, Catalyze Impact, and Seldon Lab.
Funding: Alumni have pursued independent research with funding from the Open Philanthropy, Long-Term Future Fund, Survival and Flourishing Fund, Cooperative AI Foundation, Foresight Institute, Manifund, Lightspeed Grants, and more.
Research: MATS has helped produce 116 arXiv publications, with over 5000 citations; our h-index is 31. Scholars have helped develop new AI alignment agendas, including activation engineering, externalized reasoning oversight, conditioning predictive models, developmental interpretability, evaluating situational awareness, formalizing natural abstractions, and gradient routing. Select papers featuring scholars’ research include:

View MATS research here!

Testimonials

"Apollo almost certainly would not have happened without MATS. One of the core reasons why starting an organization is hard is because the founding members need to know and trust each other. It is often hard to find people with similar agendas that you also personally enjoy working with in a systematic manner. MATS implicitly created such an environment because it enabled many of us to understand what everyone else is working on, get to know them personally and see their research progress without having to commit to anything in particular."

Marius Hobbhahn
Co-Founder, Apollo Research
Winter 2022-23 Cohort
Mentored by Evan Hubinger
"The MATS scholarship enabled me to focus on data-independent interpretability for several months, which forms the backbone of Leap's research agenda. The work Matthew and I did at MATS was received well, and endorsed by Eliezer Yudkowsky. This research would not have happened without MATS bringing Matthew and I together. I suspect that this work and Eliezer's endorsement was an important factor in Leap securing funding from LTFF, and instrumental for our OpenPhil grant. I am incredibly grateful to the MATS leadership and community for the many opportunities they have provided, without which Leap would probably not exist."

Jessica Rumbelow
Co-Founder, Leap Labs
Summer 2022 Cohort
Independent Mentorship
"Working in a team environment, particularly one as stimulating as MATS, was a transformative experience. It not only refined my research skills but also instilled a newfound entrepreneurial spirit in me. The program encouraged me to think beyond the conventional, to innovate, and to take risks. Additionally, the array of skills I acquired during my time at MATS was vast. I delved deep into research engineering, honed my science communication abilities, and even tapped into the art of fundraising. These skills, I believe, are indispensable and have equipped me to navigate the ever-evolving world of research with confidence. In conclusion, I wholeheartedly endorse the MATS program. To anyone considering embarking on this journey, you are not only signing up for an unparalleled research experience but also a lifetime of growth, learning, and camaraderie."

Kay Kozaronek
Co-Executive Director, Catalyze Impact
Winter 2022-23 Cohort
Mentored by John Wentworth
"MATS helped me get deeper into AI safety research by motivating me to get up to speed with current research and giving me access to mentorship from an expert in AI safety, as well as a smart and talented cohort and a large network of researchers. It also provided infrastructure such as office space in Berkeley and a generous stipend. SERI MATS worked as a matchmaker between Evan Hubinger and me and thus helped me get involved in his projects, which would have been harder to do otherwise. I feel like I have developed faster as a researcher since doing MATS."

Johannes Treutlein
PhD Student, UC Berkeley
Summer 2022 Cohort
Mentored by Evan Hubinger
"MATS was an excellent environment to get productive work done and a fantastic resource to improve my future impact in AI alignment. I made connections, learned a great deal about my mentor's subfield and alignment in general, and was fired up to keep working when I got back to Australia. Since MATS I've been funded for a project with a collaborator I met at MATS, and gotten significantly further in the hiring process for orgs than before."

Jay Bailey
Independent AI Safety Researcher
Summer 2022 Cohort
Mentored by Owain Evans
"Ethan spent a lot of time discussing our research with us and gave great advice on direction. He unblocked us in various ways, such as getting access to more models or to lots of compute budget. He connected us with lots of great people, some of whom became collaborators. And he was a very inspiring mentor to work with."

Dan Valentine
Independent AI Safety Researcher
Summer 2023 Cohort
Mentored by Ethan Perez
"MATS was a life changing experience. I met and got mentored by amazing people, and I learned so much in such a small amount of time. Looking back at me before this program, I don't think I could even recognize myself 8 month ago. Even though I have no academic background, I felt listened, empowered and supported in order to tackle the biggest challenges that I (and possibly we) have ever faced."

Quentin Feuillade--Montixi
Model Evaluator, ARC Evals
Winter 2022-23 Cohort
Mentored by Stuart Armstrong, Rebecca Gorman, and Nicholas Kees Dupuis
"There's life pre-MATS and life post-MATS. It was the inflection point that set me up to become a technical AI safety researcher. I don't think there are other opportunities as good at getting early-career people integrated into AI safety. The in-person program was the most impactful and high-energy two months I've ever been a part of, and it's my number one recommendation to people considering work on AI safety."

Jesse Hoogland
Executive Director, Timaeus
Winter 2022-23 Cohort
Mentored by Evan Hubinger

Select Summer 2024 Alumni

Aashiq Muhamed

Aashiq participated in the MATS Summer 2024 Cohort under the mentorship of Lucius Bushnaq and Jake Mendel (Apollo Research). His research on "Specialized Sparse Autoencoders for Interpreting Rare Concepts in LLMs" was published at Attrib NeurIPS 2024. The work explores effective strategies for learning subdomain tail features in modern sparse autoencoders (SAEs).

Aashiq holds a Master's degree from Stanford University and a Bachelor's degree from IIT Roorkee, where he was awarded the President's Gold Medal. Before transitioning to AI safety, he spent five years as an Applied Scientist at Amazon.

He is currently pursuing a Ph.D. in machine learning and alignment at Carnegie Mellon University. Learn more about his work at https://aashiqmuhamed.github.io/.
Adam Karvonen

Adam took part in the Summer 2024 cohort in Neel Nanda's stream along with fellow scholar Can Rager. He worked on evaluations for sparse autoencoders and will release SAE Bench, a suite of evaluations for SAEs, during the MATS 6.1 extension. Prior to MATS, he did independent interpretability research, including the ChessGPT research paper. Read more on his website.
Anish Mudide

Anish participated in the MATS Summer 2024 Cohort under the mentorship of Christian Schroeder de Witt (University of Oxford). He published "Efficient Dictionary Learning with Switch Sparse Autoencoders" (https://arxiv.org/abs/2410.08201), a novel SAE architecture aimed at reducing the compute cost of training SAEs. He is currently pursuing an undergraduate degree in Computer Science at MIT, where he is grateful to be advised by Max Tegmark.
Can Rager

Can is currently developing SAE Bench, an evaluation suite for interpretability techniques, in collaboration with Neel Nanda, Adam Karvonen and Decode Research. They launched the project during the MATS 2024 summer cohort. Can, a physics graduate, transitioned to language model interpretability through ARENA and AI Safety Camp. His previous work on Sparse Feature Circuits and Edge Attribution Patching focused on circuit discovery in language models.
Cody Rushing

Cody Rushing is an Undergraduate CS Major at UT Austin. He is currently working with Buck Shlegeris and Redwood Research on AI Control. He is continuing this research into the fall.

https://starship006.github.io/
Constantin Weisser

As part of MATS 6.0 supervised by CHAI’s Micah Carroll, Constantin and his collaborators demonstrated that targeted manipulation and deception emerge in LLMs trained on user rather than annotator feedback. His MATS stream’s paper was accepted as an oral contribution at the SATA workshop and a spotlight at the SoLaR workshop, both at NeurIPS.
Constantin's experience at MATS led to his role as the first technical staff member at Haize Labs. There, he focuses on LLM automated red teaming and jailbreak protection supporting labs such as Anthropic, OpenAI, and AI21.
Prior to MATS, Constantin completed a PhD applying machine learning to particle physics and worked as a machine learning consultant at McKinsey.
David Abecassis

David participated in the Summer 2024 MATS cohort under the mentorship of Lisa Thiergart of MIRI's technical governance team, during which he developed forecasts of US-China relations in the run-up to transformative AI. His interests include international relations, history, wargames, and systems design. David has a BS in Symbolic Systems from Stanford University and a professional background in video game design.
Evžen Wybitul

Evžen took part in the winter and summer MATS cohorts in 2024. He is interested in model internals work, ranging from mechanistic interpretability to approaches on a higher level of abstraction. With his colleagues and under the supervision of Alex Turner, he worked on Gradient routing (https://arxiv.org/abs/2410.04332). He is now writing his master thesis at ETH Zurich under the supervision of Florian Tramèr.
Guang-Yuan Hao

Guang-Yuan participated in the MATS 6.0 Cohort in summer 2024, collaborating with Steven Basart and Andy Zou at CAIS. His AI safety research focuses on robustness and interpretability, with the goal of enhancing the reliability and transparency of AI systems across diverse environments and modalities.
Hantao Lou

Hantao worked on developmental interpretability and model auditing under the mentorship of Evan Hubinger during the 2024 MATS summer cohort remotely. He analyzed the shift of interpretable features before/after finetuning, thereby developing a method to improve model auditing. Before MATS, Hantao is an undergraduate student at Peking University and a visiting student at PAIR Lab. He is now continuing his research in MATS extension to gain a deeper insight into interpretability and alignment.
Jiaxin Wen

Jiaxin was a MATS 6.0 scholar under the mentorship of Akbir Khan, Ethan Perez, and Buck Shlegeris. During MATS, he studied unintended AI deception that emerges from RLHF and low-stakes AI Control over scheming AI. He is currently a Master's student at Tsinghua University.
Kai Fronsdal

As part of MATS 6.0, Kai worked on measuring instrumental self-reasoning (a precursor to deceptive alignment) in frontier models under the mentorship of David Lindner. Prior to MATS, he explored variants of Goodhart's Law and evaluating llm's ability to self-correct. As of October 2024, he is finishing a masters at Stanford University studying math and computer science while completing the MATS extension phase.
Lucie Philippon

Lucie a Program Manager at FPTPEC (French government, PECC), where she organizes track 2 dialogues on the international governance of AI. She previously worked in software engineering and cryptocurrency, and she is considering switching to a more technical role in 2025. During MATS, she worked with Philp Tomei and Matija Franklin on exploring the interplay between macroeconomic forces and frontier AI labs, specifically how the investors of frontier labs influence the lab's behavior.
Marcus Williams

Marcus took part in Micah Carroll's MATS summer 2024 stream where he worked on exploring annotator vulnerabilities and the tendency of LLMs to influence human preferences. This resulted in the paper "Targeted Manipulation and Deception Emerge in LLMs Trained on User Feedback". Previously Marcus was doing independent alignment research on preference modeling and RL.
Matthew Farr

Matthew Farr participated in the MATS 2024 Summer Cohort, under the mentorship of Sahil K (independent, ex-MIRI) and Sophie Libkind (Topos Institute). He is currently developing his MATS research into the MoSSAIC branch of the High Actuation Spaces Project. His main focus is clarifying substrate-flexible risk and its implications for interpretability.
Michael Pearce

Michael worked on mechanistic interpretability with Lee Sharkey during MATS 6.0. During MATS, he developed methods to decompile more interpretable transformer architectures, used Meta-SAEs to find interpretable features within SAE latents, and applied information-theoretic approaches like minimal description length to guide the selection of SAE hyperparameters and architecture. He has competed a PhD in theoretical physics at Stanford focused on understanding the dynamics of evolving populations and complex ecosystems.
Paul "Lorxus" Rapoport

Paul took part in the Summer 2024 cohort under the mentorship of Tsvi Benson-Tilsen. He holds a PhD in math, specializing in geometric group theory, and before MATS, had finished a postdoc position at Temple University. During MATS, he worked on developing natural latent theory with both computational tests and category theory.
Rauno Arike

Rauno (https://www.linkedin.com/in/rauno-arike/) worked in Marius Hobbhahn's stream, collaborating with Elizabeth Donoway on goal-directedness evaluations for LLMs. Before MATS, he studied Computer Science at TU Delft and worked as a software engineer. After MATS, he intends to continue doing alignment research on the side of his Artificial Intelligence MSc at the University of Amsterdam.
Rupal Jain

Rupal Jain is a Computer Science PhD student and UNESCO Poland Research Fellow. She's an AI Governance scholar at MATS 6.0 working on AI economy and developing standardized measures to price in AI risk and enhance market efficiency with mentors Matija Franklin and Philip Moreira Tomei. Concurrently, she’s also a Carl Menger Doctoral Research Fellow at the Mercatus Research Center, George Mason University where she learns about the political economy and socioeconomic sensitivity of advanced AI. Prior to that, she worked at Alphabet Inc. as an engineer. Connect with her at: https://www.linkedin.com/in/rupal-jain-677301125/
Rupali Bhati

Rupali is a PhD student at Northeastern University. Her work is focused on cooperation using multi agent reinforcement learning. Rupali was a scholar at the MATS 6.0 cohort where she worked with Dr. Christian Schroder de Witt in the Cooperative AI stream.
Satvik Golechha

In the Summer 2024 cohort, Satvik worked on neural network modularity in the interpretability stream of Nandi Schoots. Satvik is an independent AI safety researcher interested in neural network interpretability, alignment/safety, unsupervised learning, and deep learning theory. In the past, he has worked on fundamental AI research at Microsoft Research and applied ML research for healthcare at Wadhwani AI. On the side, he enjoys writing fiction and poetry. For more, here's his website: https://7vik.io.
Sunishchal Dev

Dev participated in the MATS Summer 2024 Cohort under the mentorship of Marius Hobbhahn (CEO of Apollo Research), where he published a LessWrong post on evaluating the quality of benchmark datasets, with a focus on model-written safety evals. He previously worked as a data scientist doing supply chain optimization for large enterprises and got into AI safety by contributing to the METR Task Bounty program. After MATS, Dev served as a TA for ARENA and joined the RAND TASP Fellowship, focusing on dangerous AI capability evaluations.