MATS Alumni
Alumni Spotlight
Highlights
Alumni have been hired by leading organizations like Anthropic, Google DeepMind, OpenAI, UK AISI, METR, RAND TASP, Open Philanthropy, ARC, MIRI, CAIF, CLR, Haize Labs, Conjecture, Magic, and the US government, and joined academic research groups like UC Berkeley CHAI, NYU ARG, Cambridge KASL, and MIT Tegmark Group.
Alumni have co-founded AI safety organizations, including Apollo Research, Athena, Atla, Cadenza Labs, Catalyze Impact, Contramont Research, Center for AI Policy, Leap Labs, PRISM Eval, Simplex, and Timaeus, and joined start-up accelerator programs like Y Combinator, Entrepreneur First, and Catalyze Impact.
Alumni have pursued independent research with funding from the Long-Term Future Fund, Open Philanthropy, Survival and Flourishing Fund, Lightspeed Grants, Manifund, and Foresight Institute.
Scholars have helped develop new AI alignment agendas, including activation engineering, externalized reasoning oversight, conditioning predictive models, developmental interpretability, evaluating situational awareness, formalizing natural abstractions, and gradient routing.
Recent papers featuring scholars’ research include:
“Copy Suppression: Comprehensively Understanding an Attention Head”;
“How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions”;
"Sparse Autoencoders Find Highly Interpretable Features in Language Models";
"Evaluating Language-Model Agents on Realistic Autonomous Tasks";
"Representation Engineering: A Top-Down Approach to AI Transparency";
"Taken out of context: On measuring situational awareness in LLMs".
Testimonials
Select Summer 2024 Alumni
-
Aashiq Muhamed
Aashiq participated in the MATS Summer 2024 Cohort under the mentorship of Lucius Bushnaq and Jake Mendel (Apollo Research). His research on "Specialized Sparse Autoencoders for Interpreting Rare Concepts in LLMs" was published at the ATTRIB workshop at NeurIPS 2024. The work explores effective strategies for learning subdomain tail features in modern sparse autoencoders (SAEs).
Aashiq holds a Master's degree from Stanford University and a Bachelor's degree from IIT Roorkee, where he was awarded the President's Gold Medal. Before transitioning to AI safety, he spent five years as an Applied Scientist at Amazon.
He is currently pursuing a Ph.D. in machine learning and alignment at Carnegie Mellon University. Learn more about his work at https://aashiqmuhamed.github.io/.
-
Adam Karvonen
Adam took part in the Summer 2024 cohort in Neel Nanda's stream along with fellow scholar Can Rager. He worked on evaluations for sparse autoencoders and will release SAE Bench, a suite of evaluations for SAEs, during the MATS 6.1 extension. Prior to MATS, he did independent interpretability research, including the ChessGPT research paper. Read more on his website.
-
Anish Mudide
Anish participated in the MATS Summer 2024 Cohort under the mentorship of Christian Schroeder de Witt (University of Oxford). He published "Efficient Dictionary Learning with Switch Sparse Autoencoders" (https://arxiv.org/abs/2410.08201), which introduces a novel SAE architecture aimed at reducing the compute cost of training SAEs. He is currently pursuing an undergraduate degree in Computer Science at MIT, where he is grateful to be advised by Max Tegmark.
-
Can Rager
Can is currently developing SAE Bench, an evaluation suite for interpretability techniques, in collaboration with Neel Nanda, Adam Karvonen and Decode Research. They launched the project during the MATS 2024 summer cohort. Can, a physics graduate, transitioned to language model interpretability through ARENA and AI Safety Camp. His previous work on Sparse Feature Circuits and Edge Attribution Patching focused on circuit discovery in language models.
-
Cody Rushing
Cody Rushing is an undergraduate CS major at UT Austin. He is currently working with Buck Shlegeris and Redwood Research on AI Control and is continuing this research into the fall. Read more at https://starship006.github.io/.
-
Constantin Weisser
As part of MATS 6.0 supervised by CHAI’s Micah Carroll, Constantin and his collaborators demonstrated that targeted manipulation and deception emerge in LLMs trained on user rather than annotator feedback. His MATS stream’s paper was accepted as an oral contribution at the SATA workshop and a spotlight at the SoLaR workshop, both at NeurIPS.
Constantin's experience at MATS led to his role as the first technical staff member at Haize Labs. There, he focuses on automated LLM red-teaming and jailbreak protection, supporting labs such as Anthropic, OpenAI, and AI21.
Prior to MATS, Constantin completed a PhD applying machine learning to particle physics and worked as a machine learning consultant at McKinsey.
-
David Abecassis
David participated in the Summer 2024 MATS cohort under the mentorship of Lisa Thiergart of MIRI's technical governance team, during which he developed forecasts of US-China relations in the run-up to transformative AI. His interests include international relations, history, wargames, and systems design. David has a BS in Symbolic Systems from Stanford University and a professional background in video game design.
-
Evžen Wybitul
Evžen took part in the winter and summer MATS cohorts in 2024. He is interested in model internals work, ranging from mechanistic interpretability to approaches at a higher level of abstraction. With his colleagues and under the supervision of Alex Turner, he worked on gradient routing (https://arxiv.org/abs/2410.04332). He is now writing his master's thesis at ETH Zurich under the supervision of Florian Tramèr.
-
Guang-Yuan Hao
Guang-Yuan participated in the MATS 6.0 Cohort in summer 2024, collaborating with Steven Basart and Andy Zou at CAIS. His AI safety research focuses on robustness and interpretability, with the goal of enhancing the reliability and transparency of AI systems across diverse environments and modalities.
-
Hantao Lou
Hantao worked remotely on developmental interpretability and model auditing under the mentorship of Evan Hubinger during the 2024 MATS summer cohort. He analyzed how interpretable features shift before and after fine-tuning, developing a method to improve model auditing. Before MATS, Hantao was an undergraduate student at Peking University and a visiting student at PAIR Lab. He is now continuing his research in the MATS extension to gain deeper insight into interpretability and alignment.
-
Jiaxin Wen
Jiaxin was a MATS 6.0 scholar under the mentorship of Akbir Khan, Ethan Perez, and Buck Shlegeris. During MATS, he studied unintended AI deception that emerges from RLHF, as well as low-stakes AI control of scheming AIs. He is currently a Master's student at Tsinghua University.
-
Kai Fronsdal
As part of MATS 6.0, Kai worked on measuring instrumental self-reasoning (a precursor to deceptive alignment) in frontier models under the mentorship of David Lindner. Prior to MATS, he explored variants of Goodhart's Law and evaluated LLMs' ability to self-correct. As of October 2024, he is finishing a master's in math and computer science at Stanford University while completing the MATS extension phase.
-
Lucie Philippon
Lucie is a Program Manager at FPTPEC (French government, PECC), where she organizes track 2 dialogues on the international governance of AI. She previously worked in software engineering and cryptocurrency, and she is considering switching to a more technical role in 2025. During MATS, she worked with Philip Tomei and Matija Franklin on the interplay between macroeconomic forces and frontier AI labs, specifically how the investors of frontier labs influence the labs' behavior.
-
Marcus Williams
Marcus took part in Micah Carroll's MATS Summer 2024 stream, where he explored annotator vulnerabilities and the tendency of LLMs to influence human preferences. This resulted in the paper "Targeted Manipulation and Deception Emerge in LLMs Trained on User Feedback". Previously, Marcus did independent alignment research on preference modeling and RL.
-
Matthew Farr
Matthew Farr participated in the MATS 2024 Summer Cohort, under the mentorship of Sahil K (independent, ex-MIRI) and Sophie Libkind (Topos Institute). He is currently developing his MATS research into the MoSSAIC branch of the High Actuation Spaces Project. His main focus is clarifying substrate-flexible risk and its implications for interpretability.
-
Michael Pearce
Michael worked on mechanistic interpretability with Lee Sharkey during MATS 6.0. During MATS, he developed methods for decompiling transformers into more interpretable architectures, used Meta-SAEs to find interpretable features within SAE latents, and applied information-theoretic approaches like minimal description length to guide the selection of SAE hyperparameters and architecture. He completed a PhD in theoretical physics at Stanford focused on understanding the dynamics of evolving populations and complex ecosystems.
-
Paul "Lorxus" Rapoport
Paul took part in the Summer 2024 cohort under the mentorship of Tsvi Benson-Tilsen. He holds a PhD in math, specializing in geometric group theory, and completed a postdoc at Temple University before MATS. During MATS, he worked on developing natural latent theory with both computational tests and category theory.
-
Rauno Arike
Rauno (https://www.linkedin.com/in/rauno-arike/) worked in Marius Hobbhahn's stream, collaborating with Elizabeth Donoway on goal-directedness evaluations for LLMs. Before MATS, he studied Computer Science at TU Delft and worked as a software engineer. After MATS, he intends to continue doing alignment research alongside his Artificial Intelligence MSc at the University of Amsterdam.
-
Rupal Jain
Rupal Jain is a Computer Science PhD student and UNESCO Poland Research Fellow. She was an AI governance scholar in MATS 6.0, working with mentors Matija Franklin and Philip Moreira Tomei on the AI economy and on developing standardized measures to price in AI risk and enhance market efficiency. Concurrently, she is a Carl Menger Doctoral Research Fellow at the Mercatus Center at George Mason University, where she studies the political economy and socioeconomic sensitivity of advanced AI. Prior to that, she worked as an engineer at Alphabet Inc. Connect with her at: https://www.linkedin.com/in/rupal-jain-677301125/
-
Rupali Bhati
Rupali is a PhD student at Northeastern University. Her work focuses on cooperation using multi-agent reinforcement learning. Rupali was a scholar in the MATS 6.0 cohort, where she worked with Dr. Christian Schroeder de Witt in the Cooperative AI stream.
-
Satvik Golechha
In the Summer 2024 cohort, Satvik worked on neural network modularity in the interpretability stream of Nandi Schoots. Satvik is an independent AI safety researcher interested in neural network interpretability, alignment/safety, unsupervised learning, and deep learning theory. In the past, he has worked on fundamental AI research at Microsoft Research and applied ML research for healthcare at Wadhwani AI. On the side, he enjoys writing fiction and poetry. For more, here's his website: https://7vik.io.
-
Sunishchal Dev
Dev participated in the MATS Summer 2024 Cohort under the mentorship of Marius Hobbhahn (CEO of Apollo Research), where he published a LessWrong post on evaluating the quality of benchmark datasets, with a focus on model-written safety evals. He previously worked as a data scientist doing supply chain optimization for large enterprises and got into AI safety by contributing to the METR Task Bounty program. After MATS, Dev served as a TA for ARENA and joined the RAND TASP Fellowship, focusing on dangerous AI capability evaluations.