This stream focuses on critical challenges in AI safety and alignment, including risks from automating AI research, bottlenecks to recursive self-improvement, and the automation of safety and alignment research. Priority topics also include AGI privacy, measuring long-horizon agentic capabilities, developing new alignment methods, and advancing the science of post-training.
I'm interested in all areas of AI safety and alignment, but my priority directions are:
I am a principal investigator at the ELLIS Institute Tübingen and the Max Planck Institute for Intelligent Systems, where I lead the AI Safety and Alignment group. I also serve as chapter lead for the new edition of the International AI Safety Report chaired by Prof. Yoshua Bengio. I have worked on AI safety with leading organizations in the field (OpenAI, Anthropic, UK AI Safety Institute, Center for AI Safety, Gray Swan AI). I obtained my PhD in machine learning from EPFL in 2024, advised by Prof. Nicolas Flammarion. My PhD thesis was awarded the Patrick Denantes Memorial Prize for the best thesis in the CS department of EPFL and was supported by the Google and Open Phil AI PhD Fellowships.
I usually spend at least 30 minutes per week in one-on-one meetings with my mentees. We can also arrange longer time slots if necessary. Outside these meetings, I try to be as responsive as possible over Slack (>2 comprehensive responses per day) and read relevant papers between weekly meetings.
I'm looking for the following skills:
No constraints here. I'm fine with both internal (i.e., within MATS) and external collaborators. I can also pair MATS scholars with PhD students in my group if that would be useful.
I would prefer to set the overall direction, but I will listen closely to scholars about their preferences within a broad direction. Converging on a particular topic is expected to be a collaborative process.