Sarah Schwettmann, Jacob Steinhardt

We build scalable technology for AI understanding and oversight.

Stream overview

We’re building scalable, AI-backed systems for analyzing, testing, and interpreting AI agents, and using these to study behaviors like sycophancy, self-harm, and reward hacking. We’re looking for scholars who want to help us push forward this work.

Some concrete projects include: building scalable, end-to-end tools for interpretability and behavior elicitation; creating robust LLM judges for Docent; and building scalable search and retrieval over large agent transcripts.

Mentors

Jacob Steinhardt
Transluce, Co-Founder & CEO
SF Bay Area
Interpretability, Monitoring, Dangerous Capability Evals

I am an Assistant Professor of Statistics and EECS at UC Berkeley, where I’m also part of BAIR and CLIMB. I am also Co-Founder & CEO of Transluce, a non-profit research lab building open, scalable technology for understanding frontier AI systems.

Sarah Schwettmann
Transluce, Co-Founder & Chief Scientist
SF Bay Area
Interpretability, Monitoring, Dangerous Capability Evals

I’m a Research Scientist at MIT CSAIL with the MIT-IBM Watson AI Lab. I did my PhD in Brain and Cognitive Sciences at MIT as an NSF Fellow, working with Josh Tenenbaum and Antonio Torralba. My work investigates the representations underlying intelligence in artificial (and previously, biological) neural networks.


Mentorship style

You will work closely with a mentor through recurring group and individual meetings, as well as Slack.

Representative papers

https://transluce.org/pathological-behaviors 

https://transluce.org/observability-interface 

https://transluce.org/docent and https://transluce.org/introducing-docent 

Scholars we are looking for

We're looking for strong, experienced software engineers or talented researchers who can hit the ground running and iterate quickly.

ML experience is a bonus but not required.

Scholars will likely work with collaborators from the stream.

Project selection

We will talk through project ideas with each scholar.