Sarah Schwettmann, Jacob Steinhardt

We build scalable technology for AI understanding and oversight.

Stream overview

We’re building scalable, AI-backed systems for analyzing, testing, and interpreting AI agents, and using these to study behaviors like sycophancy, self-harm, and reward hacking. We’re looking for scholars who want to help us push forward this work.

Some concrete projects include: building scalable, end-to-end tools for interpretability and behavior elicitation; creating robust LLM judges for Docent; and building scalable search and retrieval over large agent transcripts.

Mentors

Jacob Steinhardt
Transluce, Co-Founder & CEO
SF Bay Area
Interpretability, Monitoring, Dangerous Capability Evals

I am an Assistant Professor of Statistics and EECS at UC Berkeley, where I’m also part of BAIR and CLIMB. I am also Co-Founder & CEO of Transluce, a non-profit research lab building open, scalable technology for understanding frontier AI systems.

Sarah Schwettmann
Transluce, Co-Founder & Chief Scientist
SF Bay Area
Interpretability, Monitoring, Dangerous Capability Evals

I’m a Research Scientist at MIT CSAIL with the MIT-IBM Watson AI Lab. I did my PhD in Brain and Cognitive Sciences at MIT as an NSF Fellow, working with Josh Tenenbaum and Antonio Torralba. My work investigates the representations underlying intelligence in artificial (and previously, biological) neural networks.


Mentorship style

You will work closely with a mentor through recurring group and individual meetings, as well as Slack.

Representative papers

https://transluce.org/pathological-behaviors 

https://transluce.org/observability-interface 

https://transluce.org/docent and https://transluce.org/introducing-docent 

Scholars we are looking for

We're looking for strong, experienced software engineers or talented researchers who can hit the ground running and iterate quickly.

ML experience is a bonus but not required.

Scholars will likely work with collaborators from the stream.

Project selection

We will talk through project ideas with each scholar.