Tomek Korbak

I am mostly interested in AI control and scalable oversight. I'm excited to work with scholars interested in empirical projects that build and evaluate control measures and oversight techniques for LLM agents, especially those based on chain of thought monitoring. I'm also interested in the science of chain of thought monitorability, misalignment and control. An ideal project ends with a paper submitted to NeurIPS/ICML/ICLR.

Stream overview

Some example projects:

  1. Forcing AI agents to externalize their cognition, either through information bottlenecks in multi-agent architectures or training techniques
  2. Training AI agents to be more honest
  3. Training AI agents to have more monitorable chain of thought traces
  4. Agentic monitors: ones that take actions, retrieve additional context and interrogate overseen agents

Mentors

Miles Wang
OpenAI, Member of Technical Staff
SF Bay Area
Control
Model Organisms
Red-Teaming
Scheming and Deception

Mentorship style

I'll meet with mentees once a week and will be available on Slack daily. 

Fellows we are looking for

An ideal mentee has a strong AI research and/or software engineering background. Mentees can be PhD students, and the project can be a paper that becomes part of their thesis.

By default, I'll try to pair mentees to work on a project together.

Project selection

I'll talk through project ideas with scholars.