OpenAI Control & Oversight

We are interested in AI control and scalable oversight. I'm excited to work with scholars interested in empirical projects building and evaluating control measures and oversight techniques for LLM agents, especially those based on chain of thought monitoring. I'm also interested in the science of chain of thought monitorability, misalignment and control. An ideal project ends with a paper submitted to NeurIPS/ICML/ICLR.


Stream overview

Some example projects:

Forcing AI agents to externalize their cognition, either through information bottlenecks in multi-agent architectures or through training techniques
Training AI agents to be more honest
Training AI agents to have more monitorable chain of thought traces
Agentic monitors: ones that take actions, retrieve additional context and interrogate overseen agents

Mentors

Tomek Korbak

OpenAI

Member of Technical Staff

SF Bay Area


Control

Monitoring

I’m a Member of Technical Staff at OpenAI working on monitoring LLM agents for misalignment. Previously, I worked on AI control and safety cases at the UK AI Security Institute and on honesty post-training at Anthropic. Before that, I did a PhD at the University of Sussex with Chris Buckley and Anil Seth focusing on RL from human feedback (RLHF) and spent time as a visiting researcher at NYU working with Ethan Perez, Sam Bowman and Kyunghyun Cho.

Micah Carroll

OpenAI

Member of Technical Staff (Safety Oversight Research)

SF Bay Area


Scalable Oversight

Control

Monitoring

Interpretability

Adversarial Robustness

Micah is a researcher on OpenAI’s safety team interested in AI deception, scalable oversight, and monitorability. He is on leave from a UC Berkeley PhD focused on AI alignment with influenceable humans, AI manipulation from RL training, and recommender-system effects.

Miles Wang

OpenAI

Member of Technical Staff

SF Bay Area


Control

Model Organisms

Red-Teaming

Scheming and Deception

Mentorship style

I'll meet with mentees once a week and will be available on Slack daily. By default, I'll try to pair mentees to work on a project together.


Fellows we are looking for

An ideal mentee has a strong AI research and/or software engineering background. Mentees can be PhD students, and they can work on a paper that will become part of their thesis.


Project selection

I'll talk through project ideas with scholars.