Empirical

Streams in this track involve hands-on research that uses machine learning experiments to understand and improve model safety, including AI control, interpretability, scalable oversight, evaluations, red-teaming, and robustness. This is the largest track in the program and is defined by its methods rather than any single research agenda. If your primary tool is ML engineering, this is your track.

Apply by June 7th

Application process

Initial application: No track-specific questions.
Stage 2: Complete 1–2 assessments evaluating research taste and technical implementation skills.
Stream applications & follow-up: Apply to individual streams; follow-up includes interviews or additional assessments depending on the stream.

Empirical track overview

The track is defined by its methodology more than by any single research agenda. Fellows run ML experiments to understand and improve the safety properties of frontier models, with work spanning interpretability, AI control, scalable oversight, evaluations, red-teaming, robustness, and model organisms of misalignment. The unifying thread is that progress comes from working hands-on with real models (training, probing, fine-tuning, measuring) rather than reasoning from first principles alone. This is the largest track in the program and the most common entry point into technical AI safety research.

We are looking for fellows whose primary tool is ML engineering, broadly construed. The essential requirement is the ability to design and run experiments on language models or other deep learning systems and iterate quickly on the results. In practice that usually means strong Python (with and without AI coding tools), comfort with the infrastructure around running models at moderate scale, and enough research taste to know which experiments are worth running. Mission alignment matters: fellows should be able to say why a given line of empirical work meaningfully reduces frontier risk, not just whether it yields a successful publication. Educational background and seniority are weighted lightly here relative to other tracks. Past cohorts have included strong fellows ranging from undergraduates to senior industry researchers.

Fellows are matched to mentors based on fit, and projects are scoped to produce concrete artifacts by program end: papers, evaluation suites, open-source tooling, or technical reports. Target audiences include safety and alignment teams at frontier labs, governments and other evaluation organizations, and the broader ML research community.

Empirical track streams

Keri Warr

Empirical

Implementing SL4/5 and searching for differentially defense-favored security tools.

Krishnamurthy Dvijotham (Dj)

Empirical

Theory

This stream will pursue research on securing and hardening AI systems through rigorous testing, provable defenses, and formal specification. Projects include improving benchmarks for agentic security, scaling mathematically grounded robustness techniques such as randomized smoothing and Lipschitz-constrained training, and developing formal methods for specifying safe agent behaviors.

LawZero

Empirical

We are excited to supervise projects that fall within the following two categories:

1. Studying the causes, implications, and mitigations of [instances of] situational awareness;
2. Contributing directly to LawZero's Scientist AI.

For 1., we are particularly interested in:

Evaluation / monitorability awareness;
Self-awareness, in an introspective sense.

For 2., we are especially interested in:

Testing if "truth-ification" (a process that, given a corpus of text, augments it so as to make sources of information explicit) allows language models to generalize better;
Developing amortized inference methods to estimate the uncertainty of a predictor (such as an autoregressive model).

Lee Sharkey

Empirical

Theory

Lee's stream will focus primarily on improving mechanistic interpretability methods for reverse-engineering neural networks.

Luca Righetti, Seth Donoughe

Empirical

This stream will work on projects that empirically assess national security threats of AI misuse (CBRN terrorism and cyberattacks) and improve dangerous capability evaluations. Threat modeling applicants should have a skeptical mindset, enjoy case study work, and be strong written communicators. Eval applicants should be able and excited to help demonstrate concepts like sandbagging and elicitation gaps in an AI misuse context.