Alignment Research Center (ARC)

The Alignment Research Center is a small non-profit research group based in Berkeley, California, that is working on a systematic and theoretically grounded approach to mechanistically explaining neural network behavior. We are interested in scholars with a strong math background and mathematical maturity. If you'd be excited to work on the research direction described in this blog post, we'd encourage you to apply!

Stream overview

ARC will be supervising projects that fit into our technical research agenda, which is outlined on our blog. See, for example, this post.

Most projects will be primarily theoretical in nature. These could involve developing mechanistic estimation algorithms (such as for the expected outputs of MLPs), or coming up with counterexamples to existing algorithm proposals. There are also a variety of high-level theoretical questions about the limits of broad classes of approaches that we are interested in.

A smaller number of projects may involve some empirical work, such as implementing proposed algorithms to check their performance, or may study more philosophical questions about how our methods could be applied to produce aligned AI systems.

Mentors

Jacob Hilton

Alignment Research Center (ARC)

Researcher

SF Bay Area

Interpretability

Jacob Hilton is a researcher at the Alignment Research Center (ARC), a nonprofit working on the theoretical foundations of mechanistic interpretability. He previously worked at OpenAI on reinforcement learning from human feedback, scaling laws and interpretability. His background is in pure mathematics, and he holds a PhD in set theory from the University of Leeds, UK.

Wilson Wu

ARC

Researcher

SF Bay Area

Interpretability

Wilson Wu is a researcher at the Alignment Research Center (ARC), which is working on a systematic and theoretically grounded approach to mechanistic interpretability. He has previously worked on alternative approaches to interpretability, including compact proofs and applications of singular learning theory.

Victor Lecomte

ARC

Researcher

SF Bay Area

Interpretability

Victor Lecomte is a researcher at the Alignment Research Center (ARC), which is working on a systematic and theoretically grounded approach to mechanistic interpretability. He holds a PhD from Stanford University, where he did research in computational complexity and other areas of theoretical computer science before pivoting to AI safety research.

Michael Winer (Mike)

ARC

Researcher

SF Bay Area

Interpretability

Mike Winer is a researcher at the Alignment Research Center (ARC), where he studies how mechanistic estimates can beat black-box techniques in toy setups. His background is in statistical physics, where he studies how many objects obeying simple rules can exhibit complex behaviors like magnetism, glassiness, or scoring 87% on GPQA.

Paul Christiano

Center for AI Standards and Innovation

Technical Advisor

I am a technical advisor at the Center for AI Standards and Innovation within NIST. I previously ran the Alignment Research Center and the language model alignment team at OpenAI. Before that I received my PhD in statistical learning theory from UC Berkeley.

You may be interested in my writing about alignment, my blog, my academic publications, or fun and games.

Mentorship style

Scholars will work out of ARC's offices in Berkeley. Each scholar will meet with their mentor at least once a week for an hour, though 2-3 hours per week is not uncommon. Besides time with their official mentor, scholars will likely spend time working in collaboration with other researchers; a typical scholar will likely spend about 25% of their time actively collaborating or learning about others' research.

Fellows we are looking for

Essential:

Mathematical maturity and a math, physics or computer science background at the level of a strong undergraduate at a top-20 university.
Good at communicating about technical topics.
Interest in engaging with ARC's higher-level research agenda.
Potentially interested in joining ARC full-time by September 2027.

Preferred:

Ability to do productive research in the absence of formal problem statements.
Open to working at ARC from our office in Berkeley.

Optional extras:

Background in ML theory and/or theoretical CS.
Basic ML engineering experience, such as running experiments on small neural nets.

Scholars are encouraged to collaborate with anyone at ARC, including full-time researchers and other scholars/visiting researchers. Scholars are also welcome to collaborate with researchers outside of ARC, and are encouraged to do so when outside researchers have expertise that we could benefit from.

Project selection

Each scholar will be paired with the mentor who best suits their skills and interests. The mentor and scholar will discuss potential projects and decide together which project makes the most sense, based on ARC's research goals and the scholar's preferences.

Most scholars will work on multiple projects over the course of their time at ARC, and some scholars will work with multiple mentors.