Roger Grosse

Roger Grosse’s stream investigates how to improve influence functions and other training data attribution methods, and uses these tools to study alignment-related phenomena such as out-of-context reasoning and emergent misalignment. The ideal scholar has experience with LLM internals, strong statistics/applied math skills (especially numerical linear algebra), and can independently drive research from literature review through experimentation and analysis. Roger provides shovel-ready projects while giving exceptional scholars freedom to pursue their own ideas, and is open to scholars collaborating with others.

Stream overview

This stream explores ways to improve influence functions and other training data attribution methods, and ways to use training data attribution to understand alignment-related phenomena such as out-of-context reasoning and emergent misalignment.

Mentors

Roger Grosse

Anthropic

Associate Professor

Toronto

Interpretability

Mentorship style

I will meet with scholars for 1 hour per week by default and will be available to answer questions on Slack roughly daily.

Fellows we are looking for

Experience working with LLM internals
Strong background in statistics and/or applied math (especially numerical linear algebra)
Ability to carry out research independently on the timescale of weeks (reading the literature, formulating and carrying out experiments, interpreting results)
Ability and willingness to dig into details to get at the root causes of phenomena

Scholars are welcome to work with collaborators if they would find it valuable.

Project selection

I will give the scholar the level of freedom they are ready for. I will be prepared with focused, shovel-ready projects, but exceptional scholars with a vision they are excited about will have the flexibility to pursue it.