Roger Grosse’s stream investigates how to improve influence functions and other training data attribution methods, and uses these tools to study alignment-related phenomena such as out-of-context reasoning and emergent misalignment. The ideal scholar has experience with LLM internals and strong statistics/applied math skills (especially numerical linear algebra), and can independently drive research from literature review through experimentation and analysis. Roger provides shovel-ready projects while giving exceptional scholars freedom to pursue their own ideas, and is open to scholars collaborating with others.
Projects focus on improving influence functions or other training data attribution methods, on using training data attribution to understand alignment-related phenomena such as out-of-context reasoning or emergent misalignment, or both.
I will meet with scholars 1 hour per week by default, and will be available to answer questions on Slack roughly daily.
Scholars are welcome to find collaborators if they'd find it valuable.
I will give the scholar the level of freedom they are ready for. I will be prepared with focused, shovel-ready projects, but exceptional scholars with a vision they are excited about will have the flexibility to pursue it.