In the shard theory stream, we create qualitatively new methods and fields of inquiry, from steering vectors to gradient routing to unsupervised capability elicitation to robust unlearning. If you're theory-minded, maybe you'll help us formalize shard theory itself.
Discovering qualitatively new techniques
Steering GPT-2-XL by adding an activation vector opened up a new way to cheaply steer LLMs at runtime. Subsequent work has reinforced the promise of this technique, and steering vectors have become a small research subfield in their own right. Unsupervised discovery of model behaviors may now be possible thanks to Andrew Mack's method for unsupervised steering vector discovery. Gradient routing provides a novel form of weak supervision, enabling us to induce useful structure in models even without access to perfect labels. Unlearn and Distill succeeds at robust unlearning for the same reason that prior deep unlearning methods failed.
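The core activation-addition idea can be sketched in a few lines. This is a toy numpy illustration, not the actual ActAdd implementation: the "layer" is a stand-in random linear map, the contrastive activations are random vectors, and the coefficient is arbitrary. The point is only the mechanics: take the difference of activations on two contrasting inputs, then add a scaled copy of that difference into the residual stream at runtime.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for one transformer layer: a fixed random linear map.
W = rng.standard_normal((8, 8))

def layer(x, steering=None):
    """One residual-stream update; optionally add a steering vector."""
    h = x @ W
    if steering is not None:
        h = h + steering
    return h

# Stand-ins for activations on two contrasting prompts
# (e.g. "Love" vs. "Hate" in the GPT-2-XL experiments).
x_pos = rng.standard_normal(8)
x_neg = rng.standard_normal(8)
steering_vec = layer(x_pos) - layer(x_neg)

# At inference time, inject the scaled vector; the coefficient (4.0
# here) is a free knob chosen by hand in practice.
x = rng.standard_normal(8)
steered = layer(x, steering=4.0 * steering_vec)
unsteered = layer(x)

# By construction, the output shifts exactly along the steering direction.
print(np.allclose(steered - unsteered, 4.0 * steering_vec))  # → True
```

In a real model the addition happens inside a chosen layer via a forward hook, and the interesting question is how that linear shift changes downstream sampling behavior; no gradient updates or fine-tuning are involved.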
Formalizing shard theory
Shard theory has helped unlock a range of empirical insights, including steering vectors. The time seems ripe to put the theory on firmer mathematical footing. For initial thoughts, see this comment.
Something else
We're open to supporting any conceptually motivated empirical ML research that promotes the safe development of transformative AI.
We will have weekly 1:1s and a weekly team lunch, along with asynchronous communication over Slack. Mentees are welcome to reach out at any time if they need guidance outside of the usual meeting times.
Scholars should mostly figure things out on their own outside of meetings
Ideal candidates would have:
Fellows will probably work with collaborators from within the stream.
Mentor(s) will talk through project ideas with scholars