Dillon Plunkett (Eleos AI Research)

This is the empirical research stream of Eleos AI Research. We’re dedicated to understanding and addressing the potential well-being and moral status of AI systems. We are open to fellows working on a broad range of topics, including LLM introspection, LLM preferences, persona vectors, and more, using either white-box or black-box interpretability techniques.

Stream overview

We are open to fellows working on a broad range of topics of potential relevance to AI well-being. Example research questions might include:

  • When and why do models’ self-reported and revealed preferences diverge?
  • How do models’ self-reports differ between their immediate responses and the responses they give after extensive in-context reasoning or as their persona drifts during long conversations? 
  • Can we find interesting dimensions of variation in emotion concept vectors?

We will suggest specific projects at the start of the fellowship, but are open to fellows pitching projects aligned with our research priorities. 

Mentors

Dillon Plunkett, Chief Scientist, Eleos (Boston)

Dillon is the Chief Scientist at Eleos AI Research, where he leads the organization's empirical research on the sentience, moral status, and potential well-being of AI systems. Before joining Eleos, he was an Anthropic Fellow and a postdoc in the Subjectivity Lab. He did his PhD in cognitive neuroscience in Josh Greene's lab at Harvard and his BA in philosophy, also at Harvard.

Mentorship style

By default, we will meet in person for at least an hour per week. We’ll communicate regularly on Slack between meetings, and I will often be able to hop on brief calls on short notice to discuss time-sensitive, blocking issues.

Fellows we are looking for

Essential:

  • Experience performing empirical research
  • The ability to run and iterate on experiments quickly and creatively
  • Strong interest in understanding the potential well-being of AI systems

Strong advantages, but not strictly required:

  • Significant background knowledge of white-box and black-box ML interpretability and alignment techniques
  • Significant background in philosophy or cognitive science
  • Familiarity with existing research on AI well-being

Project selection

We’ll meet at the start of the program to discuss ideas for projects aligned with Eleos’s research priorities, including any ideas that fellows would like to pitch. We’ll work together to select a project that best fits each fellow’s goals and skills.