Stream overview

I'm interested in mentoring projects in several directions: 

  1. Safety or alignment pretraining: This direction spans a wide range of topics, including generating synthetic pretraining data to improve robustness, alignment with human values, and compliance with safety policies.
  2. Data poisoning: As synthetic data becomes an increasingly large portion of pretraining corpora, new and more subtle forms of data poisoning become possible. I'm interested in projects that develop methods for detecting, characterizing, and defending against novel forms of such attacks.
  3. The role of harmful data in building safer models: There is growing evidence that retaining some harmful content during pretraining, rather than filtering it all out, can improve a model's ability to reason about harms and ultimately become safer after post-training. I'd like to mentor work that deepens our understanding of this phenomenon: when does exposure to harmful data help vs. hurt, and how can we design pretraining pipelines that leverage this insight responsibly?

Mentors

Dylan Sam
OpenAI, Member of Technical Staff
SF Bay Area

Dylan is a safety researcher at OpenAI, where he works on curating better/safer training data and monitoring models for harmful behavior.

Before that, he completed a PhD in the Machine Learning Department at CMU.

