Conceptual research on deceptive alignment, and designing scheming propensity evaluations and honeypots. The stream will run in person in London, with scholars working together in one or more teams.
Some example directions:
During the program, we will meet once a week to go over your updates and results and your plans for the following week. I'm also happy to comment on docs, respond on Slack, or hold additional ad hoc meetings as needed.
Scholars will work together in one or more teams.
I will talk through project ideas with scholars.