I want to conduct policy and governance research on "loss-of-control" risks (RSI, misalignment, etc.). As a first step, this will probably include rigorous threat modelling and scenarios. If sufficient examples have been published by the time of the MATS program, we can continue from there and develop proposals for best practices to assess and mitigate the associated risks.
Pegah Maham is a Policy Development and Strategy Manager within Google DeepMind’s Frontier Policy Development team, where she works at the intersection of technical AI safety and security and international governance. Her work is focused on frontier AI risks, such as biosecurity and AGI safety. Topics she is thinking about include risk assessments and mitigations, threat modelling, external testing, transparency, system integrity and model weight security.
We will meet once a week for 30 minutes, either in person or via video call. Occasionally, we can have whiteboard brainstorming sessions, depending on my capacity (which is hard to predict).
You do not need a STEM or ML background. You should understand how current LLMs are being developed, and understand related concepts such as reinforcement learning or reward hacking. You should be familiar with the existing arguments and counter-arguments around "loss-of-control".
You should bring an interest in the existing policy discourse and environment on this topic, and a willingness to consider feasibility.
We will brainstorm and refine together, based on policy demand and your interests and expertise.