
Google DeepMind
Research Scientist
Focus
Interpretability, Agent Foundations
Stream
Team Shard
Alex is a Research Scientist at Google DeepMind. He’s currently working on training invariants into model behavior. In the past, he formulated and proved the power-seeking theorems, co-formulated the shard theory of human value formation, and proposed the Attainable Utility Preservation approach to penalizing negative side effects.
Highlighted outputs from past streams: