Alex Turner

Google DeepMind

Research scientist

Links

Focus

Interpretability, Agent Foundations

Alex is a Research Scientist at Google DeepMind. He’s currently working on training invariants into model behavior. In the past, he formulated and proved the power-seeking theorems, co-formulated the shard theory of human value formation, and proposed the Attainable Utility Preservation approach to penalizing negative side effects.

Highlighted outputs from past streams: