Adrià Garriga-Alonso

Independent

Focus

AI Welfare, Scalable Oversight, Compute and Hardware

Adrià is an independent researcher working on open-source self-alignment and self-exploration, and on reproducible inference. He was previously a Research Scientist at FAR AI, where he reverse-engineered a recurrent neural network that plans. His other interpretability work includes InterpBench, for measuring progress in interpretability, as well as Automatic Circuit Discovery and Causal Scrubbing. He has also worked on neural network interpretability at Redwood Research. He holds a PhD from the University of Cambridge, where he worked on Bayesian neural networks.