
Independent
—
Links
Focus
AI Welfare, Scalable Oversight, Compute and Hardware
H-index
12
Stream
Adrià Garriga-Alonso
Adrià is an independent researcher focused on open-source self-alignment and self-exploration, and reproducible inference. Previously, he was a Research Scientist at FAR AI, where he reverse-engineered a recurrent neural network that plans. His previous interpretability work includes measuring progress in interpretability with InterpBench, Automatic Circuit Discovery and Causal Scrubbing. He previously worked at Redwood Research on neural network interpretability. He holds a PhD from the University of Cambridge, where he worked on Bayesian neural networks.