Jack Lindsey

Anthropic

Member of Technical Staff

Focus

Control, Model Organisms, Red-Teaming, Scheming and Deception

Stream

Anthropic

Hi, I'm Jack! I'm interested in understanding the cognition of modern language models so that we can make them more reliable and aligned with human values. Currently, I lead the "Model Psych" team at Anthropic. We study the internal basis of higher-level cognitive phenomena in LLMs, such as introspection, situational awareness, personas, and representations of emotion. We also apply the techniques we develop to audit Anthropic’s production models, for instance by monitoring their neural activity for signatures of deception, manipulation, or awareness of being evaluated. Previously, I did my PhD in the Center for Theoretical Neuroscience at Columbia University. For a list of my publications, see my Google Scholar profile.