Sara Price

Anthropic

Member of Technical Staff

Focus

Control, Model Organisms, Red-Teaming, Scheming and Deception

Stream

Anthropic

Hi! I'm an independent AI alignment and safety researcher currently based out of the Bay Area. I've been working in machine learning since 2016 and made the switch to AI alignment work at the beginning of 2024 while participating in the MATS program.

My most recent work has focused on the adversarial robustness of multimodal LLMs. We have been studying novel attacks that exploit the stochastic nature of LLM outputs together with their sensitivity to variations in continuous input spaces, such as the audio or vision modalities.