
Anthropic, Member of Technical Staff
Focus
Control, Model Organisms, Red-Teaming, Scheming and Deception
Stream
Anthropic
Hi! I'm an independent AI alignment and safety researcher based in the Bay Area. I've worked in machine learning since 2016 and switched to AI alignment work at the beginning of 2024 while participating in the MATS program.
My most recent work has focused on the adversarial robustness of multimodal LLMs. We have been studying novel attacks that exploit the stochastic nature of LLM outputs together with their sensitivity to variations in continuous input spaces (such as the audio and vision modalities).