MATS Fellow:
Patrick Leask
Authors:
Patrick Leask, Noura Al Moubayed
Citations
Abstract:
Linear probes have been used to demonstrate that LLM activations linearly encode high-level properties of the input, such as truthfulness, and that these directions can evolve significantly during fine-tuning and training. However, despite their seeming simplicity, linear probes can have complex geometric interpretations, leverage spurious correlations, and lack selectivity. We present a method for decomposing linear probe directions into weighted sums of as few as 10 model activations, whilst maintaining task performance. These probes are also invariant to affine transformations of the representation space, and we demonstrate that, in some cases, poor base to fine-tune probe generalization performance is partially due to simple transformations of representation subspaces, and the structure of the representation space changes less than indicated by other methods.
Haiku to Opus in Just 10 bits: LLMs Unlock Massive Compression Gains
Authors:
Roy Rinberg
Date:
March 5, 2026
Citations:
0
The MATS Program is an independent research and educational initiative connecting emerging researchers with mentors in AI alignment, governance, and security.
Each MATS cohort runs for 12 weeks in Berkeley, California, followed by an optional 6–12 month extension in London for selected scholars.