AI & ML Paradigm Shift

A simple perturbation method reveals that representations are not just activation patterns, but conduits that determine how learning 'infects' similar examples.

March 26, 2026

Original Paper

Perturbation: A simple and efficient adversarial tracer for representation learning in language models

Joshua Rozner, Cory Shain

arXiv · 2603.23821

The Takeaway

By fine-tuning on a single adversarial example and measuring how the resulting change transfers to other examples, this method sidesteps the implausible assumption that representations must be linear. It offers a more robust way to trace how LMs acquire and generalize linguistic abstractions.
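A minimal sketch of what such a probe could look like, assuming a Hugging Face causal LM. The model name, the adversarial string, the probe sentences, and the optimizer settings are all hypothetical placeholders for illustration, not the authors' materials or exact procedure.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def sequence_loss(model, tokenizer, text):
    # Mean next-token cross-entropy for a single string.
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return out.loss.item()

model_name = "gpt2"  # hypothetical stand-in for the LM under study
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# One adversarial (perturbed) training example, plus probes that share an
# abstraction with it (same construction) and an unrelated control.
adversarial = "The keys to the cabinet is on the table."  # hypothetical
related = ["The books on the shelf is dusty."]            # same construction
control = ["She walked quickly to the station."]          # unrelated

# Record probe losses before the single-example update.
before = {s: sequence_loss(model, tokenizer, s) for s in related + control}

# Fine-tune on the one adversarial example for a few steps.
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
enc = tokenizer(adversarial, return_tensors="pt")
for _ in range(5):
    out = model(**enc, labels=enc["input_ids"])
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
model.eval()

# Transfer: if the adversarial example and a probe share a representation,
# the probe's loss should shift more than the control's.
for s in related + control:
    print(f"{sequence_loss(model, tokenizer, s) - before[s]:+.4f}  {s}")
```

The asymmetry in those loss shifts, related probes moving more than controls, is the kind of transfer signal the paper treats as evidence of a shared representation.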

From the abstract

Linguistic representation learning in deep neural language models (LMs) has been studied for decades, for both practical and theoretical reasons. However, finding representations in LMs remains an unsolved problem, in part due to a dilemma between enforcing implausible constraints on representations (e.g., linearity; Arora et al. 2024) and trivializing the notion of representation altogether (Sutter et al., 2025). Here we escape this dilemma by reconceptualizing representations not as patterns o