Concepts inside an AI model are shaped like cylinders rather than straight lines, which is why nudging the model along a single direction often sends it off-track.
Most researchers assume that a concept like truthfulness exists as a single direction in the model's activation space. This paper argues that such concepts are better described as tubes or cylinders: push the model along a straight line and you quickly exit the cylinder, at which point performance collapses. This geometry explains why simple steering vectors often fail to produce stable results, and understanding it allows engineers to build much more effective tools for controlling AI behavior.
The Cylindrical Representation Hypothesis for Language Model Steering
arXiv · 2605.01844
Steering is a widely used technique for controlling large language models, yet its effects are often unstable and hard to predict. Existing theoretical accounts are largely based on the Linear Representation Hypothesis (LRH). LRH assumes that concepts can be orthogonalized for lossless control, but this idealized mapping fails in real representations and cannot account for the observed unpredictability of steering. By relaxing LRH's orthogonality assumption while preserving linear representations, this work proposes the Cylindrical Representation Hypothesis, which models concepts as cylindrical regions rather than one-dimensional directions.
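To make the geometric intuition concrete, here is a minimal NumPy sketch of cylinder-constrained steering. It is an illustration of the idea, not the paper's method: the function name `cylinder_steer`, the `radius` parameter, and the clamping rule are assumptions made for this example.

```python
import numpy as np

def cylinder_steer(h, axis, alpha, radius):
    """Steer a hidden state along a concept axis while staying inside
    an assumed cylinder of the given radius around that axis.
    (Illustrative sketch; not the paper's actual algorithm.)"""
    axis = axis / np.linalg.norm(axis)   # unit vector along the concept axis
    along = (h @ axis) * axis            # component of h parallel to the axis
    radial = h - along                   # off-axis (radial) component
    r = np.linalg.norm(radial)
    if r > radius:                       # clamp the radial part back into the tube
        radial *= radius / r
    # Move along the axis; the clamped radial part keeps us inside the cylinder.
    return along + alpha * axis + radial

# Naive steering for comparison is h + alpha * v. If the estimated direction v
# is even slightly misaligned with the true concept axis, the update adds a
# radial component that can push the activation out of the cylinder.
rng = np.random.default_rng(0)
h = rng.normal(size=768)        # a stand-in hidden state
v = rng.normal(size=768)        # a stand-in estimated concept direction
steered = cylinder_steer(h, v, alpha=4.0, radius=np.linalg.norm(h))
```

The contrast with plain vector addition is the point: a steering update that respects the cylinder bounds the off-axis drift, which is the failure mode the summary above attributes to straight-line pushes.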