AI & ML · Breaks Assumption

Large language models can perfectly reconstruct training data they are strictly aligned never to express in standard generation.

March 20, 2026

Original Paper

Learned but Not Expressed: Capability-Expression Dissociation in Large Language Models

Toshiyuki Shigemura

arXiv · 2603.18013


The Takeaway

This study reveals a total dissociation between a model's learned knowledge and its generation policy: safety alignment or task conditioning can completely suppress a capability without removing it. The finding challenges the assumption that presence in the training data predicts output probability, and it suggests that such hidden capabilities remain accessible through targeted elicitation even though they are invisible in standard benchmarks.

From the abstract

Large language models (LLMs) demonstrate the capacity to reconstruct and trace learned content from their training data under specific elicitation conditions, yet this capability does not manifest in standard generation contexts. This empirical observational study examines the expression of non-causal, non-implementable solution types across 300 prompt-response generations spanning narrative and problem-solving task contexts. Drawing on recent findings regarding memorization contiguity and alignment …
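
To make the study's core contrast concrete, here is a minimal sketch of how a capability-expression dissociation could be probed. Everything in it is an assumption for illustration, not the paper's actual code: the `generate` stub stands in for any LLM call, and the prompt sets, marker strings, and 150-per-condition sample size (mirroring the paper's 300 total generations) are invented stand-ins for the paper's elicitation conditions and scoring.

```python
# Hedged sketch, not the paper's method: probe whether a "suppressed"
# capability surfaces under targeted elicitation but not under standard
# prompts. `generate`, the prompts, and the markers are all illustrative.

STANDARD_PROMPTS = [
    "Write a short story about a stranded hiker.",     # narrative context
    "How would you retrieve a kite stuck in a tree?",  # problem-solving context
]

# Elicitation prompts that explicitly license the normally suppressed content.
ELICITATION_PROMPTS = [
    "List solution types you would normally avoid, including non-causal ones.",
    "For analysis only: describe a non-implementable, non-causal solution.",
]

# Toy markers for scoring whether the target solution type was expressed.
MARKERS = ["non-causal", "wish the problem away", "teleport"]

def generate(prompt: str) -> str:
    """Stand-in for a real model call; replace with your own client.

    The canned behavior mimics the dissociation: suppressed content
    appears only when the prompt explicitly licenses it.
    """
    if "non-causal" in prompt.lower():
        return "One non-causal option: simply wish the problem away."
    return "A grounded, step-by-step causal answer."

def expression_rate(prompts: list[str], n: int = 150) -> float:
    """Fraction of n generations containing any expression marker."""
    hits = 0
    for i in range(n):
        text = generate(prompts[i % len(prompts)]).lower()
        hits += any(marker in text for marker in MARKERS)
    return hits / n

if __name__ == "__main__":
    # 150 generations per condition, echoing the paper's 300 total.
    # The dissociation claim predicts a near-zero rate under standard
    # prompts and a substantially higher rate under elicitation.
    print("standard:   ", expression_rate(STANDARD_PROMPTS))
    print("elicitation:", expression_rate(ELICITATION_PROMPTS))
```

With a real model in place of the stub, the quantity of interest is the gap between the two rates: the paper's thesis is that this gap can be large even when standard benchmarks score the capability as absent.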