SeriesFusion
Science, curated & edited by AI
Paradigm Challenge  /  AI

Forcing an elite AI to show its work actually makes its final answer worse.

High-performing models see their accuracy drop when they are forced to write out their reasoning in the conversation history. While Chain-of-Thought prompting helps weaker models catch up, the most capable systems find their own articulated thoughts to be a distraction: the extra text acts as context noise that hinders the model's ability to reach the best conclusion. This finding challenges the common belief that structured reasoning is always a net positive for AI. Developers building complex, multi-turn workflows should stop forcing top-tier models to think out loud if they want the best results.
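One practical way to act on this advice is to prune persisted reasoning artifacts from the conversation history before each new iteration, so only final answers carry forward. The sketch below is a minimal illustration, not the paper's method: it assumes a hypothetical message format where each entry carries a `kind` field (`"reasoning"`, `"answer"`, `"user"`) that lets the workflow distinguish articulated reasoning from final conclusions.

```python
def strip_reasoning(history):
    """Drop intermediate reasoning artifacts from a chat history.

    `history` is a list of message dicts; the `kind` field is a
    hypothetical convention for tagging persisted reasoning
    (hypothesis summaries, prediction logs, etc.). Messages without
    a `kind` field are kept unchanged.
    """
    return [m for m in history if m.get("kind") != "reasoning"]


# Example: a history where a reasoning artifact persisted
# across iterations alongside the final answer.
history = [
    {"kind": "user", "content": "Which config maximizes throughput?"},
    {"kind": "reasoning", "content": "Hypothesis: batch size dominates..."},
    {"kind": "answer", "content": "Use batch size 64 with 4 workers."},
]

pruned = strip_reasoning(history)
# Only the user turn and the final answer remain.
```

Whether pruning helps will depend on the model tier, per the summary above: for weaker models, keeping the reasoning in context may still be the better trade-off.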

Original Paper

Structured Reasoning in LLM Optimization Agents: Scaffolding, Not Regularization

Kartik Ganapati Bhat

SSRN  ·  6655539

LLM-based optimization agents increasingly produce structured reasoning artifacts (hypothesis summaries, causal models, prediction logs) that persist across iterations. The assumption is that forcing articulation regularizes reasoning, as the self-explanation effect suggests it does for human learners. We test this assumption using SynthOracle, a family of synthetic multi-objective optimization oracles with known causal structure that enables separate measurement of optimization quality and reason