AI & ML · Breaks Assumption

Shows that LLM self-correction fails primarily because of contamination from the session context, and can be significantly improved by moving the review to a fresh, independent session.

March 13, 2026

Original Paper

Cross-Context Review: Improving LLM Output Quality by Separating Production and Review Sessions

Tae-Eun Song

arXiv · 2603.12123

The Takeaway

A zero-cost, infrastructure-free way to improve LLM reliability. It shows that the bottleneck in self-correction isn't model capability but context contamination, changing how developers should design 'human-in-the-loop' and 'agent-review' pipelines.

From the abstract

Large language models struggle to catch errors in their own outputs when the review happens in the same session that produced them. This paper introduces Cross-Context Review (CCR), a straightforward method where the review is conducted in a fresh session with no access to the production conversation history. We ran a controlled experiment: 30 artifacts (code, technical documents, presentation scripts) with 150 injected errors, tested under four review conditions -- same-session Self-Review (SR)
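The setup from the abstract can be sketched for a chat-style API. This is a minimal illustration, not the paper's implementation: the function names, message format, and the placeholder `call_llm` are all assumptions; the point is only what context each review condition sees.

```python
# Sketch of Cross-Context Review (CCR) vs. same-session Self-Review (SR).
# `call_llm` is a stub standing in for any chat-completion API call; here it
# just reports how many messages it received, so the example is runnable.

def call_llm(messages):
    # Placeholder for a real model call (hypothetical, not from the paper).
    return f"[model response to {len(messages)} message(s)]"

def produce(task, history):
    # Production session: the draft is generated in an ongoing conversation.
    history.append({"role": "user", "content": task})
    draft = call_llm(history)
    history.append({"role": "assistant", "content": draft})
    return draft

def same_session_review(draft, history):
    # SR: the reviewer sees the full production conversation history.
    history.append({"role": "user", "content": f"Review this output:\n{draft}"})
    return call_llm(history)

def cross_context_review(draft):
    # CCR: a fresh session whose only context is the artifact itself.
    fresh = [{"role": "user", "content": f"Review this output:\n{draft}"}]
    return call_llm(fresh)

history = []
draft = produce("Write a sorting function.", history)
sr = same_session_review(draft, list(history))   # reviewer gets 3 messages
ccr = cross_context_review(draft)                # reviewer gets 1 message
```

The difference is entirely in the `messages` payload: SR carries the production history into the review call, while CCR hands the reviewer only the artifact, which is what the paper identifies as removing the contamination.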