Formalizes the 'Observability Gap' to explain why coding agents plateau: humans can only provide feedback on visible outputs, while bugs reside in invisible execution states.
March 31, 2026
Original Paper
The Observability Gap: Why Output-Level Human Feedback Fails for LLM Coding Agents
arXiv · 2603.26942
The Takeaway
This finding explains why standard RLHF/feedback loops fail for complex autonomous agents. It suggests that unless agents expose internal execution state to the evaluator, feedback will cause failure-mode oscillation rather than convergence, fundamentally limiting the reliability of 'black-box' agentic workflows.
From the abstract
Large language model (LLM) multi-agent coding systems typically fix agent capabilities at design time. We study an alternative setting, earned autonomy, in which a coding agent starts with zero pre-defined functions and incrementally builds a reusable function library through lightweight human feedback on visual output alone. We evaluate this setup in a Blender-based 3D scene generation task requiring both spatial reasoning and programmatic geometric control. Although the agent rediscovered core …