Only 44% of the code written by AI agents in real-world settings actually makes it into final software commits.
April 23, 2026
Original Paper
SWE-chat: Coding Agent Interactions From Real Users in the Wild
arXiv · 2604.20779
The Takeaway
Real-world data from coding agents reveals a large performance gap relative to sanitized benchmarks. In practice, these agents frequently introduce more security vulnerabilities than human developers while struggling with routine project maintenance. The industry has assumed that AI is ready to automate the majority of software engineering tasks; this study suggests that generating code is fundamentally different from building useful, secure systems. Companies may find that AI assistants increase technical debt more than they increase speed.
From the abstract
AI coding agents are being adopted at scale, yet we lack empirical evidence on how people actually use them and how much of their output is useful in practice. We present SWE-chat, the first large-scale dataset of real coding agent sessions collected from open-source developers in the wild. The dataset currently contains 6,000 sessions, comprising more than 63,000 user prompts and 355,000 agent tool calls. SWE-chat is a living dataset; our collection pipeline automatically and continually discov