Only 44% of the code written by AI agents in real-world settings actually makes it into final software commits.
April 23, 2026
Original Paper
SWE-chat: Coding Agent Interactions From Real Users in the Wild
arXiv · 2604.20779
The Takeaway
Real-world data from coding agents reveals a large performance gap relative to sanitized benchmarks. In practice, these agents frequently introduce more security vulnerabilities than human developers while struggling with routine project maintenance. The industry has assumed that AI is ready to automate the majority of software engineering tasks; this study suggests that generating code is fundamentally different from building useful, secure systems. Companies may find that AI assistants increase technical debt more than they increase speed.
From the abstract
AI coding agents are being adopted at scale, yet we lack empirical evidence on how people actually use them and how much of their output is useful in practice. We present SWE-chat, the first large-scale dataset of real coding agent sessions collected from open-source developers in the wild. The dataset currently contains 6,000 sessions, comprising more than 63,000 user prompts and 355,000 agent tool calls. SWE-chat is a living dataset; our collection pipeline automatically and continually discov