Introduces an adversarial co-evolution framework where Code and Test LLMs optimize against each other to improve code generation.
March 17, 2026
Original Paper
Code-A1: Adversarial Evolving of Code LLM and Test LLM via Reinforcement Learning
arXiv · 2603.15611
The Takeaway
Moves beyond SFT on static datasets toward a self-play paradigm for code. By giving a separate Test LLM white-box access to inspect code and find bugs, the framework avoids self-collusion and produces models that match or exceed those trained on human-annotated tests.
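The adversarial setup can be sketched as a reward loop: the Code LLM is rewarded for passing tests, while the Test LLM is rewarded for proposing valid tests that the candidate fails. A minimal sketch, with hypothetical function names and a reference-solution validity check standing in for whatever guard the paper actually uses against trivial or invalid tests:

```python
# Hypothetical sketch of an adversarial code/test reward loop.
# Names (run_tests, adversarial_rewards) and the reference-based
# validity filter are illustrative assumptions, not the paper's API.

def run_tests(code_fn, tests):
    """Fraction of (input, expected) test cases the candidate passes."""
    passed = sum(1 for inp, expected in tests if code_fn(inp) == expected)
    return passed / len(tests)

def adversarial_rewards(code_fn, reference_fn, tests):
    """Opposing rewards: the Code model maximizes pass rate; the Test
    model is rewarded for tests the candidate fails but a trusted
    reference passes (valid AND discriminative)."""
    # Keep only tests the reference solution passes, blocking the
    # easy-reward route of emitting invalid or trivial tests.
    valid = [(i, o) for i, o in tests if reference_fn(i) == o]
    if not valid:
        return 0.0, 0.0
    pass_rate = run_tests(code_fn, valid)
    return pass_rate, 1.0 - pass_rate  # code reward, test reward

# Toy example: a buggy absolute-value function vs. a correct reference.
buggy_abs = lambda x: x  # wrong for negative inputs
tests = [(3, 3), (-2, 2), (0, 0)]
code_r, test_r = adversarial_rewards(buggy_abs, abs, tests)
```

As either model improves, the other's reward landscape shifts, which is what makes the curriculum adaptive rather than static.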
From the abstract
Reinforcement learning for code generation relies on verifiable rewards from unit test pass rates. Yet high-quality test suites are scarce, existing datasets offer limited coverage, and static rewards fail to adapt as models improve. Recent self-play methods unify code and test generation in a single model, but face an inherent dilemma: white-box access leads to self-collusion where the model produces trivial tests for easy rewards, yet black-box restriction yields generic tests that miss impleme