You can train two AIs using completely opposite methods and end up with equally capable models, yet the "brains" they build inside turn out to be surprisingly different.
April 3, 2026
Original Paper
Matching Accuracy, Different Geometry: Evolution Strategies vs GRPO in LLM Post-Training
arXiv · 2604.01499
The Takeaway
This finding reshapes our picture of the 'landscape' of AI learning: very different training methods can reach the same level of capability while arriving at genuinely different solutions in parameter space. The landscape of good solutions appears far richer, with more distinct routes to success, than we previously realized.
From the abstract
Evolution Strategies (ES) have emerged as a scalable gradient-free alternative to reinforcement-learning-based LLM fine-tuning, but it remains unclear whether comparable task performance implies comparable solutions in parameter space. We compare ES and Group Relative Policy Optimization (GRPO) across four tasks in both single-task and sequential continual-learning settings. ES matches or exceeds GRPO in single-task accuracy and remains competitive sequentially when its iteration budget is controlled.
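If you are unsure what "gradient-free" means in practice, here is a minimal sketch of a vanilla Evolution Strategies update on a toy problem. It is not the paper's recipe or code; the function names, hyperparameters (sigma, lr, population size), and the toy reward are illustrative assumptions. The idea: sample random perturbations of the parameters, score each perturbed copy with the task reward, and nudge the parameters toward the better-scoring directions, with no backpropagation at all.

```python
import numpy as np

def es_step(theta, reward_fn, sigma=0.02, lr=0.01, population=8, rng=None):
    """One antithetic ES update: perturb parameters with Gaussian noise,
    score each perturbation with the reward, and move theta toward the
    better-scoring directions -- no gradients of the model are computed."""
    rng = rng or np.random.default_rng(0)
    noise = rng.standard_normal((population, theta.size))
    # Evaluate mirrored perturbations theta + sigma*eps and theta - sigma*eps
    rewards_pos = np.array([reward_fn(theta + sigma * eps) for eps in noise])
    rewards_neg = np.array([reward_fn(theta - sigma * eps) for eps in noise])
    # Reward-weighted average of the noise directions estimates the gradient
    grad_est = ((rewards_pos - rewards_neg)[:, None] * noise).mean(axis=0) / (2 * sigma)
    return theta + lr * grad_est

# Toy usage: maximize a simple reward over a 3-parameter "policy"
theta = np.zeros(3)
reward = lambda t: -np.sum((t - 1.0) ** 2)   # peak at all-ones
for _ in range(200):
    theta = es_step(theta, reward)
print(theta)  # drifts toward [1, 1, 1] using only reward evaluations
```

GRPO, by contrast, is a gradient-based reinforcement-learning method: it backpropagates through the model, using reward advantages computed relative to a group of sampled responses. The paper's question is whether these two very different update rules, when they reach similar accuracy, also land in similar regions of parameter space.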