You can train two AIs using completely opposite methods and end up with equally capable models, yet the "brains" they build inside turn out to be surprisingly different.
April 3, 2026
Original Paper
Matching Accuracy, Different Geometry: Evolution Strategies vs GRPO in LLM Post-Training
arXiv · 2604.01499
The Takeaway
This finding reshapes our picture of the 'landscape' of AI learning: very different training methods can reach the same level of capability while arriving at genuinely different solutions in parameter space. The landscape of good solutions appears far richer, with more distinct routes to success, than we previously realized.
From the abstract
Evolution Strategies (ES) have emerged as a scalable gradient-free alternative to reinforcement-learning-based LLM fine-tuning, but it remains unclear whether comparable task performance implies comparable solutions in parameter space. We compare ES and Group Relative Policy Optimization (GRPO) across four tasks in both single-task and sequential continual-learning settings. ES matches or exceeds GRPO in single-task accuracy and remains competitive sequentially when its iteration budget is controlled.
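If you are unsure what "gradient-free" means in practice, here is a minimal sketch of a vanilla Evolution Strategies update on a toy problem. It is not the paper's recipe or code; the function names, hyperparameters (sigma, lr, population size), and the toy reward are illustrative assumptions. The idea: sample random perturbations of the parameters, score each perturbed copy with the task reward, and nudge the parameters toward the better-scoring directions, with no backpropagation at all.

```python
import numpy as np

def es_step(theta, reward_fn, sigma=0.02, lr=0.01, population=8, rng=None):
    """One antithetic ES update: perturb parameters with Gaussian noise,
    score each perturbation with the reward, and move theta toward the
    better-scoring directions -- no gradients of the model are computed."""
    rng = rng or np.random.default_rng(0)
    noise = rng.standard_normal((population, theta.size))
    # Evaluate mirrored perturbations theta + sigma*eps and theta - sigma*eps
    rewards_pos = np.array([reward_fn(theta + sigma * eps) for eps in noise])
    rewards_neg = np.array([reward_fn(theta - sigma * eps) for eps in noise])
    # Reward-weighted average of the noise directions estimates the gradient
    grad_est = ((rewards_pos - rewards_neg)[:, None] * noise).mean(axis=0) / (2 * sigma)
    return theta + lr * grad_est

# Toy usage: maximize a simple reward over a 3-parameter "policy"
theta = np.zeros(3)
reward = lambda t: -np.sum((t - 1.0) ** 2)   # peak at all-ones
for _ in range(200):
    theta = es_step(theta, reward)
print(theta)  # drifts toward [1, 1, 1] using only reward evaluations
```

GRPO, by contrast, is a gradient-based reinforcement-learning method: it backpropagates through the model, using reward advantages computed relative to a group of sampled responses. The paper's question is whether these two very different update rules, when they reach similar accuracy, also land in similar regions of parameter space.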