It turns out the expensive algorithms we use to pick the 'perfect' training data may be a waste: simply throwing darts at the map works about as well.
April 6, 2026
Original Paper
Random Is Hard to Beat: Active Selection in online DPO with Modern LLMs
arXiv · 2604.02766
The Takeaway
The paper challenges the assumption that 'smarter' data selection is always better, showing that current active selection methods can add cost without real benefit. This could lead to much cheaper and simpler ways to fine-tune massive models.
From the abstract
Modern LLMs inherit strong priors from web-scale pretraining, which can limit the headroom of post-training data-selection strategies. While Active Preference Learning (APL) seeks to optimize query efficiency in online Direct Preference Optimization (DPO), the inherent richness of on-policy candidate pools often renders simple Random sampling a surprisingly formidable baseline. We evaluate uncertainty-based APL against Random across harmlessness, helpfulness, and instruction-following settings, …
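To make the comparison concrete, here is a minimal Python sketch of the two selection rules being contrasted: an uncertainty-based picker that favors candidate pairs whose implicit preference probability sits near 0.5, and the uniform Random baseline. The function names, the margin-based uncertainty score, and the toy pool are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: uncertainty-based active selection vs. the Random
# baseline over an on-policy candidate pool. The "margins" stand in for
# DPO implicit reward margins r(x, y_w) - r(x, y_l); the selection rule
# and all names here are illustrative assumptions.
import math
import random

def preference_probability(margin: float) -> float:
    """Bradley-Terry probability that y_w beats y_l given a reward margin."""
    return 1.0 / (1.0 + math.exp(-margin))

def select_uncertain(margins: list[float], k: int) -> list[int]:
    """Pick the k pairs whose preference probability is closest to 0.5,
    i.e. the pairs the current policy is least certain about."""
    uncertainty = [abs(preference_probability(m) - 0.5) for m in margins]
    return sorted(range(len(margins)), key=lambda i: uncertainty[i])[:k]

def select_random(margins: list[float], k: int) -> list[int]:
    """The Random baseline: sample k pairs uniformly from the pool."""
    return random.sample(range(len(margins)), k)

# Toy pool of reward margins for candidate preference pairs.
pool = [2.3, -0.1, 0.05, 1.8, -0.4, 0.02, 3.1, -2.5]
print("uncertain:", select_uncertain(pool, 3))  # indices of near-zero margins
print("random:   ", select_random(pool, 3))     # uniform draw
```

The paper's finding, in these terms, is that the extra scoring pass behind `select_uncertain` often fails to beat the one-liner `select_random` when the on-policy pool is already rich.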