The math used to save money on data labeling is fundamentally broken for small-scale language tasks.
April 15, 2026
Original Paper
Testing the Assumptions of Active Learning for Translation Tasks with Few Samples
arXiv · 2604.08977
The Takeaway
Active learning relies on the idea that choosing "informative" samples yields better models faster. This paper shows that for translation tasks with 100-500 labeled samples, there is no correlation between sample informativeness and actual test performance: the standard informativeness and "diversity" metrics used to pick data are effectively noise at this scale. For small-scale NLP, the compute spent on active selection is therefore wasted. You might as well pick samples at random; you'll get the same performance for free.
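To make the comparison concrete, here is a minimal sketch of the two selection strategies being contrasted. It uses predictive entropy as a generic uncertainty-based informativeness score; the function names and the toy probability values are illustrative assumptions, not the paper's actual setup.

```python
import math
import random

def uncertainty_score(probs):
    """Entropy of a model's predictive distribution.
    Higher entropy = less certain = more 'informative' under AL heuristics."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_active(pool, model_probs, k):
    """Active selection: pick the k samples the model is least certain about."""
    ranked = sorted(pool, key=lambda i: uncertainty_score(model_probs[i]),
                    reverse=True)
    return ranked[:k]

def select_random(pool, k, seed=0):
    """The baseline the paper finds just as good: uniform random sampling."""
    rng = random.Random(seed)
    return rng.sample(pool, k)

# Toy unlabeled pool with hypothetical model confidences.
pool = [0, 1, 2]
model_probs = {0: [0.9, 0.1], 1: [0.5, 0.5], 2: [0.7, 0.3]}

print(select_active(pool, model_probs, k=1))   # picks the max-entropy sample: [1]
print(len(select_random(pool, k=1)))           # random baseline, same budget: 1
```

The paper's claim, at the 100-500 sample scale, is that the extra model-inference cost of `select_active` buys no measurable improvement over `select_random`.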
From the abstract
Active learning (AL) is a training paradigm for selecting unlabeled samples for annotation to improve model performance on a test set, which is useful when only a limited number of samples can be annotated. These algorithms often work by optimizing for the informativeness and diversity of the training data to be annotated. Recent work found that AL strategies fail to outperform random sampling on various language generation tasks when using 100-500 samples. […]