AI & ML Paradigm Shift

LLM-guided program evolution has discovered new data-shuffling rules for SGD that provably and empirically outperform standard random reshuffling.

April 2, 2026

Original Paper

Learning to Shuffle: Block Reshuffling and Reversal Schemes for Stochastic Optimization

Lam M. Nguyen, Dzung T. Phan, Jayant Kalagnanam

arXiv · 2604.00260

The Takeaway

The paper breaks the long-standing reliance on human-derived heuristics for stochastic optimization. By automating the discovery of 'block reshuffling' and 'paired reversal' schemes, it shows that even the most fundamental components of the training pipeline can still be optimized for better convergence and stability.

From the abstract

Shuffling strategies for stochastic gradient descent (SGD), including incremental gradient, shuffle-once, and random reshuffling, are supported by rigorous convergence analyses for arbitrary within-epoch permutations. In particular, random reshuffling is known to improve optimization constants relative to cyclic and shuffle-once schemes. However, existing theory offers limited guidance on how to design new data-ordering schemes that further improve optimization constants or stability beyond random reshuffling.
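To make the three baseline schemes named in the abstract concrete, here is a minimal Python sketch of each, plus a hypothetical illustration of what a block-based scheme might look like. Note that the `block_reshuffle` function is only one plausible reading of the name 'block reshuffling'; the paper's actual construction is not given in this excerpt, and the function names here are our own.

```python
import random

def incremental(n, epochs):
    # Incremental gradient (cyclic): the same fixed order every epoch.
    return [list(range(n)) for _ in range(epochs)]

def shuffle_once(n, epochs, seed=0):
    # Shuffle-once: draw one random permutation, reuse it for all epochs.
    rng = random.Random(seed)
    perm = rng.sample(range(n), n)
    return [list(perm) for _ in range(epochs)]

def random_reshuffling(n, epochs, seed=0):
    # Random reshuffling: a fresh uniform permutation at every epoch.
    rng = random.Random(seed)
    return [rng.sample(range(n), n) for _ in range(epochs)]

def block_reshuffle(n, block_size, epochs, seed=0):
    # HYPOTHETICAL sketch of a 'block reshuffling' idea: keep contiguous
    # blocks of indices intact and randomly permute the blocks each epoch.
    # The paper's exact scheme is not specified in this excerpt.
    rng = random.Random(seed)
    blocks = [list(range(i, min(i + block_size, n)))
              for i in range(0, n, block_size)]
    orders = []
    for _ in range(epochs):
        order = list(blocks)
        rng.shuffle(order)
        orders.append([i for block in order for i in block])
    return orders
```

Each function returns one index permutation per epoch; the within-epoch order is what these schemes differ on, and it is exactly the design space the paper's automated search explores.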