Resolves a long-standing open problem in bandit theory by achieving optimal dynamic regret without knowing the number of environment switches.
March 30, 2026
Original Paper
Parameter-Free Dynamic Regret for Unconstrained Linear Bandits
arXiv · 2603.25916
The Takeaway
Achieving optimal dynamic regret in unconstrained adversarial settings without prior knowledge of the environment's volatility ($S_T$, the number of comparator switches) was considered a major theoretical bottleneck. Resolving it enables more robust, self-tuning online learning algorithms.
From the abstract
We study dynamic regret minimization in unconstrained adversarial linear bandit problems. In this setting, a learner must minimize the cumulative loss relative to an arbitrary sequence of comparators $\boldsymbol{u}_1,\ldots,\boldsymbol{u}_T$ in $\mathbb{R}^d$, but receives only point-evaluation feedback on each round. We provide a simple approach to combining the guarantees of several bandit algorithms, allowing us to optimally adapt to the number of switches $S_T = \sum_t\mathbb{I}\{\boldsymbol{u}_t \neq \boldsymbol{u}_{t+1}\}$.
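The combining idea can be illustrated with a toy sketch: an exponential-weights meta-learner aggregates several base learners, playing their weighted-average prediction and reweighting each by its own incurred loss, so the combiner tracks whichever base learner suits the current (piecewise-stationary) environment. Everything below — the squared-loss setup, the gradient-descent base learners, and all names — is an illustrative assumption, not the paper's actual algorithm or guarantees.

```python
import math

class GradLearner:
    """Toy base learner: online gradient descent with a fixed step size.
    A hypothetical stand-in for the paper's base bandit algorithms."""
    def __init__(self, lr):
        self.lr = lr
        self.x = 0.0  # unconstrained iterate (here in R^1)

    def predict(self):
        return self.x

    def update(self, grad):
        self.x -= self.lr * grad  # unconstrained: no projection step

def run_combined(targets, lrs=(0.01, 0.5), eta=0.1):
    """Exponential-weights combiner: each round, play the weighted average of
    the base predictions, then reweight every learner by its own squared loss
    against the (switching) target. Returns (cumulative loss, final weights)."""
    learners = [GradLearner(lr) for lr in lrs]
    logw = [0.0] * len(learners)  # log-weights, for numerical stability
    total = 0.0
    for tgt in targets:
        m = max(logw)
        w = [math.exp(l - m) for l in logw]
        z = sum(w)
        w = [wi / z for wi in w]
        play = sum(wi * ln.predict() for wi, ln in zip(w, learners))
        total += (play - tgt) ** 2
        for i, ln in enumerate(learners):
            err = ln.predict() - tgt
            logw[i] -= eta * err ** 2  # penalize by the learner's own loss
            ln.update(2.0 * err)       # gradient of (x - tgt)^2
    m = max(logw)
    w = [math.exp(l - m) for l in logw]
    z = sum(w)
    return total, [wi / z for wi in w]

# Piecewise-stationary comparator sequence with one switch (S_T = 1).
targets = [1.0] * 200 + [-1.0] * 200
loss, weights = run_combined(targets)
```

After the switch, the slow learner keeps paying large losses while the fast one re-converges immediately, so the combiner's weight shifts almost entirely onto the fast learner and the aggregate loss stays close to the better learner's, without the combiner being told when (or how often) the target switches.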