A billion-scale time-series benchmark that identifies a 'context-length crossover' beyond which foundation models begin to outperform deep learning baselines.
March 30, 2026
Original Paper
QuitoBench: A High-Quality Open Time Series Forecasting Benchmark
arXiv · 2603.26017
The Takeaway
The benchmark reveals that for short-horizon forecasting (context length L=96), small deep learning models are more compute-efficient, while foundation models improve faster as context length grows and overtake them beyond L>576. It gives practitioners a regime-aware map for choosing between architecture families based on data characteristics.
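The regime-aware decision rule described above can be sketched as a simple selector. The function name and the coarse three-way split are illustrative assumptions; only the thresholds (L=96 and L>576) come from the summary.

```python
# Hypothetical sketch of the "regime-aware map": pick an architecture
# family from the context length alone. Thresholds are from the summary
# above; the function and labels are illustrative, not the paper's API.
def choose_model_family(context_length: int) -> str:
    """Return the architecture family favored at a given context length."""
    if context_length <= 96:
        # Short horizons: small deep learning models are more efficient.
        return "small-deep-learning"
    if context_length > 576:
        # Past the crossover: foundation models scale better with context.
        return "foundation-model"
    # Between the two thresholds the summary names no clear winner.
    return "regime-dependent"

print(choose_model_family(96))    # small-deep-learning
print(choose_model_family(1024))  # foundation-model
```

In practice the paper's map also conditions on data characteristics beyond context length; this sketch captures only the crossover dimension.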
From the abstract
Time series forecasting is critical across finance, healthcare, and cloud computing, yet progress is constrained by a fundamental bottleneck: the scarcity of large-scale, high-quality benchmarks. To address this gap, we introduce QuitoBench, a regime-balanced benchmark for time series forecasting with coverage across eight trend × seasonality × forecastability (TSF) regimes, designed to capture forecasting-relevant properties rather than application-defined domain labels.
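The eight TSF regimes arise from crossing three binary axes. A minimal sketch of that enumeration, assuming a simple low/high discretization of each axis (the paper's actual cutoffs may differ):

```python
from itertools import product

# Three forecasting-relevant axes from the abstract; the binary
# low/high labels are an assumed discretization for illustration.
axes = {
    "trend": ("low", "high"),
    "seasonality": ("low", "high"),
    "forecastability": ("low", "high"),
}

# Cartesian product of the three binary axes yields 2^3 = 8 regimes.
regimes = [dict(zip(axes, combo)) for combo in product(*axes.values())]
print(len(regimes))  # 8
```

A regime-balanced benchmark then holds roughly equal amounts of series in each of these eight cells, rather than balancing by application domain.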