Introduces ReinPatch, the first framework to jointly optimize sequence tokenization and backbone models using reinforcement learning.
March 30, 2026
Original Paper
Dynamic Tokenization via Reinforcement Patching: End-to-end Training and Zero-shot Transfer
arXiv · 2603.26097
The Takeaway
Traditional patching (tokenization) for long-horizon data is usually fixed-size or heuristic; this method lets models learn data-adaptive, variable-sized representations end-to-end. This is particularly impactful for time-series and long-sequence modeling, where the choice of data aggregation is critical for both performance and compute efficiency.
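To make the contrast concrete, here is a minimal sketch of the fixed-size patching that the takeaway refers to, in the style common to time-series Transformers. The function name and parameters are illustrative, not from the paper:

```python
import numpy as np

def fixed_patch(series: np.ndarray, patch_len: int, stride: int) -> np.ndarray:
    """Split a 1-D series into fixed-size patches (the heuristic baseline)."""
    n = (len(series) - patch_len) // stride + 1
    return np.stack([series[i * stride : i * stride + patch_len] for i in range(n)])

x = np.arange(16, dtype=float)
patches = fixed_patch(x, patch_len=4, stride=4)
print(patches.shape)  # (4, 4): every patch has the same length regardless of content
```

Every patch here has identical length no matter what the data looks like, which is exactly the rigidity that a learned, data-adaptive tokenizer aims to remove.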
From the abstract
Efficiently aggregating spatial or temporal horizons to acquire compact representations has become a unifying principle in modern deep learning models, yet learning data-adaptive representations for long-horizon sequence data, especially continuous sequences like time series, remains an open challenge. While fixed-size patching has improved scalability and performance, discovering variable-sized, data-driven patches end-to-end often forces models to rely on soft discretization, specific backbone …
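The variable-sized patching the abstract describes can be pictured as cutting the sequence wherever a learned policy emits a boundary decision. The sketch below is a hedged illustration of that idea only: in ReinPatch the boundaries would come from a policy trained with reinforcement learning, whereas here they are hard-coded, and the function name and interface are assumptions, not the paper's API:

```python
import numpy as np

def variable_patch(series, boundaries):
    """Cut `series` after every position t where boundaries[t] == 1.

    `boundaries` stands in for the per-step actions a learned
    policy would produce; patches therefore vary in length.
    """
    cuts = [i + 1 for i, b in enumerate(boundaries) if b]
    segments = np.split(np.asarray(series, dtype=float), cuts)
    return [s for s in segments if len(s)]

x = list(range(10))
# Suppose the policy decides to cut after positions 2 and 6:
patches = variable_patch(x, boundaries=[0, 0, 1, 0, 0, 0, 1, 0, 0, 0])
print([len(p) for p in patches])  # [3, 4, 3] -- patch sizes adapt to the decisions
```

The point of the end-to-end formulation is that these cut decisions are discrete, which is why naive differentiable training falls back on soft discretization and why the paper instead turns to reinforcement learning.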