TTA-Vid enables video reasoning models to adapt to new domains at test-time using label-free reinforcement learning on a single sample.
April 2, 2026
Original Paper
TTA-Vid: Generalized Test-Time Adaptation for Video Reasoning
arXiv · 2604.00696
The Takeaway
TTA-Vid eliminates the need for large-scale supervised fine-tuning when deploying video reasoning models to new video distributions. By using batch-aware frequency rewards as pseudo-ground truth, a pretrained model can adapt to new datasets entirely during inference, without any labels.
From the abstract
Recent video reasoning models have shown strong results on temporal and multimodal understanding, yet they depend on large-scale supervised data and multi-stage training pipelines, making them costly to train and difficult to adapt to new domains. In this work, we leverage the paradigm of Test-Time Reinforcement Learning on video-language data to adapt a pretrained model to incoming video samples at test-time without explicit labels. The proposed test-time adaptation for video approach […]
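The core idea behind label-free frequency rewards can be sketched in a few lines. The snippet below is a minimal, hypothetical illustration (not the paper's implementation): for one test sample, the model's answer that appears most often across a batch of sampled rollouts is treated as the pseudo-ground truth, and each rollout is rewarded for agreeing with it.

```python
from collections import Counter


def frequency_rewards(rollout_answers):
    """Assign reward 1.0 to rollouts matching the batch-majority answer, else 0.0.

    The majority answer across sampled rollouts acts as a pseudo-ground-truth
    label, so the reinforcement signal needs no human annotation.
    """
    majority, _ = Counter(rollout_answers).most_common(1)[0]
    return [1.0 if ans == majority else 0.0 for ans in rollout_answers]


# Example: eight sampled rollouts for a single video question.
rollouts = ["B", "B", "A", "B", "C", "B", "A", "B"]
print(frequency_rewards(rollouts))  # rollouts answering "B" get reward 1.0
```

In a full test-time RL loop, these per-rollout rewards would drive a policy-gradient update on the single incoming sample; the function names and the binary reward scheme here are illustrative assumptions.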