TTA-Vid enables video reasoning models to adapt to new domains at test-time using label-free reinforcement learning on a single sample.
April 2, 2026
Original Paper
TTA-Vid: Generalized Test-Time Adaptation for Video Reasoning
arXiv · 2604.00696
The Takeaway
TTA-Vid eliminates the need for large-scale supervised fine-tuning when deploying video reasoning models to new video distributions. By using batch-aware frequency rewards as pseudo-ground truth, a pretrained model can adapt to new datasets entirely during inference, without any labels.
From the abstract
Recent video reasoning models have shown strong results on temporal and multimodal understanding, yet they depend on large-scale supervised data and multi-stage training pipelines, making them costly to train and difficult to adapt to new domains. In this work, we leverage the paradigm of Test-Time Reinforcement Learning on video-language data to adapt a pretrained model to incoming video samples at test-time without explicit labels. The proposed test-time adaptation for video approach […]
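The core idea behind label-free frequency rewards can be sketched in a few lines. The snippet below is a minimal, hypothetical illustration (not the paper's implementation): for one test sample, the model's answer that appears most often across a batch of sampled rollouts is treated as the pseudo-ground truth, and each rollout is rewarded for agreeing with it.

```python
from collections import Counter


def frequency_rewards(rollout_answers):
    """Assign reward 1.0 to rollouts matching the batch-majority answer, else 0.0.

    The majority answer across sampled rollouts acts as a pseudo-ground-truth
    label, so the reinforcement signal needs no human annotation.
    """
    majority, _ = Counter(rollout_answers).most_common(1)[0]
    return [1.0 if ans == majority else 0.0 for ans in rollout_answers]


# Example: eight sampled rollouts for a single video question.
rollouts = ["B", "B", "A", "B", "C", "B", "A", "B"]
print(frequency_rewards(rollouts))  # rollouts answering "B" get reward 1.0
```

In a full test-time RL loop, these per-rollout rewards would drive a policy-gradient update on the single incoming sample; the function names and the binary reward scheme here are illustrative assumptions.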