Enables VideoLLMs to perform complex logical reasoning concurrently with video playback, without incurring the latency of standard test-time scaling.
March 13, 2026
Original Paper
Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously
arXiv · 2603.12262
The Takeaway
Existing VideoLLMs face a trade-off between reasoning depth and real-time responsiveness. This paper introduces a "thinking while watching" mechanism that amortizes reasoning latency across the video stream, enabling sophisticated, multi-turn interaction in real-time online settings.
From the abstract
Online Video Large Language Models (VideoLLMs) play a critical role in supporting responsive, real-time interaction. Existing methods focus on streaming perception, lacking a synchronized logical reasoning stream. However, directly applying test-time scaling methods incurs unacceptable response latency. To address this trade-off, we propose Video Streaming Thinking (VST), a novel paradigm for streaming video understanding. It supports a thinking while watching mechanism, which activates reasoning …
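The "thinking while watching" idea can be illustrated as interleaving small reasoning steps with incoming frames, rather than running one long reasoning pass at query time. The sketch below is purely illustrative and not the paper's implementation: `frame_stream`, `perceive`, and `think` are hypothetical stand-ins for a video feed, the model's streaming perception step, and an incremental chain-of-thought update.

```python
import collections

def frame_stream(n_frames):
    """Hypothetical stand-in for an online video feed."""
    for t in range(n_frames):
        yield f"frame_{t}"

def perceive(frame):
    """Hypothetical stand-in for the VideoLLM's streaming perception step."""
    return f"evidence({frame})"

def think(evidence_buffer, steps_per_frame=1):
    """Amortized reasoning: consume a few buffered evidence items per frame,
    instead of deferring all reasoning to query time."""
    thoughts = []
    for _ in range(min(steps_per_frame, len(evidence_buffer))):
        thoughts.append(f"thought<{evidence_buffer.popleft()}>")
    return thoughts

def watch_and_think(n_frames):
    """Interleave perception and reasoning over the stream, so the
    chain of thought is (mostly) complete when the video ends."""
    buffer = collections.deque()
    chain_of_thought = []
    for frame in frame_stream(n_frames):
        buffer.append(perceive(frame))     # streaming perception
        chain_of_thought += think(buffer)  # reasoning amortized per frame
    # Drain any backlog once the stream ends.
    chain_of_thought += think(buffer, steps_per_frame=len(buffer))
    return chain_of_thought
```

Because each frame's arrival pays for a slice of the reasoning, the per-query latency at the end is only the cost of draining a small backlog, not of reasoning over the entire video.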