AI & ML Efficiency Breakthrough

VideoAtlas enables navigation and reasoning over long-form video using compute that scales only logarithmically with video length.

March 19, 2026

Original Paper

VideoAtlas: Navigating Long-Form Video in Logarithmic Compute

Mohamed Eltahir, Ali Habibullah, Yazan Alshoibi, Lama Ayash, Tanveer Hussain, Naeemullah Khan

arXiv · 2603.17948

The Takeaway

VideoAtlas replaces lossy text-based video summarization with a lossless hierarchical visual grid that lets agents "zoom in" on evidence. This structure avoids the quadratic context-window cost of standard video models, so compute scales logarithmically with video length.
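To make the logarithmic claim concrete, here is a minimal sketch of zooming through a hierarchical grid. It is not the paper's implementation: the function names (`grid_cells`, `zoom_to_frame`, `make_oracle`), the 64-cell grid, and the oracle that always picks the correct cell are all assumptions made for illustration. In a real system, a model looking at the grid would presumably take the oracle's place.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # half-open frame range [start, end)


def grid_cells(span: Span, cells: int) -> List[Span]:
    """Split a frame span into at most `cells` contiguous, non-empty sub-spans."""
    start, end = span
    length = end - start
    n = min(cells, length)  # never create more cells than frames
    bounds = [start + (length * i) // n for i in range(n + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(n)]


def zoom_to_frame(
    num_frames: int,
    cells: int,
    choose: Callable[[List[Span]], int],
) -> List[Span]:
    """Zoom into the chosen cell until one frame remains; return the path of spans."""
    if num_frames < 1 or cells < 2:
        raise ValueError("need at least one frame and at least two cells per grid")
    span: Span = (0, num_frames)
    path = [span]
    while span[1] - span[0] > 1:
        options = grid_cells(span, cells)
        span = options[choose(options)]  # hypothetical agent decision
        path.append(span)
    return path


def make_oracle(target: int) -> Callable[[List[Span]], int]:
    """Hypothetical stand-in for the agent: picks the cell containing `target`."""
    def choose(options: List[Span]) -> int:
        for i, (s, e) in enumerate(options):
            if s <= target < e:
                return i
        raise ValueError("target frame is outside the current span")
    return choose


if __name__ == "__main__":
    # An 8x8 grid (64 cells) over 64**3 = 262,144 frames.
    path = zoom_to_frame(64 ** 3, 64, make_oracle(123_456))
    for depth, (s, e) in enumerate(path):
        print(f"depth {depth}: frames [{s}, {e})")
```

Each zoom shrinks the span by roughly the number of cells, so under these assumptions the number of grids inspected grows with the logarithm of the frame count (base equal to the cell count) rather than with the frame count itself.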

From the abstract

Extending language models to video introduces two challenges: representation, where existing methods rely on lossy approximations, and long-context, where caption- or agent-based pipelines collapse video into text and lose visual fidelity. To overcome this, we introduce VideoAtlas, a task-agnostic environment to represent video as a hierarchical grid that is simultaneously lossless, navigable, scalable, caption- and preprocessing-free. An overview of the video is available at a glance,