Achieves hour-scale real-time human animation by addressing two problems in autoregressive diffusion: unbounded memory growth and inconsistent noise states.
March 13, 2026
Original Paper
SoulX-LiveAct: Towards Hour-Scale Real-Time Human Animation with Neighbor Forcing and ConvKV Memory
arXiv · 2603.11746
The Takeaway
Neighbor Forcing and a structured ConvKV memory together enable open-ended, hour-scale video generation with constant memory usage. This removes the hardware-imposed limit on video length that has largely confined video generation practitioners to short clips.
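Below is a minimal sketch of why a structured memory gives constant usage while a naive KV cache does not. It assumes a hypothetical `BoundedKVMemory` that keeps a sliding window of recent per-frame keys verbatim. Evicted frames go into a fixed number of compressed slots, and whenever those slots overflow, the two oldest are merged with a 2-tap kernel, which acts like a stride-2 temporal convolution applied at one position. The class, its parameters (`window`, `mem_slots`, `kernel`) and the merge rule are illustrative assumptions, not the paper's ConvKV design.

```python
from typing import List, Tuple

Vector = List[float]


class BoundedKVMemory:
    """Hypothetical constant-size KV memory (illustrative, NOT the paper's ConvKV).

    - `recent` holds the last `window` frame keys at full fidelity.
    - `memory` holds at most `mem_slots` compressed slots for older history.
    When `memory` overflows, its two oldest slots are merged with a fixed
    2-tap kernel, so total size never exceeds window + mem_slots.
    """

    def __init__(self, window: int, mem_slots: int,
                 kernel: Tuple[float, float] = (0.5, 0.5)):
        self.window = window        # number of recent frames kept verbatim
        self.mem_slots = mem_slots  # cap on compressed slots
        self.kernel = kernel        # weights for (older slot, newer slot)
        self.recent: List[Vector] = []
        self.memory: List[Vector] = []

    def _merge(self, a: Vector, b: Vector) -> Vector:
        # Element-wise weighted sum of two slots (a 2-tap temporal filter).
        w_a, w_b = self.kernel
        return [w_a * x + w_b * y for x, y in zip(a, b)]

    def append(self, key: Vector) -> None:
        self.recent.append(key)
        if len(self.recent) > self.window:
            # Move the oldest recent frame into compressed memory.
            self.memory.append(self.recent.pop(0))
        if len(self.memory) > self.mem_slots:
            # Fold the two oldest slots into one to keep memory bounded.
            merged = self._merge(self.memory[0], self.memory[1])
            self.memory = [merged] + self.memory[2:]

    def size(self) -> int:
        return len(self.recent) + len(self.memory)


# Compare against a naive cache that stores every frame's key.
naive: List[Vector] = []
mem = BoundedKVMemory(window=4, mem_slots=8)
for t in range(10_000):
    key = [float(t), 1.0]  # toy 2-D key per generated frame
    naive.append(key)
    mem.append(key)

print(len(naive), mem.size())  # prints "10000 12"
```

The naive cache grows linearly with the number of generated frames, which is what caps clip length on fixed GPU memory; the bounded version stays at `window + mem_slots` entries regardless of length. The trade-off in this sketch is fidelity: recent frames stay exact, while older history is blended with geometrically decaying weights.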
From the abstract
Autoregressive (AR) diffusion models offer a promising framework for sequential generation tasks such as video synthesis by combining diffusion modeling with causal inference. Although they support streaming generation, existing AR diffusion methods struggle to scale efficiently. In this paper, we identify two key challenges in hour-scale real-time human animation. First, most forcing strategies propagate sample-level representations with mismatched diffusion states, causing inconsistent learnin