AI & ML New Capability

Fixes physically impossible video generation by disentangling semantic prompts from physical dynamics during training.

March 30, 2026

Original Paper

DiReCT: Disentangled Regularization of Contrastive Trajectories for Physics-Refined Video Generation

Abolfazl Meyarian, Amin Karimi Monsefi, Rajiv Ramnath, Ser-Nam Lim

arXiv · 2603.25931

The Takeaway

Standard flow-matching video models often ignore physics because text prompts conflate what an object 'is' with how it 'moves.' This method uses a dual-scale contrastive loss to separate these signals, allowing models to generate temporally consistent video that actually obeys kinematics and forces.

From the abstract

Flow-matching video generators produce temporally coherent, high-fidelity outputs yet routinely violate elementary physics because their reconstruction objectives penalize per-frame deviations without distinguishing physically consistent dynamics from impossible ones. Contrastive flow matching offers a principled remedy by pushing apart velocity-field trajectories of differing conditions, but we identify a fundamental obstacle in the text-conditioned video setting: semantic-physics entanglement.
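To make the idea concrete, here is a minimal sketch of a contrastive flow-matching style objective: a reconstruction term pulls each predicted velocity toward its own target, while a contrastive term pushes it away from the velocity targets of the other conditions in the batch. This is an illustrative toy, not DiReCT's actual dual-scale loss; the function names (`contrastive_flow_loss`, `mse`) and the `weight` hyperparameter are assumptions made for the example.

```python
# Toy sketch of a contrastive flow-matching objective (illustrative only,
# not the paper's DiReCT loss). Velocities are plain Python lists of floats.

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def contrastive_flow_loss(pred_velocities, target_velocities, weight=0.1):
    """pred_velocities[i]: velocity predicted for sample i under its own
    text condition; target_velocities[j]: ground-truth velocity for
    condition j. The positive term matches i to i; the negative term
    penalizes closeness to every other condition j != i, pushing apart
    velocity-field trajectories of differing conditions."""
    n = len(pred_velocities)
    total = 0.0
    for i in range(n):
        pos = mse(pred_velocities[i], target_velocities[i])
        neg = sum(mse(pred_velocities[i], target_velocities[j])
                  for j in range(n) if j != i) / max(n - 1, 1)
        total += pos - weight * neg  # lower when far from other conditions
    return total / n

# Usage: a 2-sample batch. Matching each prediction to its own condition
# yields a lower loss than matching it to the other condition's target.
targets = [[1.0, 0.0], [0.0, 1.0]]
good = contrastive_flow_loss(targets, targets)            # preds == own targets
bad = contrastive_flow_loss(targets[::-1], targets)       # preds swapped
```

In a real model the negatives would be velocity fields predicted under mismatched text prompts, and the two scales in the paper's "dual-scale" loss would apply this contrast at both the clip and the frame level; the toy above only shows the single-scale batch version.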