GSB-PPO lifts proximal policy optimization from discrete action steps to full generation trajectories by framing the generative process as a Generalized Schrödinger Bridge.
March 24, 2026
Original Paper
Proximal Policy Optimization in Path Space: A Schrödinger Bridge Perspective
arXiv · 2603.21621
The Takeaway
GSB-PPO provides a unified mathematical framework for training generative policies (such as diffusion or flow-based models) with on-policy RL. It resolves the mismatch between PPO's per-action probability ratios and the path-space nature of modern generative processes, in which a sample is produced by an entire denoising trajectory rather than a single action.
From the abstract
On-policy reinforcement learning with generative policies is promising but remains underexplored. A central challenge is that proximal policy optimization (PPO) is traditionally formulated in terms of action-space probability ratios, whereas diffusion- and flow-based policies are more naturally represented as trajectory-level generative processes. In this work, we propose GSB-PPO, a path-space formulation of generative PPO inspired by the Generalized Schrödinger Bridge (GSB). Our framework lifts […]
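To make the action-space vs. path-space distinction concrete, here is a minimal sketch of what a trajectory-level PPO ratio might look like for a Gaussian denoising policy. This is an illustration of the general idea, not the paper's actual algorithm: the function names, the isotropic-Gaussian transition model, and the scalar advantage are all assumptions made for the example.

```python
import numpy as np

def trajectory_log_prob(traj, means, sigma):
    """Log-probability of a full denoising trajectory.

    Each transition is modeled as N(mean_t, sigma^2 I) (an assumption for
    this sketch). Summing per-step log-probs gives a single path-space
    log-probability, so the PPO ratio is taken over the whole trajectory
    rather than over individual action steps.
    """
    diffs = traj - means
    per_step = -0.5 * diffs**2 / sigma**2 - 0.5 * np.log(2 * np.pi * sigma**2)
    return per_step.sum()

def path_space_ppo_surrogate(logp_new, logp_old, advantage, clip_eps=0.2):
    """Clipped PPO surrogate with a trajectory-level importance ratio."""
    ratio = np.exp(logp_new - logp_old)  # one ratio per trajectory
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    # Standard PPO pessimistic min, applied in path space.
    return min(ratio * advantage, clipped * advantage)
```

The key difference from ordinary PPO is that `logp_new` and `logp_old` score the entire generation trajectory, so a single clipped ratio governs the whole path; how GSB-PPO regularizes or constrains this ratio via the Schrödinger Bridge formulation is detailed in the paper itself.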