Eliminates the need for expensive process reward models by propagating terminal rewards across state-space graphs to generate dense, state-level rewards for agentic RL.
March 20, 2026
Original Paper
RewardFlow: Topology-Aware Reward Propagation on State Graphs for Agentic RL with Large Language Models
arXiv · 2603.18859
The Takeaway
RewardFlow addresses the credit-assignment problem in LLM agentic reasoning by exploiting the topological structure of trajectory state graphs. This enables fine-grained, state-level optimization in RL without the heavy overhead of training or human-labeling dedicated process reward models.
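To make the core idea concrete, here is a minimal sketch of terminal-reward propagation over a state graph. Everything here is an illustrative assumption: the function name, the decayed-maximum update rule, and the graph representation are hypothetical, not the paper's actual propagation scheme.

```python
from collections import defaultdict, deque

def propagate_terminal_reward(edges, terminal_state, terminal_reward, decay=0.9):
    """Hypothetical sketch: push a sparse terminal reward backward over a
    state graph so every visited state gets a dense reward signal.

    Each state's reward is the decayed maximum over rewards reachable
    through its successors; RewardFlow's actual rule may differ.
    """
    # Reverse adjacency: successor -> list of predecessor states.
    preds = defaultdict(list)
    for src, dst in edges:
        preds[dst].append(src)

    rewards = {terminal_state: terminal_reward}
    queue = deque([terminal_state])
    while queue:
        state = queue.popleft()
        for p in preds[state]:
            candidate = decay * rewards[state]
            # Keep the best (largest) propagated reward seen so far.
            if candidate > rewards.get(p, float("-inf")):
                rewards[p] = candidate
                queue.append(p)
    return rewards

# Toy trajectory graph: s0 -> s1 -> s2, terminal reward 1.0 at s2.
dense = propagate_terminal_reward(
    [("s0", "s1"), ("s1", "s2")], "s2", 1.0, decay=0.5
)
# → {'s2': 1.0, 's1': 0.5, 's0': 0.25}
```

In this sketch, states closer to the rewarded terminal state receive larger dense rewards, which is the kind of state-level signal that would otherwise require a learned process reward model.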
From the abstract
Reinforcement learning (RL) holds significant promise for enhancing the agentic reasoning capabilities of large language models (LLMs) with external environments. However, the inherent sparsity of terminal rewards hinders fine-grained, state-level optimization. Although process reward modeling offers a promising alternative, training dedicated reward models often entails substantial computational costs and scaling difficulties. To address these challenges, we introduce RewardFlow, a lightweight […]