AI & ML Paradigm Shift

Decouples perceptual failures from logical errors in Vision-Language reward models to enable more reliable test-time scaling.

March 18, 2026

Original Paper

Grounding the Score: Explicit Visual Premise Verification for Reliable Vision-Language Process Reward Models

Junxin Wang, Dai Guan, Weijie Qiu, Zhihang Li, Yongbo Gai, Zhengyi Yang, Mengyu Zhou, Erchao Zhao, Xiaoxi Jiang, Guanjun Jiang

arXiv · 2603.16253

The Takeaway

Standard VL-PRMs often penalize correct reasoning because they misinterpret the image (perceptual uncertainty). This framework uses explicit visual premise verification to gate rewards, significantly improving the reliability of Best-of-N reranking for complex multimodal reasoning tasks.
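As a rough illustration of the gating idea, here is a minimal Python sketch. The scorer callables (`verify_premise`, `score_step_logic`), the data layout, and the min-based aggregation are all assumptions for illustration, not the paper's actual API or method.

```python
# Minimal sketch: premise-gated step scoring for Best-of-N reranking.
# All names and interfaces here are hypothetical stand-ins.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    text: str                    # one intermediate reasoning step
    visual_premises: List[str]   # image claims the step relies on


def gated_step_score(
    image,
    step: Step,
    verify_premise: Callable[[object, str], float],   # ~P(premise grounded in image)
    score_step_logic: Callable[[str], float],         # ~P(step logically valid)
) -> float:
    """Gate the logic reward by how well the step's visual premises are
    grounded in the image: a hallucinated premise collapses the score
    no matter how plausible the downstream logic sounds."""
    if step.visual_premises:
        gate = min(verify_premise(image, p) for p in step.visual_premises)
    else:
        gate = 1.0  # purely symbolic step: nothing to verify visually
    return gate * score_step_logic(step.text)


def rerank_best_of_n(image, candidates: List[List[Step]],
                     verify_premise, score_step_logic) -> List[Step]:
    """Best-of-N: score each candidate chain by its weakest gated step
    and return the highest-scoring chain."""
    def chain_score(chain: List[Step]) -> float:
        return min(gated_step_score(image, s, verify_premise, score_step_logic)
                   for s in chain)
    return max(candidates, key=chain_score)
```

Whether premise scores are combined by min or by product, and whether chains are ranked by their weakest step or by an average, are design choices not specified here; the sketch uses min in both places on the view that one hallucinated premise should be enough to sink a step.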

From the abstract

Vision-language process reward models (VL-PRMs) are increasingly used to score intermediate reasoning steps and rerank candidates under test-time scaling. However, they often function as black-box judges: a low step score may reflect a genuine reasoning mistake or simply the verifier's misperception of the image. This entanglement between perception and reasoning leads to systematic false positives (rewarding hallucinated visual premises) and false negatives (penalizing correct grounded statements).
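To make the two failure modes concrete, here is a toy example with invented numbers: an entangled judge emits a single score, so a low value is ambiguous, whereas decoupled perception and logic scores (multiplied as a gate, as in the sketch above) separate the cases.

```python
# Toy illustration (all numbers invented). An entangled judge emits one
# score per step; decoupled scores distinguish a hallucinated premise
# from a correct step that the judge merely misperceived.

cases = {
    # fluent logic built on a hallucinated premise -> entangled judge rewards it
    "false positive": {"entangled": 0.9, "perception": 0.1, "logic": 0.9},
    # correct, grounded step -> entangled judge penalizes its own misperception
    "false negative": {"entangled": 0.3, "perception": 0.9, "logic": 0.9},
}

for name, s in cases.items():
    gated = s["perception"] * s["logic"]
    print(f"{name}: entangled={s['entangled']:.1f}  gated={gated:.2f}")

# false positive: entangled=0.9  gated=0.09  (hallucinated premise suppressed)
# false negative: entangled=0.3  gated=0.81  (grounded step rescued)
```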