AI & ML Efficiency Breakthrough

Introduces a verifier that operates directly on the latent hidden states of Diffusion Transformers, avoiding the need for costly pixel-space decoding during inference-time scaling.

March 25, 2026

Original Paper

Tiny Inference-Time Scaling with Latent Verifiers

Davide Bucciarelli, Evelyn Turri, Lorenzo Baraldi, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

arXiv · 2603.22492

The Takeaway

The approach reduces the FLOPs of verification-based generation by 51%. For practitioners using best-of-N sampling or verifier-guided generation, this enables scaling test-time compute with significantly less VRAM and time overhead.
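To make the saving concrete, here is a minimal sketch of best-of-N selection with a verifier that scores candidates in latent space, so the costly pixel-space decode runs only once, on the winner. All function names (`sample_latent`, `latent_verifier`, `decode`) are hypothetical stand-ins, not the paper's actual components.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_latent(rng, shape=(4, 8, 8)):
    # Stand-in for one diffusion sampling run in the autoencoder latent space.
    return rng.standard_normal(shape)

def latent_verifier(z):
    # Hypothetical scorer that operates directly on the latent z.
    # Because it never decodes to pixels, verification stays cheap.
    return float(-np.abs(z.mean()))

def decode(z):
    # Stand-in for the expensive VAE decode to pixel space.
    # In this scheme it is called once, for the selected candidate only.
    return z

def best_of_n(n=8):
    # Sample N candidates, score them all in latent space,
    # then decode only the highest-scoring one.
    candidates = [sample_latent(rng) for _ in range(n)]
    scores = [latent_verifier(z) for z in candidates]
    best = candidates[int(np.argmax(scores))]
    return decode(best), max(scores)

image, score = best_of_n(8)
```

With an MLLM verifier, each of the N candidates would instead be decoded and re-encoded before scoring; the latent verifier removes those N decode passes from the loop.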

From the abstract

Inference-time scaling has emerged as an effective way to improve generative models at test time by using a verifier to score and select candidate outputs. A common choice is to employ Multimodal Large Language Models (MLLMs) as verifiers, which can improve performance but introduce substantial inference-time cost. Indeed, diffusion pipelines operate in an autoencoder latent space to reduce computation, yet MLLM verifiers still require decoding candidates to pixel space and re-encoding them into