AI & ML Efficiency Breakthrough

ROVED cuts the expensive human feedback required for preference-based RL by up to 90%, leveraging vision-language embeddings and uncertainty filtering.

March 31, 2026

Original Paper

Reducing Oracle Feedback with Vision-Language Embeddings for Preference-Based RL

Udita Ghosh, Dripta S. Raychaudhuri, Jiachen Li, Konstantinos Karydis, Amit Roy-Chowdhury

arXiv · 2603.28053

The Takeaway

ROVED uses vision-language models to label routine preference comparisons and requests human 'oracle' feedback only for high-uncertainty samples. This drastically lowers the barrier to entry for training complex robotic manipulation tasks from human preferences.
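The routing idea can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names, the use of a score margin as the uncertainty proxy, and the threshold `tau` are all assumptions made here for clarity.

```python
import numpy as np

def route_queries(vle_scores_a, vle_scores_b, tau=0.1):
    """Route each preference pair: keep the VLE's label when its score
    margin is confident, otherwise flag the pair for oracle (human)
    feedback. Inputs are hypothetical VLE similarity scores for the two
    trajectory clips in each pair."""
    margin = np.abs(vle_scores_a - vle_scores_b)
    ask_oracle = margin < tau  # small margin = high uncertainty -> ask a human
    vle_label = (vle_scores_a > vle_scores_b).astype(int)  # 1 if clip A preferred
    return vle_label, ask_oracle

# Toy example: five clip pairs with VLE scores in [0, 1].
a = np.array([0.90, 0.52, 0.30, 0.71, 0.48])
b = np.array([0.10, 0.50, 0.80, 0.65, 0.47])
labels, ask = route_queries(a, b, tau=0.1)
# Only the low-margin pairs (indices 1, 3, 4) are routed to the oracle;
# the confident pairs keep their cheap VLE labels.
```

In a real pipeline the uncertainty criterion could instead come from ensemble disagreement or a calibrated preference predictor; the point is simply that oracle queries are spent only where the cheap signal is unreliable.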

From the abstract

Preference-based reinforcement learning can learn effective reward functions from comparisons, but its scalability is constrained by the high cost of oracle feedback. Lightweight vision-language embedding (VLE) models provide a cheaper alternative, but their noisy outputs limit their effectiveness as standalone reward generators. To address this challenge, we propose ROVED, a hybrid framework that combines VLE-based supervision with targeted oracle feedback. Our method uses the VLE to generate s…