Enables GUI agents to overcome domain bias by autonomously 'watching' web tutorial videos to learn specific software workflows without retraining.
March 30, 2026
Original Paper
GUIDE: Resolving Domain Bias in GUI Agents through Real-Time Web Video Retrieval and Plug-and-Play Annotation
arXiv · 2603.26266
The Takeaway
GUIDE uses a video retrieval-augmented generation (video-RAG) pipeline to extract planning and grounding knowledge from existing online tutorial videos and inject it into the agent at runtime. This removes a major barrier to deploying GUI agents in specialized professional software, where training data is scarce.
From the abstract
Large vision-language models have endowed GUI agents with strong general capabilities for interface understanding and interaction. However, due to insufficient exposure to domain-specific software operation data during training, these agents exhibit significant domain bias - they lack familiarity with the specific operation workflows (planning) and UI element layouts (grounding) of particular applications, limiting their real-world task performance. In this paper, we present GUIDE (GUI Unbiasing