AI & ML New Capability

A self-supervised robotic system detects novel objects by training bespoke detectors on the fly from human video demonstrations, bypassing language-based prompts.

March 16, 2026

Original Paper

Show, Don't Tell: Detecting Novel Objects by Watching Human Videos

James Akl, Jose Nicolas Avendano Arbelaez, James Barabas, Jennifer L. Barry, Kalie Ching, Noam Eshed, Jiahui Fu, Michel Hidalgo, Andrew Hoelscher, Tushar Kusnur, Andrew Messing, Zachary Nagler, Brian Okorn, Mauro Passerino, Tim J. Perkins, Eric Rosen, Ankit Shah, Tanmay Shankar, Scott Shaw

arXiv · 2603.12751

The Takeaway

Current open-vocabulary detectors often require tedious prompt engineering to recognize specific object instances. The 'Show, Don't Tell' approach instead lets a robot automatically generate a tailored dataset and detector in minutes, significantly improving task completion in unconstrained environments.

From the abstract

How can a robot quickly identify and recognize new objects shown to it during a human demonstration? Existing closed-set object detectors frequently fail at this because the objects are out-of-distribution. While open-set detectors (e.g., VLMs) sometimes succeed, they often require expensive and tedious human-in-the-loop prompt engineering to uniquely recognize novel object instances. In this paper, we present a self-supervised system that eliminates the need for tedious language descriptions an