AI & ML Efficiency Breakthrough

DART enables real-time multi-class detection for open-vocabulary models like SAM3, achieving up to 25x speedup without any weight modifications.

March 13, 2026

Original Paper

Detect Anything in Real Time: From Single-Prompt Segmentation to Multi-Class Detection

Mehmet Kerem Turkcan

arXiv · 2603.11441

The Takeaway

DART exploits the class-agnostic nature of the visual backbone to share its computation across multiple text prompts, instead of rerunning the backbone once per prompt. This allows "detect anything" models, previously too slow for production, to run at 15+ FPS on consumer hardware while maintaining state-of-the-art accuracy.
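To make the sharing idea concrete, here is a minimal Python sketch under toy assumptions. It is not DART's or SAM3's code: `backbone`, `decode`, `detect_naive`, `detect_shared`, `estimated_speedup`, and the cost values are hypothetical stand-ins. The naive path reruns the encoder for every prompt, as a single-prompt model must; the shared path encodes the image once and reuses those features for every prompt. Both paths produce identical outputs because, by assumption, the backbone never sees the prompt.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CallCounter:
    """Counts how often each hypothetical stage runs."""
    backbone: int = 0
    decoder: int = 0


def backbone(image: List[int], counter: CallCounter) -> List[int]:
    # Hypothetical stand-in for the expensive vision encoder (a ViT-H/14 in SAM3).
    # It takes no prompt argument: its output is class-agnostic.
    counter.backbone += 1
    return [2 * x for x in image]


def decode(features: List[int], prompt: str, counter: CallCounter) -> str:
    # Hypothetical stand-in for the cheaper prompt-conditioned decoding stage.
    counter.decoder += 1
    return f"{prompt}:{sum(features) + len(prompt)}"


def detect_naive(image: List[int], prompts: List[str], counter: CallCounter) -> Dict[str, str]:
    # Single-prompt model: one full forward pass (backbone + decoder) per prompt.
    return {p: decode(backbone(image, counter), p, counter) for p in prompts}


def detect_shared(image: List[int], prompts: List[str], counter: CallCounter) -> Dict[str, str]:
    # Shared backbone: encode the image once, then decode each prompt from the same features.
    features = backbone(image, counter)
    return {p: decode(features, p, counter) for p in prompts}


def estimated_speedup(n_prompts: int, backbone_cost: float, decoder_cost: float) -> float:
    # Toy cost model: naive cost N * (B + D) divided by shared cost B + N * D.
    naive = n_prompts * (backbone_cost + decoder_cost)
    shared = backbone_cost + n_prompts * decoder_cost
    return naive / shared


if __name__ == "__main__":
    prompts = ["person", "car", "dog", "bicycle"]
    naive_calls, shared_calls = CallCounter(), CallCounter()
    naive_out = detect_naive([1, 2, 3], prompts, naive_calls)
    shared_out = detect_shared([1, 2, 3], prompts, shared_calls)
    assert naive_out == shared_out  # same detections either way
    print("backbone runs, naive:", naive_calls.backbone)    # 4
    print("backbone runs, shared:", shared_calls.backbone)  # 1
    print("toy speedup:", round(estimated_speedup(4, 10.0, 1.0), 2))  # 3.14
```

Under this simple cost model, the speedup from sharing approaches N as the per-prompt decoding cost shrinks relative to the backbone cost. The speedup reported for DART comes from the paper's own measurements and is not derived from this toy model.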

From the abstract

Recent advances in vision-language modeling have produced promptable detection and segmentation systems that accept arbitrary natural language queries at inference time. Among these, SAM3 achieves state-of-the-art accuracy by combining a ViT-H/14 backbone with cross-modal transformer decoding and learned object queries. However, SAM3 processes a single text prompt per forward pass. Detecting N categories requires N independent executions, each dominated by the 439M-parameter backbone. We present