AI & ML Efficiency Breakthrough

Compresses promptable visual segmentation (SAM) into a 1.3M-parameter model for real-time in-sensor execution.

March 13, 2026

Original Paper

PicoSAM3: Real-Time In-Sensor Region-of-Interest Segmentation

Pietro Bonazzi, Nicola Farronato, Stefan Zihlmann, Haotong Qin, Michele Magno

arXiv · 2603.11917

The Takeaway

PicoSAM3 brings the capabilities of the Segment Anything Model (SAM) directly into vision sensors such as the Sony IMX500, with sub-12 ms latency. That makes high-quality segmentation practical on power-constrained IoT devices and smart glasses.

From the abstract

Real-time, on-device segmentation is critical for latency-sensitive and privacy-aware applications such as smart glasses and Internet-of-Things devices. We introduce PicoSAM3, a lightweight promptable visual segmentation model optimized for edge and in-sensor execution, including deployment on the Sony IMX500 vision sensor. PicoSAM3 has 1.3M parameters and combines a dense CNN architecture with region-of-interest prompt encoding, Efficient Channel Attention, and knowledge distillation from SAM2.
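The Efficient Channel Attention mentioned in the abstract is a cheap reweighting of feature-map channels: global-average-pool each channel to a scalar, run a small 1D convolution across the resulting channel vector, squash with a sigmoid, and scale the channels. A minimal NumPy sketch of that idea follows; the kernel size and the uniform convolution weights are illustrative assumptions (in a trained model the 1D-conv weights are learned), and nothing here is taken from the PicoSAM3 implementation.

```python
import numpy as np

def eca(x, kernel_size=3):
    """Efficient Channel Attention sketch: reweight channels of x (C, H, W).

    NOTE: the 1D-conv weights below are a uniform placeholder; in a real
    ECA block they are a single learned kernel shared across channels.
    """
    C = x.shape[0]
    # 1) Global average pool: one descriptor per channel -> shape (C,)
    y = x.mean(axis=(1, 2))
    # 2) Local cross-channel interaction: 1D conv over the channel axis
    pad = kernel_size // 2
    yp = np.pad(y, pad, mode="edge")
    w = np.full(kernel_size, 1.0 / kernel_size)  # placeholder learned weights
    attn = np.array([np.dot(yp[i:i + kernel_size], w) for i in range(C)])
    # 3) Sigmoid gate, then broadcast-scale each channel
    attn = 1.0 / (1.0 + np.exp(-attn))
    return x * attn[:, None, None]

feat = np.ones((4, 2, 2))       # toy feature map with 4 channels
out = eca(feat)
print(out.shape)                # (4, 2, 2) -- same shape, channels rescaled
```

Because the convolution is 1D over only C channel descriptors, the block adds just `kernel_size` parameters per attention layer, which is why it suits a 1.3M-parameter budget.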