AI & ML New Capability

Sparse Autoencoders (SAEs) can successfully decompose opaque medical vision foundation model embeddings into human-interpretable clinical concepts.

March 26, 2026

Original Paper

Sparse Autoencoders for Interpretable Medical Image Representation Learning

Philipp Wesp, Robbie Holland, Vasiliki Sideri-Lampretsa, Sergios Gatidis

arXiv · 2603.23794

The Takeaway

This work brings state-of-the-art interpretability techniques from LLM research to high-stakes medical imaging. It lets clinicians interrogate the 'why' behind model predictions by mapping abstract latents to language-driven clinical features, while reducing dimensionality by 99%.
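At the heart of the approach is the sparse autoencoder formulation popularized in LLM interpretability: a dense foundation-model embedding is projected onto a dictionary of features, a sparsity penalty keeps only a few features active per image, and a linear decoder reconstructs the original embedding. The sketch below (PyTorch, with illustrative dimensions; the paper's exact architecture and hyperparameters are not given in this summary) shows the shape of that decomposition.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseAutoencoder(nn.Module):
    """Minimal SAE sketch: dense FM embedding -> sparse features -> reconstruction."""

    def __init__(self, embed_dim: int, dict_size: int):
        super().__init__()
        self.encoder = nn.Linear(embed_dim, dict_size)  # embedding -> feature activations
        self.decoder = nn.Linear(dict_size, embed_dim)  # features -> reconstructed embedding

    def forward(self, x: torch.Tensor):
        # ReLU keeps activations non-negative; a sparsity penalty during training drives
        # most of them to zero, so each image is described by a handful of active features.
        features = F.relu(self.encoder(x))
        recon = self.decoder(features)
        return recon, features

# Illustrative sizes only: a 768-d embedding decomposed over a 4096-feature dictionary.
sae = SparseAutoencoder(embed_dim=768, dict_size=4096)
embedding = torch.randn(1, 768)
recon, features = sae(embedding)
print(f"active features: {(features > 0).float().mean().item():.2%}")
```

Each feature direction in the dictionary can then be inspected and, as the paper proposes, linked to a clinical concept in language, which is where the interpretability gain comes from.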

From the abstract

Vision foundation models (FMs) achieve state-of-the-art performance in medical imaging. However, they encode information in abstract latent representations that clinicians cannot interrogate or verify. The goal of this study is to investigate Sparse Autoencoders (SAEs) for replacing opaque FM image representations with human-interpretable, sparse features. We train SAEs on embeddings from BiomedParse (biomedical) and DINOv3 (general-purpose) using 909,873 CT and MRI 2D image slices from the Tota […]
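The abstract describes training SAEs on embeddings produced by frozen foundation models over hundreds of thousands of 2D slices. A minimal training loop, using the SparseAutoencoder sketched above and under common assumptions (a frozen backbone `fm` that maps slices to embeddings, a `slice_loader` yielding image batches, and an L1 sparsity penalty; these are hypothetical stand-ins, not the authors' exact recipe), might look like this:

```python
import torch
import torch.nn.functional as F

def train_sae(sae, fm, slice_loader, l1_coeff=1e-3, lr=1e-4, epochs=1, device="cpu"):
    """Fit an SAE on embeddings from a frozen vision foundation model (illustrative sketch)."""
    fm.eval()                                  # the foundation model stays frozen
    sae.to(device)
    opt = torch.optim.Adam(sae.parameters(), lr=lr)
    for _ in range(epochs):
        for images in slice_loader:
            with torch.no_grad():              # embeddings only; no gradients through the FM
                emb = fm(images.to(device))
            recon, feats = sae(emb)
            # Reconstruction keeps the SAE faithful to the FM; L1 keeps the code sparse.
            loss = F.mse_loss(recon, emb) + l1_coeff * feats.abs().mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
    return sae
```

The key design choice is that only the autoencoder is trained: the FM's representation is left untouched, and interpretability is added as a layer on top rather than by retraining the backbone.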