AI & ML Efficiency Breakthrough

Hydra unifies ColBERT-style retrieval and autoregressive generation in one vision-language model, using a single LoRA adapter.

March 31, 2026

Original Paper

Hydra: Unifying Document Retrieval and Generation in a Single Vision-Language Model

Athos Georgiou

arXiv · 2603.28554

The Takeaway

Hydra cuts GPU memory by 41% by eliminating the need for separate retrieval and generation models, while its generation stays byte-identical to that of the standalone base model.
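To make the idea concrete, here is a minimal sketch of why one adapter can serve both roles. It is a toy in plain Python, not the paper's implementation: the class ToyLoRALinear, the flag adapter_enabled, and the helper maxsim_score are hypothetical names chosen for illustration. With the adapter off, the layer runs exactly the base model's arithmetic, so its output cannot differ from the base model's. With the adapter on, a low-rank update is added, and the resulting per-token vectors can be scored with ColBERT-style late interaction (the sum, over query vectors, of the maximum dot product against any document vector).

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]  # row-major: one inner list per output row


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(m: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in m]


class ToyLoRALinear:
    """Hypothetical toy layer: y = W x, plus scale * B (A x) when the adapter is on."""

    def __init__(self, W: Matrix, A: Matrix, B: Matrix, scale: float = 1.0):
        self.W, self.A, self.B, self.scale = W, A, B, scale
        self.adapter_enabled = False  # off: behaves as the untouched base layer

    def forward(self, x: Vector) -> Vector:
        base = matvec(self.W, x)
        if not self.adapter_enabled:
            # The adapter path is skipped entirely, so this is the
            # same computation the base layer would perform.
            return base
        delta = matvec(self.B, matvec(self.A, x))  # low-rank update B (A x)
        return [b + self.scale * d for b, d in zip(base, delta)]


def maxsim_score(query_vecs: List[Vector], doc_vecs: List[Vector]) -> float:
    """Late-interaction score: for each query vector, take its best-matching
    document vector by dot product, then sum those maxima."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy weights (assumed values, chosen only so the arithmetic is easy to follow).
layer = ToyLoRALinear(
    W=[[1.0, 0.0], [0.0, 1.0]],  # base weight, 2x2
    A=[[1.0, 1.0]],              # LoRA down-projection, rank 1
    B=[[1.0], [0.0]],            # LoRA up-projection
)
x = [1.0, 2.0]

generation_out = layer.forward(x)   # adapter off: generation path
layer.adapter_enabled = True
retrieval_out = layer.forward(x)    # adapter on: retrieval embedding path

score = maxsim_score(
    query_vecs=[[1.0, 0.0], [0.0, 1.0]],
    doc_vecs=[[1.0, 0.0], [0.5, 0.5]],
)
```

In a real system the toggle would be applied to every adapted layer of the VLM at once, and the retrieval path would emit one embedding per token or image patch rather than a single vector.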

From the abstract

Visual document understanding typically requires separate retrieval and generation models, doubling memory and system complexity. We present Hydra, a dual-head approach that provides both ColBERT-style late-interaction retrieval and autoregressive generation from a single vision-language model (VLM). A single LoRA adapter, trained only for retrieval, is toggled at inference: enabling it produces multi-vector embeddings; disabling it recovers the base model's generation quality -- byte-identical