AI & ML · Breaks Assumption

Exposes 'hidden clones' in VLM ensembles, where models from the same family share correlated errors that naive voting mechanisms fail to detect.

March 19, 2026

Original Paper

Hidden Clones: Exposing and Fixing Family Bias in Vision-Language Model Ensembles

Zacharie Bugaud

arXiv · 2603.17111

The Takeaway

The paper shows that VLM ensemble diversity is largely an illusion: even in large ensembles, the effective number of independent voters can fall as low as 2.5. The authors propose 'family-aware' voting algorithms that recover substantial accuracy by accounting for each model's architectural heritage.
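The paper does not spell out its voting algorithms here, but the idea of "family-aware" voting can be sketched as a two-stage majority vote: collapse each family's models into a single family-level vote, then vote across families. All model and family names below are made up for illustration.

```python
from collections import Counter

def family_aware_vote(predictions, families):
    """Two-stage majority vote: collapse each family to one internal
    majority answer, then take a majority across families.
    `predictions` maps model -> answer; `families` maps model -> family.
    (Illustrative sketch; the paper's actual algorithms may differ.)"""
    per_family = {}
    for model, answer in predictions.items():
        per_family.setdefault(families[model], []).append(answer)
    # One vote per family: that family's internal majority answer.
    family_votes = [Counter(a).most_common(1)[0][0] for a in per_family.values()]
    return Counter(family_votes).most_common(1)[0][0]

# Hypothetical ensemble: three clones from one family outvote two
# independent models under naive voting, but not under family-aware voting.
preds = {"fam1-a": "cat", "fam1-b": "cat", "fam1-c": "cat",
         "fam2-a": "dog", "fam3-a": "dog"}
fams = {"fam1-a": "fam1", "fam1-b": "fam1", "fam1-c": "fam1",
        "fam2-a": "fam2", "fam3-a": "fam3"}
print(family_aware_vote(preds, fams))  # "dog": 2 families vs 1, despite 3-of-5 raw votes
```

Naive majority voting over the same predictions would pick "cat" (3 of 5 raw votes), which is exactly the correlated-clone failure mode the paper describes.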

From the abstract

Ensembling Vision-Language Models (VLMs) from different providers maximizes benchmark accuracy, yet models from the same architectural family share correlated errors that standard voting ignores. We study this structure across 17 VLMs from 8 families on VQAv2, TextVQA, and GQA. Family-correlated errors reduce effective ensemble dimensionality to 2.5-3.6 independent voters and create a Misleading tier (1.5-6.5% of questions) where correlated majority errors destroy accuracy to 0% despite the best […]
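The abstract's "effective ensemble dimensionality" can be illustrated with one common estimator (the paper's exact definition is not given here): the participation ratio of the eigenvalue spectrum of the model-by-model error-correlation matrix. The synthetic data below is invented purely to show the effect of family clones.

```python
import numpy as np

def effective_voters(error_matrix):
    """Estimate the effective number of independent voters from a
    models-by-questions error matrix via the participation ratio of
    the error-correlation spectrum. (Assumed estimator; the paper
    may define effective dimensionality differently.)"""
    corr = np.corrcoef(error_matrix)      # model-by-model error correlation
    eig = np.linalg.eigvalsh(corr)
    eig = np.clip(eig, 0, None)           # guard tiny negative eigenvalues
    return eig.sum() ** 2 / (eig ** 2).sum()

rng = np.random.default_rng(0)
# Hypothetical setup: 9 models in 3 families; clones within a family
# share most errors, plus a little independent noise.
base = rng.random((3, 500)) < 0.3
errors = np.repeat(base, 3, axis=0)       # 3 near-identical models per family
errors = errors ^ (rng.random(errors.shape) < 0.05)
print(round(effective_voters(errors.astype(float)), 2))  # well below 9 (roughly 3-4)
```

Nine nominal voters collapse to roughly the number of families, mirroring the paper's finding that 17 models yield only 2.5-3.6 independent voters.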