AI & ML Nature Is Weird

AI vision collapses when you remove textures, suggesting that models don't actually know what 'shapes' are.

April 14, 2026

Original Paper

BareBones: Benchmarking Zero-Shot Geometric Comprehension in VLMs

Aaditya Baranwal, Vishal Yadav, Abhishek Rajora

arXiv · 2604.10528

The Takeaway

SOTA Vision-Language Models rely on statistical texture shortcuts rather than geometric understanding. When RGB textures are stripped, their ability to recognize objects collapses, suggesting that AI 'vision' is fundamentally different from human spatial awareness.
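To make the idea of "stripping textures" concrete, here is a minimal toy sketch (not the paper's actual pipeline, and all names are illustrative): reducing an image to a binary silhouette discards intensity variation, so only the object's spatial extent, its shape, survives. A model that genuinely encodes geometry should treat a textured and a flat rendering of the same object identically after this step.

```python
# Toy illustration of texture stripping: an "image" is a 2D list of
# grayscale values; binarizing it keeps shape and discards texture.

def to_silhouette(image, threshold=0):
    """Return 1 where a foreground pixel exists, else 0.

    Any texture (variation in foreground intensity) is thrown away;
    only the object's shape remains.
    """
    return [[1 if px > threshold else 0 for px in row] for row in image]

# Two renderings of the same "T" shape: one flat, one with noisy texture.
flat = [
    [9, 9, 9],
    [0, 9, 0],
    [0, 9, 0],
]
textured = [
    [3, 7, 5],
    [0, 8, 0],
    [0, 2, 0],
]

# Both collapse to the same silhouette once texture is removed.
print(to_silhouette(flat) == to_silhouette(textured))  # True
```

The benchmark's point is that humans pass this kind of test effortlessly, while VLMs, deprived of their RGB texture cues, largely do not.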

From the abstract

While Vision-Language Models (VLMs) demonstrate remarkable zero-shot recognition capabilities across a diverse spectrum of multimodal tasks, it yet remains an open question whether these architectures genuinely comprehend geometric structure or merely exploit RGB textures and contextual priors as statistical shortcuts. Existing evaluations fail to isolate this mechanism, conflating semantic reasoning with texture mapping and relying on imprecise annotations that inadvertently leak environmental