Multimodal models often trick people into thinking they can read circuit diagrams when they are actually just guessing from the text labels.
Visual reasoning is a highly sought-after capability for technical AI applications. This paper exposes the mirage effect, in which models ignore the actual diagram and fall back on sophisticated text-based autocompletion. Even when the image contradicts the text, the model produces code that matches the textual header rather than the drawing. This hidden failure mode means we are overestimating how well AI understands technical drawings. True grounding in visual data remains a significant hurdle for automated engineering. We cannot trust an AI to design hardware until we can prove it is actually looking at the plans.
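The counterfactual test implied by that claim is easy to picture: hold the textual prompt fixed, swap the diagram for an unrelated one, and check whether the generated Verilog changes. Below is a minimal sketch of that probe, not the paper's actual harness; the header text, image paths, and the toy fake_text_only_model are all illustrative assumptions.

```python
# Minimal sketch of an image-swap "mirage" probe: if the generated Verilog is
# invariant to the diagram, the model is likely autocompleting from the text.
# All names here are illustrative, not the paper's evaluation harness.

from typing import Callable

HEADER_PROMPT = """Translate the attached circuit diagram into Verilog.
module mux2to1(input wire a, input wire b, input wire sel, output wire y);"""

QueryFn = Callable[[str, str], str]  # (prompt, image_path) -> generated code

def mirage_probe(query_fn: QueryFn, correct_img: str, swapped_img: str) -> bool:
    """Return True if the output is invariant to the image (text-only guessing)."""
    out_correct = query_fn(HEADER_PROMPT, correct_img)
    out_swapped = query_fn(HEADER_PROMPT, swapped_img)
    return out_correct.strip() == out_swapped.strip()

def fake_text_only_model(prompt: str, image_path: str) -> str:
    """Toy stand-in for an MLLM exhibiting the mirage: it never looks at the
    image and completes the module body from the textual header alone."""
    return prompt.splitlines()[-1] + "\n  assign y = sel ? b : a;\nendmodule"

if __name__ == "__main__":
    # Swapping in an unrelated schematic changes nothing for the toy model,
    # which is exactly the failure signature described above.
    if mirage_probe(fake_text_only_model, "mux2to1.png", "counter4.png"):
        print("Output unchanged under image swap: no visual grounding.")
    else:
        print("Output depends on the diagram: some grounding present.")
```

A real harness would substitute an actual MLLM client for the toy model; identical outputs across semantically different diagrams are the mirage signature.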
From Mirage to Grounding: Towards Reliable Multimodal Circuit-to-Verilog Code Generation
arXiv · 2604.27969
Multimodal large language models (MLLMs) are increasingly used to translate visual artifacts into code, from UI mockups to HTML and from scientific plots to Python scripts. A circuit diagram can be viewed as a visual domain-specific language for hardware: it encodes timing, topology, and bit-level semantics that are invisible to casual inspection yet safety-critical once fabricated in silicon. Translating such diagrams into register-transfer-level (RTL) code therefore represents an extreme reliability challenge.