You can give "sight" to an AI that has only ever read text, which suggests that seeing and reading are far less different to a computer than they appear.
April 3, 2026
Original Paper
Language-Pretraining-Induced Bias: A Strong Foundation for General Vision Tasks
arXiv · 2604.01833
The Takeaway
This challenges the idea that vision and language are separate skills; it turns out that learning to read builds much of the same structure needed to see. It suggests that the regularities of our world are encoded deeply enough in text to transfer to the visual domain.
From the abstract
The ratio of outlier parameters in language pre-training models and vision pre-training models differs significantly, making cross-modality (language and vision) transfer inherently more challenging than cross-domain adaptation. As a result, many prior studies have focused on cross-domain transfer rather than attempting to bridge the language and vision modalities, assuming that language pre-trained models are unsuitable for downstream visual tasks due to disparate parameter spaces. Contrary to this assumption…
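The abstract's notion of an "outlier parameter ratio" can be made concrete with a small sketch. The paper's exact criterion is not given in this excerpt, so the definition below is an assumption: a weight counts as an outlier if its magnitude exceeds k sample standard deviations of its layer. The heavy-tailed versus near-Gaussian weight distributions are also synthetic stand-ins, not real model checkpoints.

```python
import numpy as np

def outlier_ratio(weights, k=3.0):
    """Fraction of weights whose magnitude exceeds k standard deviations.

    One common way to operationalize 'outlier parameters'; the paper's
    exact criterion is not specified in this excerpt.
    """
    w = np.asarray(weights, dtype=np.float64).ravel()
    threshold = k * w.std()
    return float(np.mean(np.abs(w) > threshold))

rng = np.random.default_rng(0)
# Synthetic stand-ins: a heavy-tailed "language-like" layer vs. a
# near-Gaussian "vision-like" layer.
lang_like = rng.standard_t(df=3, size=100_000)  # heavy tails -> more outliers
vision_like = rng.normal(size=100_000)          # thin tails  -> fewer outliers

print(outlier_ratio(lang_like), outlier_ratio(vision_like))
```

Under this toy definition, the heavy-tailed layer yields a noticeably higher outlier ratio, which is the kind of gap the abstract says separates the two parameter spaces.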