We've built an AI that 'reads' text as pictures, removing the need for tokenizers in any language on Earth.
April 15, 2026
Original Paper
MIXAR: Scaling Autoregressive Pixel-based Language Models to Multiple Languages and Scripts
arXiv · 2604.11575
The Takeaway
MIXAR is the first generative pixel-based language model trained on eight languages spanning a range of scripts, all without a traditional tokenizer. Most AI 'reads' text as arbitrary numbers (tokens), a scheme that can introduce bias and inefficiency for non-Western languages. By 'seeing' the actual pixels of the text, the model treats all scripts on an equal footing and avoids the 'token tax' that complex characters pay. This points toward truly global multilingual AI: it sidesteps a decade of tokenization headaches and opens the door to models that can read any script, from any culture.
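To make the idea concrete, here is a minimal sketch of how a pixel-based input pipeline can work, under the assumption (common to prior pixel-based models such as PIXEL) that text is rendered to a grayscale image strip and sliced into fixed-size patches that play the role tokens play in an ordinary language model. The rendering step is stubbed out with a blank array; a real pipeline would rasterize the text with a font engine. The `PATCH` size and the `patchify` helper are illustrative, not MIXAR's actual implementation.

```python
import numpy as np

PATCH = 16  # patch height/width in pixels (assumed value, not from the paper)

def patchify(strip: np.ndarray, patch: int = PATCH) -> np.ndarray:
    """Slice a (patch, width) grayscale strip into a sequence of
    (patch, patch) patches, flattened into vectors -- the model's
    'pixel token' sequence."""
    h, w = strip.shape
    assert h == patch and w % patch == 0
    # Split the width into w // patch blocks, then move the block
    # axis first so each patch is a contiguous (patch, patch) tile.
    patches = strip.reshape(h, w // patch, patch).transpose(1, 0, 2)
    return patches.reshape(-1, patch * patch)

# Stand-in for rendered text: one 16px-tall strip, 8 patches wide.
strip = np.zeros((PATCH, PATCH * 8), dtype=np.float32)
seq = patchify(strip)
print(seq.shape)  # (8, 256): 8 'pixel tokens' of 256 values each
```

Because every script is just pixels once rendered, the same pipeline handles Latin, Devanagari, or Hangul identically, with no vocabulary to build and no characters splitting into unfairly long token sequences.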
From the abstract
Pixel-based language models are gaining momentum as alternatives to traditional token-based approaches, promising to circumvent tokenization challenges. However, the inherent perceptual diversity across languages poses a significant hurdle for multilingual generalization in pixel space. This paper introduces MIXAR, the first generative pixel-based language model trained on eight different languages utilizing a range of different scripts. We empirically evaluate MIXAR against previous pixel-based