AI & ML Practical Magic

A facial recognition model has been compressed twentyfold, down to just seven megabytes, while still identifying people with nearly perfect accuracy.

April 25, 2026

Original Paper

BitFace: A 1.58-bit Vision Transformer for Energy-Efficient Face Recognition on Edge Devices

SSRN · 6632123

The Takeaway

BitFace replaces the high-precision floating-point numbers that normally store a network's weights with ternary weights that take only the values negative one, zero, and positive one. This massive compression allows a Vision Transformer to run on tiny, low-power IoT sensors that normally lack the memory for AI. Most practitioners assumed that cutting a model's precision this far would cause its accuracy to plummet. Instead, the model remains highly effective while consuming a fraction of the energy required by standard systems. This makes high-end security and identification possible on hardware as small as a smart doorbell or a wearable badge.
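To make the idea concrete, here is a minimal sketch of absmean ternary quantization in the style of BitNet b1.58, the work the paper cites as inspiration. The function name and the per-list scale are illustrative assumptions; BitFace's exact recipe may differ.

```python
def ternary_quantize(weights, eps=1e-8):
    """Quantize a list of float weights to {-1, 0, +1} with one shared scale.

    A minimal sketch of absmean ternary quantization (BitNet b1.58 style);
    illustrative only, not the paper's exact implementation.
    """
    # Scale by the mean absolute weight so most values land near -1, 0, or +1.
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    # Round each scaled weight to the nearest integer, then clip to the ternary set.
    return [max(-1, min(1, round(w / scale))) for w in weights], scale

q, s = ternary_quantize([0.9, -0.05, 0.4, -1.2])
print(q)  # → [1, 0, 1, -1]
```

Each quantized weight now needs less than two bits of storage instead of 32, and matrix multiplies reduce to additions and subtractions, which is where the energy savings come from.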

From the abstract

Deploying face recognition on resource-constrained devices (smartphones, IoT sensors, embedded cameras) is hampered by the large memory and energy footprint of modern Vision Transformers (ViTs). Inspired by the recent success of 1-bit Transformers in natural language processing (BitNet), we introduce BitFace, the first ViT-based face recognition model trained from scratch with ternary weights {-1, 0, +1}, i.e. 1.58-bit precision, and 8-bit activations. Two design choices are critical for stabili
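The "1.58-bit" label and the roughly twentyfold compression both fall out of simple arithmetic: a weight restricted to three values carries log2(3) ≈ 1.58 bits of information, versus 32 bits for a standard float. A quick check:

```python
import math

# A ternary weight can take 3 values, so it carries log2(3) bits.
bits_per_ternary_weight = math.log2(3)                # ≈ 1.585 bits
# Compression relative to 32-bit floating-point weights.
compression_vs_fp32 = 32 / bits_per_ternary_weight
print(round(compression_vs_fp32, 1))  # → 20.2
```

This back-of-the-envelope ratio matches the headline figure of a model shrunk about twenty times.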