Enables stable 4-bit microscaling (MXFP4) quantization for Multi-modal LLMs, which previously suffered from performance collapse.
March 18, 2026
Original Paper
BATQuant: Outlier-resilient MXFP4 Quantization via Learnable Block-wise Optimization
arXiv · 2603.16590
The Takeaway
MXFP4 is the emerging hardware standard for efficient low-precision inference. By applying learnable block-wise transformations that keep outlier energy from 'bleeding' across block boundaries, this method recovers 96% of full-precision performance even under aggressive W4A4 quantization.
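To see why outliers are so damaging here, it helps to look at what MXFP4 actually is. Below is a minimal NumPy sketch of MXFP4-style quantization following the OCP Microscaling layout: blocks of 32 elements, FP4 (E2M1) values, and one shared power-of-two scale per block. This is an illustrative approximation, not the paper's method; the learnable block-wise transformation that BATQuant adds on top is not reproduced here.

```python
import numpy as np

# The 8 non-negative FP4 E2M1 magnitudes, mirrored for the full signed grid.
FP4_E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
FP4_GRID = np.concatenate([-FP4_E2M1[::-1], FP4_E2M1])

def mxfp4_quantize(x, block=32):
    """Fake-quantize x to MXFP4: per-block power-of-two scale + FP4 elements."""
    xb = x.reshape(-1, block)
    amax = np.abs(xb).max(axis=1, keepdims=True)
    # Shared E8M0 scale per block: a power of two chosen so the block's
    # largest magnitude lands within the FP4 range (max magnitude 6.0,
    # i.e. exponent emax = 2 for E2M1).
    scale = 2.0 ** (np.floor(np.log2(np.maximum(amax, 2.0**-126))) - 2)
    # Round each scaled element to the nearest FP4 grid point.
    idx = np.abs(xb[..., None] / scale[..., None] - FP4_GRID).argmin(axis=-1)
    return (FP4_GRID[idx] * scale).reshape(x.shape)

# A well-behaved block survives quantization essentially intact...
uniform = np.ones(32)
print(mxfp4_quantize(uniform)[:4])          # values reconstruct exactly

# ...but a single outlier inflates the shared scale, crushing every
# small value in its block to zero. Because scales are per-block, the
# damage stays local -- which is what block-wise (rather than global)
# outlier handling exploits.
outlier_block = np.full(32, 0.1)
outlier_block[0] = 6.0
print(mxfp4_quantize(outlier_block)[:4])    # small entries collapse to 0
```

The second example is the failure mode the takeaway describes: one outlier per block wastes that block's entire dynamic range, but only that block's. A global rotation smears outlier energy into every block at once, which is why it interacts badly with MXFP4's per-block power-of-two scaling.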
From the abstract
Microscaling floating-point (MXFP) formats have emerged as a promising standard for deploying Multi-modal Large Language Models (MLLMs) and Large Language Models (LLMs) on modern accelerator architectures. However, existing Post-Training Quantization (PTQ) methods, particularly rotation-based techniques designed for integer formats, suffer from severe performance collapse when applied to MXFP4. Recent studies attribute this failure to a fundamental format mismatch: global orthogonal rotations in […]