AI & ML Efficiency Breakthrough

Enables 'Elastic Inference' where a single trained model can be converted to multiple lower-precision formats on-the-fly without retraining.

April 2, 2026

Original Paper

MF-QAT: Multi-Format Quantization-Aware Training for Elastic Inference

Zifei Xu, Sayeh Sharify, Hesham Mostafa

arXiv · 2604.00529

The Takeaway

Conventional QAT fixes a single target numeric format at training time; this framework instead produces one checkpoint that supports multiple MXINT/MXFP precisions at runtime. That is a major win for practitioners deploying models across diverse hardware with varying precision support.

From the abstract

Quantization-aware training (QAT) is typically performed for a single target numeric format, while practical deployments often need to choose numerical precision at inference time based on hardware support or runtime constraints. We study multi-format QAT, where a single model is trained to be robust across multiple quantization formats. We find that multi-format QAT can match single-format QAT at each target precision, yielding one model that performs well overall across different formats, even …
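To make the core idea concrete, here is a minimal sketch (not the paper's actual algorithm) of what "training one model to be robust across multiple formats" can look like: a fake-quantization function parameterized by bit-width, with one format sampled per training step so a single set of weights sees every target precision during training. The bit-widths below stand in for the MXINT/MXFP variants; the function names and format list are illustrative assumptions, not from the paper.

```python
import random

def fake_quantize(x, bits):
    # Symmetric uniform fake quantization: snap each value to the
    # nearest representable level, then map back to float. The "fake"
    # part is that training still runs in full precision, but the
    # forward pass sees quantization error. (Stand-in for MXINT/MXFP.)
    levels = 2 ** (bits - 1) - 1                 # e.g. 127 for 8-bit
    scale = max(abs(v) for v in x) / levels or 1.0
    return [round(v / scale) * scale for v in x]

# Hypothetical format menu; one entry per target precision the
# deployed model should support at inference time.
FORMATS = [4, 6, 8]

def multi_format_qat_step(weights, rng):
    # Multi-format QAT idea, simplified: sample a format per step and
    # quantize the forward pass with it, so the same checkpoint learns
    # to tolerate every format in the menu. (A real implementation
    # would also use a straight-through estimator for gradients.)
    bits = rng.choice(FORMATS)
    return fake_quantize(weights, bits), bits
```

At inference time, "elastic" deployment then amounts to calling `fake_quantize(weights, bits)` once with whatever precision the target hardware supports, with no retraining per format.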