AI & ML Efficiency Breakthrough

Standard quantization destroys the small parameter 'deltas' that encode post-training knowledge; Delta-Aware Quantization (DAQ) fixes this by optimizing for sign preservation.

March 25, 2026

Original Paper

DAQ: Delta-Aware Quantization for Post-Training LLM Weight Compression

Xiaoming Yu, Shize Tang, Guanghua Yu, Linchuan Xie, Song Liu, Jianchen Zhu, Feng Li

arXiv · 2603.22324

The Takeaway

Post-training quantization (PTQ) methods typically minimize weight-reconstruction error alone, which ignores the direction of the fine-tuning deltas relative to the base model. DAQ lets practitioners compress fine-tuned LLMs to FP8/INT8 without losing the specific style or capabilities gained during supervised fine-tuning (SFT).

From the abstract

We introduce Delta-Aware Quantization (DAQ), a data-free post-training quantization framework that preserves the knowledge acquired during post-training. Standard quantization objectives minimize reconstruction error but are agnostic to the base model, allowing quantization noise to disproportionately corrupt the small-magnitude parameter deltas ($\Delta W$) that encode post-training behavior -- an effect we analyze through the lens of quantization as implicit regularization. […]
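The failure mode the abstract describes is easy to reproduce in a toy setting: when the quantization step is much larger than the fine-tuning deltas, round-to-nearest quantization effectively randomizes the sign of each delta. The sketch below contrasts that baseline with a simple sign-aware rounding rule that, between the two nearest grid points, prefers the one whose implied delta keeps the fine-tuning sign. This is an illustrative assumption, not DAQ's actual algorithm (the paper's objective is not reproduced here); all names and parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: base weights plus small fine-tuning deltas. The key assumption,
# taken from the abstract, is that |delta| is far below the quantization step.
w_base = rng.normal(0.0, 1.0, size=10_000)
delta = rng.normal(0.0, 0.01, size=10_000)
w_ft = w_base + delta

def quantize_rtn(w, n_bits=4):
    """Symmetric uniform round-to-nearest quantization (standard PTQ baseline)."""
    scale = np.max(np.abs(w)) / (2 ** (n_bits - 1) - 1)
    return np.round(w / scale) * scale

def quantize_sign_aware(w_ft, w_base, n_bits=4):
    """Hypothetical sign-aware rounding (illustration only, not DAQ itself):
    among the two grid points adjacent to each weight, prefer the one whose
    implied delta (quantized weight minus base weight) keeps the delta's sign."""
    scale = np.max(np.abs(w_ft)) / (2 ** (n_bits - 1) - 1)
    nearest = np.round(w_ft / scale) * scale
    # The other adjacent grid point lies on the opposite side of w_ft.
    alt = nearest + scale * np.where(nearest <= w_ft, 1.0, -1.0)
    want = np.sign(w_ft - w_base)
    keep_nearest = np.sign(nearest - w_base) == want
    keep_alt = np.sign(alt - w_base) == want
    # Fall back to round-to-nearest when neither candidate preserves the sign.
    return np.where(keep_nearest, nearest, np.where(keep_alt, alt, nearest))

def sign_flip_rate(w_q):
    return float(np.mean(np.sign(w_q - w_base) != np.sign(delta)))

flips_rtn = sign_flip_rate(quantize_rtn(w_ft))
flips_aware = sign_flip_rate(quantize_sign_aware(w_ft, w_base))
print(f"delta sign flips, round-to-nearest: {flips_rtn:.1%}")
print(f"delta sign flips, sign-aware:       {flips_aware:.1%}")
```

With 4-bit weights the quantization step is tens of times larger than the deltas, so the baseline flips roughly half of all delta signs, while the sign-aware rule eliminates almost all flips at the cost of a slightly larger per-weight rounding error -- the trade-off the abstract frames as quantization acting as implicit regularization toward the base model.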