AI & ML Efficiency Breakthrough

IsoQuant uses SO(4) isoclinic rotations to achieve a 4.5x to 4.7x speedup in low-bit KV-cache quantization over existing methods.

March 31, 2026

Original Paper

IsoQuant: Hardware-Aligned SO(4) Isoclinic Rotations for LLM KV Cache Compression

Zhongping Ji

arXiv · 2603.28430

The Takeaway

IsoQuant replaces computationally expensive dense random orthogonal transforms with blockwise quaternion-based rotations that map well onto modern GPU hardware. This significantly reduces the overhead of feature decorrelation, a step that is critical for preserving accuracy when LLM KV caches are quantized to 2 or 3 bits.
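To make the idea of a blockwise quaternion rotation concrete, here is a minimal sketch in plain Python. It relies on a standard fact: any rotation in SO(4) can be written as v -> q_L * v * q_R for a pair of unit quaternions, where the left and right factors are the two isoclinic parts. The function names (`quat_mul`, `rotate_blocks`, `unrotate_blocks`), the random choice of quaternions, and the (w, x, y, z) storage order are assumptions made for illustration; the paper's actual parameterization and kernels may differ.

```python
import math
import random

def quat_mul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z) tuples."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )

def quat_conj(q):
    """Conjugate; for a unit quaternion this is also its inverse."""
    w, x, y, z = q
    return (w, -x, -y, -z)

def random_unit_quat(rng):
    """Draw a random unit quaternion (hypothetical helper for this sketch)."""
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(c * c for c in q))
    return tuple(c / norm for c in q)

def rotate_blocks(x, q_left, q_right):
    """Rotate each consecutive 4D block v of x as q_left[i] * v * q_right[i]."""
    if len(x) != 4 * len(q_left) or len(q_left) != len(q_right):
        raise ValueError("need one quaternion pair per 4D block of x")
    out = []
    for i, (ql, qr) in enumerate(zip(q_left, q_right)):
        v = tuple(x[4 * i:4 * i + 4])  # treat the block as a quaternion
        out.extend(quat_mul(quat_mul(ql, v), qr))
    return out

def unrotate_blocks(y, q_left, q_right):
    """Invert rotate_blocks by applying the conjugate quaternions."""
    return rotate_blocks(
        y,
        [quat_conj(q) for q in q_left],
        [quat_conj(q) for q in q_right],
    )

if __name__ == "__main__":
    rng = random.Random(0)
    d = 16  # feature dimension, assumed divisible by 4
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    q_left = [random_unit_quat(rng) for _ in range(d // 4)]
    q_right = [random_unit_quat(rng) for _ in range(d // 4)]
    y = rotate_blocks(x, q_left, q_right)
    x_back = unrotate_blocks(y, q_left, q_right)
    print(max(abs(a - b) for a, b in zip(x, x_back)))  # round-trip error
```

In this sketch each 4D block stores 8 numbers (two unit quaternions), so a d-dimensional vector needs 2d rotation parameters, against d^2 entries for a dense orthogonal matrix. For d = 128 that is 256 values rather than 16,384, and the work per vector grows linearly in d.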

From the abstract

Orthogonal feature decorrelation is effective for low-bit online vector quantization, but dense random orthogonal transforms incur prohibitive $O(d^2)$ storage and compute. RotorQuant reduces this cost with blockwise $3$D Clifford rotors, yet the resulting $3$D partition is poorly aligned with modern hardware and offers limited local mixing. We propose IsoQuant, a blockwise rotation framework based on quaternion algebra and the isoclinic decomposition of $SO(4)$. It represents each $4$D