Introduces S0 tuning for hybrid RNN-attention models, outperforming LoRA by 10.8 pp with zero inference overhead.
April 2, 2026
Original Paper
S0 Tuning: Zero-Overhead Adaptation of Hybrid Recurrent-Attention Models
arXiv · 2604.01168
The Takeaway
Tuning only the initial state matrix of each recurrent layer provides a highly efficient PEFT method for hybrid models (such as Mamba-2 or GatedDeltaNet hybrids), enabling instant task switching with no weight merging and no latency penalty.
From the abstract
Tuning a single initial state matrix per recurrent layer, using roughly 48 execution-verified HumanEval training solutions and incurring zero inference overhead, outperforms LoRA by +10.8 pp (p < 0.001) on HumanEval. The method, which we call S0 tuning, optimizes one state matrix per recurrent layer while freezing all model weights. On Qwen3.5-4B (a GatedDeltaNet hybrid), S0 tuning improves greedy pass@1 by +23.6 ± 1.7 pp (10 seeds). On FalconH1-7B (a Mamba-2 hybrid), S0 reaches 71.8% ± 1.3 and LoRA rea
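To make the idea concrete, here is a minimal sketch of S0 tuning on a toy linear recurrence, not the paper's actual architecture or training code. All names (`A`, `B`, `C`, `forward`) and the one-layer setup are illustrative assumptions: the recurrent weights stay frozen, and only the initial state `s0` is optimized. Because the toy output is linear in `s0`, the gradient is available in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 4, 6  # hidden size, sequence length (toy scale)

# Frozen "pretrained" weights: stand-ins for a recurrent layer's parameters.
A = 0.5 * rng.standard_normal((d, d)) / np.sqrt(d)   # state transition
B = rng.standard_normal((d, d)) / np.sqrt(d)         # input projection
C = rng.standard_normal((1, d)) / np.sqrt(d)         # readout
xs = rng.standard_normal((T, d))                     # a fixed input sequence
target = np.array([1.0])                             # toy task target

def forward(s0):
    """Run the linear recurrence h_t = A h_{t-1} + B x_t from initial state s0."""
    h = s0
    for x in xs:
        h = A @ h + B @ x
    return C @ h  # model output, shape (1,)

# The output is linear in s0: y(s0) = C A^T s0 + const, so dy/ds0 is constant.
J = C @ np.linalg.matrix_power(A, T)   # Jacobian dy/ds0, shape (1, d)
lr = 0.5 / float(J @ J.T + 1e-12)      # step size scaled to the toy problem

s0 = np.zeros(d)                       # the ONLY trainable parameter (S0 tuning)
loss_before = float((forward(s0) - target) ** 2)
for _ in range(200):
    err = forward(s0) - target
    s0 -= lr * (2 * err @ J)           # gradient step on s0; A, B, C stay frozen
loss_after = float((forward(s0) - target) ** 2)
```

Since the only learned artifact is one state vector (or matrix) per recurrent layer, task switching amounts to swapping in a different initial state, with no weight merging and no change to the inference path, which is the source of the zero-overhead claim.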