Introduces S0 tuning for hybrid RNN-attention models, outperforming LoRA by 10.8 pp with zero inference overhead.
April 2, 2026
Original Paper
S0 Tuning: Zero-Overhead Adaptation of Hybrid Recurrent-Attention Models
arXiv · 2604.01168
The Takeaway
Tuning only the initial state matrix of each recurrent layer provides a highly efficient PEFT method for hybrid models (such as Mamba-2 or GatedDeltaNet hybrids), enabling instant task switching with no weight merging and no latency penalty.
From the abstract
Tuning a single initial state matrix per recurrent layer, using roughly 48 execution-verified HumanEval training solutions and incurring zero inference overhead, outperforms LoRA by +10.8 pp (p < 0.001) on HumanEval. The method, which we call S0 tuning, optimizes one state matrix per recurrent layer while freezing all model weights. On Qwen3.5-4B (a GatedDeltaNet hybrid), S0 tuning improves greedy pass@1 by +23.6 ± 1.7 pp (10 seeds). On FalconH1-7B (a Mamba-2 hybrid), S0 reaches 71.8% ± 1.3 and LoRA rea
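To make the idea concrete, here is a minimal sketch of S0 tuning on a toy linear recurrence, not the paper's actual architecture or training code. All names (`A`, `B`, `C`, `forward`) and the one-layer setup are illustrative assumptions: the recurrent weights stay frozen, and only the initial state `s0` is optimized. Because the toy output is linear in `s0`, the gradient is available in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 4, 6  # hidden size, sequence length (toy scale)

# Frozen "pretrained" weights: stand-ins for a recurrent layer's parameters.
A = 0.5 * rng.standard_normal((d, d)) / np.sqrt(d)   # state transition
B = rng.standard_normal((d, d)) / np.sqrt(d)         # input projection
C = rng.standard_normal((1, d)) / np.sqrt(d)         # readout
xs = rng.standard_normal((T, d))                     # a fixed input sequence
target = np.array([1.0])                             # toy task target

def forward(s0):
    """Run the linear recurrence h_t = A h_{t-1} + B x_t from initial state s0."""
    h = s0
    for x in xs:
        h = A @ h + B @ x
    return C @ h  # model output, shape (1,)

# The output is linear in s0: y(s0) = C A^T s0 + const, so dy/ds0 is constant.
J = C @ np.linalg.matrix_power(A, T)   # Jacobian dy/ds0, shape (1, d)
lr = 0.5 / float(J @ J.T + 1e-12)      # step size scaled to the toy problem

s0 = np.zeros(d)                       # the ONLY trainable parameter (S0 tuning)
loss_before = float((forward(s0) - target) ** 2)
for _ in range(200):
    err = forward(s0) - target
    s0 -= lr * (2 * err @ J)           # gradient step on s0; A, B, C stay frozen
loss_after = float((forward(s0) - target) ** 2)
```

Since the only learned artifact is one state vector (or matrix) per recurrent layer, task switching amounts to swapping in a different initial state, with no weight merging and no change to the inference path, which is the source of the zero-overhead claim.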