Self-Routing removes the need for learned routers in Mixture-of-Experts (MoE) layers by using token hidden states directly for expert assignment.
April 2, 2026
Original Paper
Self-Routing: Parameter-Free Expert Routing from Hidden States
arXiv · 2604.00421
The Takeaway
Self-Routing eliminates all dedicated routing parameters while improving expert-utilization entropy. This suggests that MoE routing is an emergent property of the hidden representations themselves, which simplifies both the MoE architecture and its training.
From the abstract
Mixture-of-Experts (MoE) layers increase model capacity by activating only a small subset of experts per token, and typically rely on a learned router to map hidden states to expert assignments. In this work, we ask whether a dedicated learned router is strictly necessary in the MoE settings we study. We propose Self-Routing, a parameter-free routing mechanism that uses a designated subspace of the token hidden state directly as expert logits, eliminating the router projection entirely while lea…
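To make the idea concrete, here is a minimal sketch of parameter-free routing as the abstract describes it, assuming the designated subspace is simply the first `num_experts` dimensions of the hidden state (the paper may choose the subspace differently); `self_route` and its arguments are illustrative names, not the authors' API.

```python
import numpy as np

def self_route(hidden: np.ndarray, num_experts: int, top_k: int = 2):
    """Sketch: route tokens without a learned router projection.

    Assumption: the designated subspace is the leading `num_experts`
    dimensions of the token hidden state, read directly as expert logits.
    """
    # No router weight matrix: slice the hidden state as logits.
    logits = hidden[..., :num_experts]
    # Pick the top-k experts per token (descending by logit).
    topk_idx = np.argsort(logits, axis=-1)[..., ::-1][..., :top_k]
    topk_logits = np.take_along_axis(logits, topk_idx, axis=-1)
    # Softmax over only the selected logits to get combination weights.
    shifted = topk_logits - topk_logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    weights = exp / exp.sum(axis=-1, keepdims=True)
    return topk_idx, weights

# Example: one token with an 8-dim hidden state, 4 experts, top-2 routing.
h = np.array([[3.0, 1.0, 2.0, 0.5, 9.9, 9.9, 9.9, 9.9]])
idx, w = self_route(h, num_experts=4, top_k=2)
```

Note that the trailing hidden dimensions (here the 9.9 values) are ignored by routing, since only the designated subspace acts as logits.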