Self-Routing removes the need for learned routers in Mixture-of-Experts (MoE) layers by using token hidden states directly for expert assignment.
April 2, 2026
Original Paper
Self-Routing: Parameter-Free Expert Routing from Hidden States
arXiv · 2604.00421
The Takeaway
Self-Routing eliminates all dedicated routing parameters while improving expert-utilization entropy. This suggests that MoE routing is an emergent property of the hidden representations themselves, which simplifies both the MoE architecture and its training.
From the abstract
Mixture-of-Experts (MoE) layers increase model capacity by activating only a small subset of experts per token, and typically rely on a learned router to map hidden states to expert assignments. In this work, we ask whether a dedicated learned router is strictly necessary in the MoE settings we study. We propose Self-Routing, a parameter-free routing mechanism that uses a designated subspace of the token hidden state directly as expert logits, eliminating the router projection entirely while lea…
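To make the idea concrete, here is a minimal sketch of parameter-free routing as the abstract describes it, assuming the designated subspace is simply the first `num_experts` dimensions of the hidden state (the paper may choose the subspace differently); `self_route` and its arguments are illustrative names, not the authors' API.

```python
import numpy as np

def self_route(hidden: np.ndarray, num_experts: int, top_k: int = 2):
    """Sketch: route tokens without a learned router projection.

    Assumption: the designated subspace is the leading `num_experts`
    dimensions of the token hidden state, read directly as expert logits.
    """
    # No router weight matrix: slice the hidden state as logits.
    logits = hidden[..., :num_experts]
    # Pick the top-k experts per token (descending by logit).
    topk_idx = np.argsort(logits, axis=-1)[..., ::-1][..., :top_k]
    topk_logits = np.take_along_axis(logits, topk_idx, axis=-1)
    # Softmax over only the selected logits to get combination weights.
    shifted = topk_logits - topk_logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    weights = exp / exp.sum(axis=-1, keepdims=True)
    return topk_idx, weights

# Example: one token with an 8-dim hidden state, 4 experts, top-2 routing.
h = np.array([[3.0, 1.0, 2.0, 0.5, 9.9, 9.9, 9.9, 9.9]])
idx, w = self_route(h, num_experts=4, top_k=2)
```

Note that the trailing hidden dimensions (here the 9.9 values) are ignored by routing, since only the designated subspace acts as logits.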