Introduces per-token adapter routing, allowing a single sequence to dynamically utilize multiple specialized LoRA experts.
March 18, 2026
Original Paper
MoLoRA: Composable Specialization via Per-Token Adapter Routing
arXiv · 2603.15965
The Takeaway
MoLoRA moves beyond per-sequence routing to "composable specialization": multimodal or mixed-task requests (e.g., code + math) use the best expert for each token. This modular expertise lets smaller models (1.7B) outperform significantly larger ones (8B).
From the abstract
Multi-adapter serving systems route entire sequences to a single adapter, forcing a choice when requests span multiple domains. This assumption fails in two important settings: (1) multimodal generation, where text and image tokens require different adapters within the same sequence, and (2) mixed-capability requests like "write code to solve this equation," which need expertise from multiple specialized adapters. We introduce per-token routing, which routes individual tokens to adapters based on […]
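To make the idea concrete, here is a minimal numpy sketch of per-token routing over multiple LoRA adapters. All names and dimensions are illustrative assumptions, not the paper's implementation: a tiny router picks one expert per token, and only that expert's low-rank update is added to the frozen base projection for that token.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, rank, n_experts, seq_len = 16, 4, 3, 5

# Frozen base projection and per-expert LoRA factors (A: down, B: up).
# These random weights are placeholders standing in for trained ones.
W = rng.normal(size=(d_model, d_model))
A = rng.normal(size=(n_experts, rank, d_model))
B = rng.normal(size=(n_experts, d_model, rank)) * 0.01

# Hypothetical router: per-token logits over experts, hard top-1 choice.
W_router = rng.normal(size=(n_experts, d_model))

def per_token_lora(x):
    """x: (seq_len, d_model) -> output plus the expert id chosen per token."""
    base = x @ W.T                          # shared frozen projection
    logits = x @ W_router.T                 # (seq_len, n_experts)
    expert = logits.argmax(axis=-1)         # one expert per token, not per sequence
    out = base.copy()
    for t, e in enumerate(expert):
        # Add only the selected expert's low-rank update for this token.
        out[t] += B[e] @ (A[e] @ x[t])
    return out, expert

x = rng.normal(size=(seq_len, d_model))
y, expert_ids = per_token_lora(x)
print(y.shape, expert_ids.shape)
```

The contrast with per-sequence serving is the `argmax` over the token axis: a "code + math" request can send its code tokens to one adapter and its math tokens to another within the same forward pass.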