Introduces per-token adapter routing, allowing a single sequence to dynamically utilize multiple specialized LoRA experts.
March 18, 2026
Original Paper
MoLoRA: Composable Specialization via Per-Token Adapter Routing
arXiv · 2603.15965
The Takeaway
MoLoRA moves beyond per-sequence routing to "composable specialization": multimodal or mixed-task requests (e.g., code + math) use the best expert for each token. This modular expertise lets smaller models (1.7B) outperform significantly larger ones (8B).
From the abstract
Multi-adapter serving systems route entire sequences to a single adapter, forcing a choice when requests span multiple domains. This assumption fails in two important settings: (1) multimodal generation, where text and image tokens require different adapters within the same sequence, and (2) mixed-capability requests like "write code to solve this equation," which need expertise from multiple specialized adapters. We introduce per-token routing, which routes individual tokens to adapters based on […]
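To make the idea concrete, here is a minimal numpy sketch of per-token routing over multiple LoRA adapters. All names and dimensions are illustrative assumptions, not the paper's implementation: a tiny router picks one expert per token, and only that expert's low-rank update is added to the frozen base projection for that token.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, rank, n_experts, seq_len = 16, 4, 3, 5

# Frozen base projection and per-expert LoRA factors (A: down, B: up).
# These random weights are placeholders standing in for trained ones.
W = rng.normal(size=(d_model, d_model))
A = rng.normal(size=(n_experts, rank, d_model))
B = rng.normal(size=(n_experts, d_model, rank)) * 0.01

# Hypothetical router: per-token logits over experts, hard top-1 choice.
W_router = rng.normal(size=(n_experts, d_model))

def per_token_lora(x):
    """x: (seq_len, d_model) -> output plus the expert id chosen per token."""
    base = x @ W.T                          # shared frozen projection
    logits = x @ W_router.T                 # (seq_len, n_experts)
    expert = logits.argmax(axis=-1)         # one expert per token, not per sequence
    out = base.copy()
    for t, e in enumerate(expert):
        # Add only the selected expert's low-rank update for this token.
        out[t] += B[e] @ (A[e] @ x[t])
    return out, expert

x = rng.normal(size=(seq_len, d_model))
y, expert_ids = per_token_lora(x)
print(y.shape, expert_ids.shape)
```

The contrast with per-sequence serving is the `argmax` over the token axis: a "code + math" request can send its code tokens to one adapter and its math tokens to another within the same forward pass.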