AI & ML Paradigm Shift

Knowledge distillation can be performed by injecting 'experience' into prompts rather than updating model weights.

March 31, 2026

Original Paper

TED: Training-Free Experience Distillation for Multimodal Reasoning

Shuozhi Yuan, Jinqing Wang, Zihao Liu, Miaomiao Yuan, Haoran Peng, Jin Zhao, Bingwen Wang, Haoyi Wang

arXiv · 2603.26778

The Takeaway

TED achieves significant performance gains (e.g., +7.5% on MathVision) by distilling a teacher's reasoning patterns into reusable in-context experiences. This enables 'live' distillation in resource-constrained settings where parameter updates are impossible or too expensive.
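To make the core idea concrete, here is a minimal sketch of prompt-side distillation: the "distilled" knowledge lives in the prompt rather than in the weights. The names (EXPERIENCE, student_generate, answer_with_experience) are illustrative placeholders, not the paper's actual API, and the experience text is an invented example.

```python
# Hypothetical sketch: a frozen student model is guided by a compact
# "experience" distilled from a teacher's reasoning traces. No gradients,
# no weight updates -- only the prompt changes.

# Invented example of a distilled experience for multimodal math problems.
EXPERIENCE = (
    "When a figure is given, first extract all labeled quantities, "
    "then check whether the question asks for a ratio or an absolute value, "
    "and verify units before giving the final answer."
)

def student_generate(prompt: str) -> str:
    """Placeholder for any frozen student model (API call or local inference)."""
    raise NotImplementedError

def answer_with_experience(question: str) -> str:
    # The student's parameters are never touched; the experience is simply
    # prepended as in-context guidance.
    prompt = (
        f"Useful reasoning experience:\n{EXPERIENCE}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return student_generate(prompt)
```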

From the abstract

Knowledge distillation is typically realized by transferring a teacher model's knowledge into a student's parameters through supervised or reinforcement-based optimization. While effective, such approaches require repeated parameter updates and large-scale training data, limiting their applicability in resource-constrained environments. In this work, we propose TED, a training-free, context-based distillation framework that shifts the update target of distillation from model parameters to an in-context experience.
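The sketch below illustrates what a training-free, context-based distillation loop could look like under these assumptions: teacher reasoning traces are compressed into short textual "experiences" that stand in for parameter updates. The functions teacher_solve, summarize_trace, and distill_experiences are hypothetical stand-ins for real model calls, not TED's actual pipeline; the resulting experiences would then be injected into the student's prompt as in the earlier sketch.

```python
# Hedged sketch of training-free distillation: collect teacher traces on a
# few seed problems and compress them into reusable experience strings.
# There is no optimizer and no gradient step anywhere in this loop.
from typing import List

def teacher_solve(problem: str) -> str:
    """Placeholder: ask a stronger teacher model for a full reasoning trace."""
    raise NotImplementedError

def summarize_trace(problem: str, trace: str) -> str:
    """Placeholder: compress a trace into a short, reusable strategy statement
    (this compression step could itself be done by prompting a model)."""
    raise NotImplementedError

def distill_experiences(seed_problems: List[str]) -> List[str]:
    # "Distillation" here means extracting and compressing reasoning patterns
    # into text, which becomes the in-context experience for the student.
    experiences = []
    for problem in seed_problems:
        trace = teacher_solve(problem)
        experiences.append(summarize_trace(problem, trace))
    return experiences
```

Because the output is plain text rather than a checkpoint, the experience list can be refreshed on the fly as new teacher traces arrive, which is what makes this style of distillation attractive when fine-tuning is off the table.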