Scaling Insight

Simple Self-Distillation (SSD) improves LLM code-generation Pass@1 by 13% (e.g., on Qwen3-30B) without any external verifiers or teacher models.

AI & ML · arXiv | Apr 2