AI & ML New Capability

Introduces a framework for LLMs to self-improve reasoning in specific domains by autonomously mining and constructing training environments directly from the open web.

March 25, 2026

Original Paper

WIST: Web-Grounded Iterative Self-Play Tree for Domain-Targeted Reasoning Improvement

Fangyuan Li, Pengfei Li, Shijie Wang, Junqi Gao, Jianxing Liu, Biqing Qi, Yuqiang Li

arXiv · 2603.22352

The Takeaway

WIST bypasses the need for human-curated or static datasets in domain-specific RLVR: models scale their reasoning capabilities by discovering their own 'learnability signals' from web data. The paper reports significant gains (+14.79 in medicine) over standard self-evolution methods.

From the abstract

Recent progress in reinforcement learning with verifiable rewards (RLVR) offers a practical path to self-improvement of language models, but existing methods face a key trade-off: endogenous self-play can drift over iterations, while corpus-grounded approaches rely on curated data environments. We present WIST, a Web-grounded Iterative Self-play Tree framework for domain-targeted reasoning improvement that learns directly from the open web without req