AI & ML Practical Magic

AI models can now think harder and improve their own answers on the fly by spending more compute time on a specific question.

April 23, 2026

Original Paper

TEMPO: Scaling Test-time Training for Large Reasoning Models

arXiv · 2604.19295

The Takeaway

TEMPO enables test-time training (TTT): the model refines its policy while it is actually solving a problem. By interleaving this self-refinement with periodic recalibration from a critic model, the AI keeps improving the longer it spends on a single task. This moves us away from the idea that a model's intelligence is frozen after training. It suggests that future AI will have a "thinking mode" where you can pay more compute for a higher-quality result, and it is a first step toward models that adapt their reasoning depth to the difficulty of the prompt.

From the abstract

Test-time training (TTT) adapts model parameters on unlabeled test instances during inference, which continuously extends capabilities beyond the reach of offline training. Despite initial gains, existing TTT methods for LRMs plateau quickly and do not benefit from additional test-time compute. Without external calibration, the self-generated reward signal increasingly drifts as the policy model evolves, leading to both performance plateaus and diversity collapse. We propose TEMPO, a TTT framework …