AI & ML · Nature Is Weird

Top-tier AI models talk like absolute geniuses, but they lose their shirts the second you ask them to bet real money on the news.

April 10, 2026

Original Paper

Prediction Arena: Benchmarking AI Models on Real-World Prediction Markets

Jaden Zhang, Gardenia Liu, Oliver Johansson, Hileamlak Yitayew, Kamryn Ohly, Grace Li

arXiv · 2604.07355

The Takeaway

Despite high scores on reasoning benchmarks, frontier models lost up to 30% of their capital trading on live prediction markets. The result points to a 'reality gap': fluency on language and reasoning tasks does not translate into the judgment needed to beat markets priced by human traders.

From the abstract

We introduce Prediction Arena, a benchmark for evaluating AI models' predictive accuracy and decision-making by enabling them to trade autonomously on live prediction markets with real capital. Unlike synthetic benchmarks, Prediction Arena tests models in environments where trades execute on actual exchanges (Kalshi and Polymarket), providing objective ground truth that cannot be gamed or overfitted. Each model operates as an independent agent starting with $10,000, making autonomous decisions e

