AI models guess the right answer to hard math problems 80 percent of the time but almost never manage to prove it.
April 20, 2026
Original Paper
Discover and Prove: An Open-source Agentic Framework for Hard Mode Automated Theorem Proving in Lean 4
arXiv · 2604.15839
The Takeaway
There is a large gap between an AI system's ability to find the answer to a hard math problem and its ability to prove that answer formally in Lean 4. Formal provers struggle to construct proofs even when the underlying LLM already has the correct final answer. This suggests that AI intuition is developing much faster than the capacity for rigorous, step-by-step verification. In effect, the model can see the destination of a complex problem without knowing the path that leads there. Closing this gap is the next major hurdle for building AI that can genuinely assist in scientific discovery.
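A toy Lean 4 sketch can make the gap concrete. The first theorem has the answer written into its statement, so the prover only has to verify it. The second leaves the answer as a placeholder that the system must fill in before any proof can begin, which is the "Hard Mode" setting named in the paper's title. The problem itself, the names `easy_mode`, `answer`, `hard_mode`, `answer_found`, and `hard_mode_solved`, and the placeholder encoding are all illustrative assumptions; they are not taken from the paper or its benchmarks.

```lean
-- Toy problem (hypothetical): solve x + 3 = 5 over the natural numbers.

-- Answer embedded in the statement: the prover only checks that x = 2.
theorem easy_mode (x : Nat) (h : x + 3 = 5) : x = 2 := by
  omega

-- Answer withheld: `answer` is a hole the system must fill in itself.
-- `sorry` marks the unfinished parts; Lean accepts it with a warning.
def answer : Nat := sorry

theorem hard_mode (x : Nat) (h : x + 3 = 5) : x = answer := by
  sorry

-- After the answer has been discovered, the proof step can proceed.
def answer_found : Nat := 2

theorem hard_mode_solved (x : Nat) (h : x + 3 = 5) : x = answer_found := by
  unfold answer_found  -- the goal becomes x = 2
  omega
```

In this sketch, guessing `2` is the easy part, and the formal proof is one line. The paper's finding is that on hard competition problems the balance flips: the guess is often right, while the proof is rarely completed.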
From the abstract
Most ATP benchmarks embed the final answer within the formal statement -- a convention we call "Easy Mode" -- a design that simplifies the task relative to what human competitors face and may lead to optimistic estimates of model capability. We call the stricter, more realistic setting "Hard Mode": the system must independently discover the answer before constructing a formal proof. To enable Hard Mode research, we make two contributions. First, we release MiniF2F-Hard and FIMO-Hard, expert-rean