LLMpedia exposes a large gap in LLM factuality by generating ~1M articles purely from parametric memory, revealing that actual knowledge retrieval runs more than 15 percentage points below what multiple-choice benchmarks suggest.
March 26, 2026
Original Paper
LLMpedia: A Transparent Framework to Materialize an LLM's Encyclopedic Knowledge at Scale
arXiv · 2603.24080
The Takeaway
The results challenge the 'factuality saturation' suggested by high MMLU scores, showing that frontier models still fail significantly when forced to generate content rather than select it. The project also provides the first fully open parametric encyclopedia, bridging the gap between benchmark evaluation and real-world knowledge materialization.
From the abstract
Benchmarks such as MMLU suggest flagship language models approach factuality saturation, with scores above 90%. We show this picture is incomplete. LLMpedia generates encyclopedic articles entirely from parametric memory, producing ~1M articles across three model families without retrieval. For gpt-5-mini, the verifiable true rate on Wikipedia-covered subjects is only 74.7%, more than 15 percentage points below the benchmark-based picture, consistent with the availability bias…
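As a toy illustration of the metric behind that gap, the sketch below computes a verifiable true rate over fact-checked claims and compares it to a multiple-choice benchmark score. The claim data and the `verifiable_true_rate` helper are hypothetical, chosen only to show the shape of the comparison; this is not the paper's actual pipeline or data.

```python
def verifiable_true_rate(claims):
    """Fraction of verifiable claims judged true; unverifiable claims are excluded."""
    verifiable = [c for c in claims if c["verifiable"]]
    if not verifiable:
        return 0.0
    return sum(c["true"] for c in verifiable) / len(verifiable)

# Toy claim set extracted from one generated article (illustrative only).
claims = [
    {"verifiable": True, "true": True},
    {"verifiable": True, "true": True},
    {"verifiable": True, "true": False},
    {"verifiable": False, "true": False},  # opinion / unverifiable, excluded
    {"verifiable": True, "true": True},
]

rate = verifiable_true_rate(claims)
benchmark_score = 0.90  # e.g. an MMLU-style multiple-choice accuracy
gap_pp = (benchmark_score - rate) * 100  # gap in percentage points

print(f"verifiable true rate: {rate:.1%}, gap: {gap_pp:.1f} pp")
```

The key design point, as described in the abstract, is that generation-based scoring only counts claims that can be checked against an external source, which is a much stricter test than selecting among pre-written answer options.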