
Human inspectors and the AI itself are now equally unable to tell a real receipt from a forged one.

April 29, 2026

Original Paper

When the Forger Is the Judge: GPT-Image-2 Cannot Recognize Its Own Faked Documents

Jiaqi Wu, Yuchen Zhou, Dennis Tsang Ng, Xingyu Shen, Kidus Zewde, Ankit Raj, Tommy Duong, Simiao Ren

arXiv · 2604.25213

The Takeaway

Forged documents created by GPT-Image-2 have reached a level of fidelity that defeats human visual verification. In testing, human accuracy in identifying fake receipts fell to 50%, which is what random guessing yields when choosing between a real and a forged image. Even the model that generated the forgeries could not reliably distinguish its own creations from authentic documents. This collapse of detection capability suggests that visual inspection alone is no longer a viable way to verify financial or legal records. The industry must shift toward cryptographic signatures and watermarking to maintain the integrity of digital documentation.
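
To make the idea of cryptographic verification concrete, here is a minimal Python sketch. It is not from the paper. It uses an HMAC from the standard library as a simple stand-in for a real public-key signature scheme, and the key, the receipt text, and the function names sign_document and verify_document are all hypothetical. The point it illustrates is that verification depends on the document's bytes and a secret key, not on how convincing the image looks.

```python
import hashlib
import hmac

# Hypothetical issuer secret, for illustration only. A real deployment would
# more likely use a public-key signature scheme with proper key management.
SECRET_KEY = b"example-issuer-key"


def sign_document(data: bytes, key: bytes = SECRET_KEY) -> str:
    """Return a hex HMAC-SHA256 tag computed over the document bytes."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_document(data: bytes, tag: str, key: bytes = SECRET_KEY) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = hmac.new(key, data, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)


# Made-up receipt contents: the forgery changes a single digit.
original = b"Store 42 | Total: 18.50 USD"
forged = b"Store 42 | Total: 98.50 USD"

tag = sign_document(original)
original_ok = verify_document(original, tag)
forged_ok = verify_document(forged, tag)
```

In this sketch the one-digit change causes verification of the forged bytes to fail, however visually plausible a rendered version of the receipt might be. An HMAC requires the verifier to hold the same secret as the issuer, which is why a public-key signature would usually be the better fit for records checked by third parties.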

From the abstract

OpenAI's GPT-Image-2 has effectively erased the visual boundary between authentic and AI-edited document images: a single number on a receipt can be replaced in under a second for a few cents. We release AIForge-Doc v2, a paired dataset of 3,066 GPT-Image-2 document forgeries with pixel-precise masks in DocTamper-compatible format, and benchmark four lines of defence: human inspectors (N=120, n=365 pair-votes via the public 2AFC site), TruFor (generic forensic), DocTamper (qcf-568,