
Nearly every submission among 2.7 million arXiv preprints contains hidden sensitive data like private API keys and internal coordination notes within the LaTeX source files.

April 24, 2026

Original Paper

Hidden Secrets in the arXiv: Discovering, Analyzing, and Preventing Unintentional Information Disclosure in Source Files of Scientific Preprints

Jan Pennekamp, Johannes Lohmöller, David Schütte, Joscha Loos, Martin Henze

arXiv · 2604.20927

The Takeaway

Authors often focus on the final PDF and forget that the underlying source files are just as publicly accessible. This analysis found that Git histories and private comments between co-authors are frequently left in the uploaded packages, along with credentials such as API keys. Many authors assume that only the visible text of their paper is shared, but anyone can download the sources and harvest keys that grant access to private servers or internal lab documents. Because the leak is systemic rather than a handful of isolated mistakes, preprint platforms may need automated scrubbing of uploads, including historical ones. The very act of sharing knowledge can unintentionally hand attackers a roadmap.
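To make the three kinds of leftovers mentioned above concrete (Git histories, co-author comments, key-like strings), here is a minimal Python sketch of a pre-upload check. It is not the tool or method from the paper. The function name scan_source_tree and both regular expressions are illustrative assumptions, and the patterns are deliberately simplistic compared to a real secret scanner.

```python
import re
from pathlib import Path

# Hypothetical, deliberately simple rule: a key-like name followed by a long token.
# Real secret scanners use many provider-specific patterns.
KEY_PATTERN = re.compile(
    r"(api[_-]?key|secret|token)\s*[=:]\s*['\"]?[A-Za-z0-9_\-]{16,}",
    re.IGNORECASE,
)
# A LaTeX comment starts at a % that is not written as \%.
# Simplification: a % preceded by an escaped backslash (\\%) is missed.
COMMENT_PATTERN = re.compile(r"(?<!\\)%(.*)$")


def scan_source_tree(root):
    """Return (kind, relative_path, detail) findings for a source directory."""
    root = Path(root)
    findings = []
    for path in sorted(root.rglob("*")):
        rel_parts = path.relative_to(root).parts
        rel = "/".join(rel_parts)
        if path.is_dir():
            if path.name == ".git":
                findings.append(("git_history", rel, "version-control directory"))
            continue
        if ".git" in rel_parts:
            continue  # the directory itself was already reported
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip rather than fail the whole scan
        for lineno, line in enumerate(text.splitlines(), start=1):
            if KEY_PATTERN.search(line):
                findings.append(("possible_key", rel, f"line {lineno}"))
            if path.suffix == ".tex":
                match = COMMENT_PATTERN.search(line)
                if match and match.group(1).strip():
                    comment = match.group(1).strip()
                    findings.append(("comment", rel, f"line {lineno}: {comment}"))
    return findings
```

Running scan_source_tree on a paper directory before packaging it gives a list of items to review by hand. An empty list only means these simple patterns matched nothing, not that the upload is clean.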

From the abstract

Preprints are essential for the timely and open dissemination of research. arXiv, the most widely used preprint service, takes the idea of open science one step further by not only publishing the actual preprints but also LaTeX sources and other files used to create them. As known from other contexts, such as GitHub repositories, and anecdotally exemplified for arXiv, making source code publicly available risks disclosing otherwise "hidden" information. Consequently, the public availability of p
