Nearly every submission among 2.7 million arXiv preprints contains hidden sensitive data like private API keys and internal coordination notes within the LaTeX source files.
April 24, 2026
Original Paper
Hidden Secrets in the arXiv: Discovering, Analyzing, and Preventing Unintentional Information Disclosure in Source Files of Scientific Preprints
arXiv · 2604.20927
The Takeaway
Authors often focus on the final PDF and forget that the underlying source files are publicly accessible and carry far more than the rendered text. This analysis found that Git histories and even private comments between co-authors are frequently left in the uploaded packages. Most people assume that only the visible text of their paper is shared, but the source packages also expose credentials at scale. This exposure allows anyone to harvest keys that grant access to private servers or internal lab documents. Scientific sharing platforms now face a major security problem that requires automated scrubbing of all historical uploads. The very act of sharing knowledge is unintentionally creating a roadmap for attackers.
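To make the kinds of leftovers described above concrete, here is a minimal sketch of a pre-upload check that an author might run on a source directory. It is not the paper's method: the function name `scan_source_tree`, the file extensions, and both regular expressions are illustrative assumptions, and a real secret scanner would use a much larger rule set.

```python
import os
import re
from pathlib import Path

# Illustrative heuristics only (assumptions, not the paper's detection rules).
# A '%' that is not escaped as '\%' starts a LaTeX comment.
COMMENT_RE = re.compile(r"(?<!\\)%\s*(\S.*)$")
# Very rough pattern for "name = value" style credentials.
SECRET_RE = re.compile(
    r"(api[_-]?key|secret|token|password)\s*[=:]\s*\S+", re.IGNORECASE
)
TEXT_SUFFIXES = {".tex", ".bib", ".sty", ".cls"}


def scan_source_tree(root):
    """Return a list of (kind, path, line_number, text) findings under root."""
    findings = []
    for dirpath, dirnames, filenames in os.walk(root):
        # A leftover .git directory exposes the full revision history.
        if ".git" in dirnames:
            findings.append(("git_dir", str(Path(dirpath) / ".git"), 0, ""))
            dirnames.remove(".git")  # do not descend into repository data
        for name in filenames:
            path = Path(dirpath) / name
            if path.suffix.lower() not in TEXT_SUFFIXES:
                continue
            text = path.read_text(encoding="utf-8", errors="replace")
            for lineno, line in enumerate(text.splitlines(), start=1):
                comment = COMMENT_RE.search(line)
                if comment:
                    findings.append(
                        ("comment", str(path), lineno, comment.group(1).strip())
                    )
                if SECRET_RE.search(line):
                    findings.append(("secret_like", str(path), lineno, line.strip()))
    return findings
```

Calling `scan_source_tree("paper_src")` on a hypothetical directory `paper_src` returns one tuple per finding, which an author could review before building the upload archive. The comment heuristic is deliberately simple and will misfire on edge cases such as `%` inside verbatim environments.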
From the abstract
Preprints are essential for the timely and open dissemination of research. arXiv, the most widely used preprint service, takes the idea of open science one step further by not only publishing the actual preprints but also LaTeX sources and other files used to create them. As known from other contexts, such as GitHub repositories, and anecdotally exemplified for arXiv, making source code publicly available risks disclosing otherwise "hidden" information. Consequently, the public availability of p