If you mash two 'safe' AI models together, you can accidentally create a dangerous one: it turns out you can hide a trap by splitting it across separate models.
April 2, 2026
Original Paper
When Safe Models Merge into Danger: Exploiting Latent Vulnerabilities in LLM Fusion
arXiv · 2604.00627
The Takeaway
Researchers found that it is possible to hide 'latent' malicious instructions inside AI models that appear perfectly benign on their own. These hidden components only assemble into a functional 'Trojan' when the models are merged, a common practice in the industry, effectively creating a trap that is invisible to current security scans.
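To build intuition for the mechanism, here is a toy sketch (not the paper's actual method): two weight deltas that each look inert on their own, but that reassemble a hypothetical "trigger" direction when combined additively, the way task-arithmetic merging sums fine-tuning deltas. The `trigger` vector, the split, and the detection threshold are all illustrative assumptions.

```python
# Toy illustration only: a malicious "trigger" direction split in half
# across two models' weight deltas. Each half is too weak to activate
# anything, but additive merging reassembles the full trigger.

trigger = [0.8, -0.6, 0.4, 0.9]       # hypothetical malicious direction
half_a = [t / 2 for t in trigger]     # hidden in model A's delta
half_b = [t / 2 for t in trigger]     # hidden in model B's delta

def activation(delta, threshold=1.5):
    # Stand-in "detector": fires only when (nearly) the full trigger
    # magnitude is present, so each half alone scans as benign.
    strength = sum(d * t for d, t in zip(delta, trigger))
    return strength >= threshold

print(activation(half_a))             # False: model A alone looks safe
print(activation(half_b))             # False: model B alone looks safe

merged = [a + b for a, b in zip(half_a, half_b)]
print(activation(merged))             # True: merging assembles the trigger
```

The point of the toy: any scan that inspects each model in isolation sees only sub-threshold components, which is why the assembled behavior is invisible until after the merge.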
From the abstract
Model merging has emerged as a powerful technique for combining specialized capabilities from multiple fine-tuned LLMs without additional training costs. However, the security implications of this widely-adopted practice remain critically underexplored. In this work, we reveal that model merging introduces a novel attack surface that can be systematically exploited to compromise safety alignment. We present TrojanMerge, a framework that embeds latent malicious components into source models that […]