If you mash two 'safe' AI models together, you can accidentally create a dangerous one: it turns out you can hide a trap by splitting it across separate models.
April 2, 2026
Original Paper
When Safe Models Merge into Danger: Exploiting Latent Vulnerabilities in LLM Fusion
arXiv · 2604.00627
The Takeaway
Researchers found that it is possible to hide 'latent' malicious instructions inside AI models that appear perfectly benign on their own. These hidden components only assemble into a functional 'Trojan' when the models are merged, a common practice in the industry, effectively creating a trap that is invisible to current security scans.
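To build intuition for the mechanism, here is a toy sketch (not the paper's actual method): two weight deltas that each look inert on their own, but that reassemble a hypothetical "trigger" direction when combined additively, the way task-arithmetic merging sums fine-tuning deltas. The `trigger` vector, the split, and the detection threshold are all illustrative assumptions.

```python
# Toy illustration only: a malicious "trigger" direction split in half
# across two models' weight deltas. Each half is too weak to activate
# anything, but additive merging reassembles the full trigger.

trigger = [0.8, -0.6, 0.4, 0.9]       # hypothetical malicious direction
half_a = [t / 2 for t in trigger]     # hidden in model A's delta
half_b = [t / 2 for t in trigger]     # hidden in model B's delta

def activation(delta, threshold=1.5):
    # Stand-in "detector": fires only when (nearly) the full trigger
    # magnitude is present, so each half alone scans as benign.
    strength = sum(d * t for d, t in zip(delta, trigger))
    return strength >= threshold

print(activation(half_a))             # False: model A alone looks safe
print(activation(half_b))             # False: model B alone looks safe

merged = [a + b for a, b in zip(half_a, half_b)]
print(activation(merged))             # True: merging assembles the trigger
```

The point of the toy: any scan that inspects each model in isolation sees only sub-threshold components, which is why the assembled behavior is invisible until after the merge.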
From the abstract
Model merging has emerged as a powerful technique for combining specialized capabilities from multiple fine-tuned LLMs without additional training costs. However, the security implications of this widely-adopted practice remain critically underexplored. In this work, we reveal that model merging introduces a novel attack surface that can be systematically exploited to compromise safety alignment. We present TrojanMerge, a framework that embeds latent malicious components into source models that […]