Changing just a few words in a prompt can make a perfectly functioning AI abandon the output format it was told to follow.
Large language models frequently fail to maintain a requested output format when a prompt is paraphrased, even though the meaning stays exactly the same. This output-mode collapse persists at zero temperature, so it cannot be dismissed as sampling noise: the instability is driven by the wording of the prompt itself. A model might return a single bare label perfectly until the prompt is slightly reworded, at which point it breaks character and starts rambling. That instability is a real problem for developers building automated systems that depend on consistent JSON or CSV responses, and it suggests that model reliability is more fragile than commonly assumed, hinging on the specific phrasing of a request.
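To make the failure mode concrete for developers, here is a minimal probe, not the paper's evaluation harness: it sends several semantically equivalent paraphrases of one classification prompt to a chat-completion endpoint at temperature zero and checks whether each reply is still a bare label. The OpenAI client, the model name, the paraphrase list, and the `is_bare_label` check are all illustrative assumptions rather than details from the paper.

```python
# Minimal sketch (assumed setup, not the paper's code): probe whether paraphrases
# of a closed-form prompt still yield a bare label at temperature 0.
import re
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

LABELS = {"positive", "negative", "neutral"}

# Semantically equivalent phrasings of the same sentiment-classification request.
PARAPHRASES = [
    "Classify the sentiment of the review. Answer with one word: positive, negative, or neutral.\n\nReview: {review}",
    "What is the sentiment of this review? Reply with exactly one of: positive, negative, neutral.\n\nReview: {review}",
    "Read the review below and state its sentiment using only the single word positive, negative, or neutral.\n\nReview: {review}",
]

def is_bare_label(text: str) -> bool:
    """True if the reply is just one of the allowed labels, ignoring case and punctuation."""
    return re.sub(r"[^a-z]", "", text.strip().lower()) in LABELS

def probe(review: str, model: str = "gpt-4o-mini") -> list[bool]:
    """Query each paraphrase at temperature 0 and record whether the output stays a bare label."""
    results = []
    for template in PARAPHRASES:
        resp = client.chat.completions.create(
            model=model,  # illustrative model name
            temperature=0,
            messages=[{"role": "user", "content": template.format(review=review)}],
        )
        results.append(is_bare_label(resp.choices[0].message.content or ""))
    return results

if __name__ == "__main__":
    flags = probe("The battery died within a week and support never replied.")
    print("bare-label compliance per paraphrase:", flags)
```

If the paper's finding holds for the model being tested, some entries in the returned list will be False even though every paraphrase asks for the same one-word answer.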
Paraphrase-Induced Output-Mode Collapse: When LLMs Break Character Under Semantically Equivalent Inputs
arXiv · 2605.04665
When the substantive content of a request is rewritten, do large language models still answer in the format the original task asked for? We find that they often do not, even at temperature zero. On a 150-query evaluation over five compact 2025-era LLMs and four task types, we observe a systematic failure mode we call prompt-variant output-mode collapse: when a closed-form prompt asks for a bare label or a single choice token, content-preserving prompt variants can push the model into conversational, explanatory output instead of the requested format.