AI & ML Paradigm Shift

FIM-Merging provides a theoretical framework for layer-adaptive model merging using the Fisher Information Matrix to bound merging error.

March 24, 2026

Original Paper

Data-Free Layer-Adaptive Merging via Fisher Information for Long-to-Short Reasoning LLMs

Tian Xia

arXiv · 2603.21705

The Takeaway

Most current model-merging methods, such as Task Arithmetic, are largely heuristic. FIM-Merging instead uses the Fisher Information Matrix (FIM) to compute optimal per-layer merging coefficients from random tokens alone, with no calibration data. It reports a gain of +6.2 points on MATH500 for reasoning models.
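To make the idea of per-layer coefficients concrete, here is a minimal sketch of layer-wise Task Arithmetic on toy NumPy weights. It only illustrates where per-layer coefficients enter the merge. The function name `merge_layerwise`, the parameter names, and the coefficient values are hypothetical, and the coefficients are hand-picked placeholders rather than values derived from the Fisher Information Matrix as in the paper.

```python
import numpy as np

def merge_layerwise(base, tuned, coeffs, default=0.5):
    """Per-layer Task Arithmetic: base + lam * (tuned - base) for each parameter.

    `coeffs` maps parameter names to merging coefficients; parameters that are
    not listed fall back to `default`. Passing an empty dict therefore gives
    ordinary Task Arithmetic with a single global coefficient.
    """
    merged = {}
    for name, w_base in base.items():
        lam = coeffs.get(name, default)
        task_vector = tuned[name] - w_base  # what the reasoning model changed
        merged[name] = w_base + lam * task_vector
    return merged

# Toy "models" with two layers each (illustrative values only).
base = {"layer0.weight": np.zeros((2, 2)), "layer1.weight": np.ones((2, 2))}
tuned = {"layer0.weight": np.full((2, 2), 2.0), "layer1.weight": np.full((2, 2), 3.0)}

# Hypothetical per-layer coefficients; the paper derives these from the FIM.
coeffs = {"layer0.weight": 0.25, "layer1.weight": 0.75}

merged = merge_layerwise(base, tuned, coeffs)
```

With a single shared coefficient the same function reduces to plain Task Arithmetic, so the only thing a layer-adaptive method has to supply is the `coeffs` mapping.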

From the abstract

Model merging has emerged as a practical approach to combine capabilities of specialized large language models (LLMs) without additional training. In the Long-to-Short (L2S) scenario, merging a base model with a long-chain-of-thought reasoning model aims to preserve reasoning accuracy while reducing output length. Existing methods rely on Task Arithmetic and its variants, which implicitly assume that model outputs vary linearly with the merging coefficient -- an assumption we show is systematica

