A 4B parameter model matches a 120B parameter model in program verification through a rigorous data curation pipeline.
March 17, 2026
Original Paper
Not All Invariants Are Equal: Curating Training Data to Accelerate Program Verification with SLMs
arXiv · 2603.15510
The Takeaway
Demonstrates that curating training data with high-quality semantic rewriting and AST-based normalization lets Small Language Models (SLMs) match the performance of models 30x their size. This provides a blueprint for specialized reasoning tasks where compute efficiency is critical.
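The AST-based normalization idea can be illustrated with a minimal sketch: parse each candidate invariant into an abstract syntax tree and rename variables to canonical names in order of first appearance, so syntactically different but structurally identical invariants collapse to one form and duplicates can be pruned from the training set. (This is an illustrative reconstruction, not the paper's actual pipeline; the function names and the specific canonicalization rule are assumptions.)

```python
import ast

class CanonicalRenamer(ast.NodeTransformer):
    """Rename variables to v0, v1, ... in order of first appearance,
    so alpha-equivalent invariants produce identical strings."""
    def __init__(self):
        self.mapping = {}

    def visit_Name(self, node):
        if node.id not in self.mapping:
            self.mapping[node.id] = f"v{len(self.mapping)}"
        return ast.copy_location(
            ast.Name(id=self.mapping[node.id], ctx=node.ctx), node
        )

def normalize(invariant: str) -> str:
    """Parse a Python-syntax invariant and return its canonical form."""
    tree = ast.parse(invariant, mode="eval")
    tree = CanonicalRenamer().visit(tree)
    return ast.unparse(tree)

# Two syntactic variants of the same invariant collapse to one form:
a = normalize("x <= n and s == x * (x - 1) // 2")
b = normalize("i <= m and t == i * (i - 1) // 2")
assert a == b
```

Deduplicating on the normalized form rather than the raw string is what makes this kind of curation cheap: it needs only a parser, not a solver call per pair.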
From the abstract
The synthesis of inductive loop invariants is a critical bottleneck in automated program verification. While Large Language Models (LLMs) show promise in mitigating this issue, they often fail on hard instances, generating invariants that are invalid or computationally ineffective. While fine-tuning is a natural route to mitigate this limitation, obtaining high-quality training data for invariant generation remains an open challenge. We present a rigorous data curation pipeline designed to extract […]
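For context, an inductive loop invariant is a property that holds on loop entry and is preserved by every iteration; combined with the exit condition, it lets a verifier prove the loop's postcondition. A minimal hypothetical example (not taken from the paper), with the invariant checked via runtime assertions:

```python
def sum_below(n: int) -> int:
    """Sum 0..n-1 for n >= 0. The invariant s == i*(i-1)//2 (with
    0 <= i <= n) is the kind of fact an automated verifier needs a
    model to synthesize: it holds on entry and after each iteration."""
    s, i = 0, 0
    while i < n:
        assert s == i * (i - 1) // 2  # inductive invariant
        s += i
        i += 1
    # invariant + exit condition (i == n) implies the postcondition
    assert s == n * (n - 1) // 2
    return s

print(sum_below(10))  # → 45
```

Finding such an invariant automatically is the hard part; the paper's point is that the quality of the invariants used as fine-tuning data, not just model scale, determines how well an SLM learns to produce them.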