AI & ML Efficiency Breakthrough

Outperforms fine-tuned baselines in code optimization by using semantics-preserving transformations as a generative intermediate representation.

March 17, 2026

Original Paper

SemRep: Generative Code Representation Learning with Code Transformations

Weichen Li, Jiamin Song, Bogdan Alexandru Stoica, Arav Dhoot, Gabriel Ryan, Shengyu Fu, Kexin Pei

arXiv · 2603.13640

The Takeaway

By training models to predict code transformations rather than just raw tokens, SemRep achieves 6.7x better robustness and matches the performance of models 685B parameters larger while using 25% less inference compute. This is a significant step toward making specialized coding agents efficient enough for local deployment.
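
For readers unfamiliar with the term, a semantics-preserving transformation rewrites a program without changing what it computes. The sketch below is only an illustration of that idea and is not taken from the paper: the sample function, the `RenameVariable` class, and the helper names are all hypothetical, and variable renaming is just one simple transformation chosen for brevity. It assumes Python 3.9+ (for `ast.unparse`) and that the new name does not collide with an existing one.

```python
import ast

# Hypothetical input program used only for this illustration.
SOURCE = """
def total(values):
    acc = 0
    for v in values:
        acc = acc + v
    return acc
"""


class RenameVariable(ast.NodeTransformer):
    """Rename every use of one variable name (illustrative helper)."""

    def __init__(self, old, new):
        self.old = old
        self.new = new

    def visit_Name(self, node):
        # Replace matching names, keeping the load/store context intact.
        if node.id == self.old:
            return ast.copy_location(ast.Name(id=self.new, ctx=node.ctx), node)
        return node


def apply_rename(source, old, new):
    """Parse the source, rename the variable, and emit new source text."""
    tree = RenameVariable(old, new).visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


def load_function(source, name):
    """Execute source in a fresh namespace and return the named function."""
    namespace = {}
    exec(source, namespace)
    return namespace[name]


transformed = apply_rename(SOURCE, "acc", "running_sum")
original_fn = load_function(SOURCE, "total")
transformed_fn = load_function(transformed, "total")

# Spot-check (not a proof) that behavior is unchanged on a few inputs.
for sample in ([], [1, 2, 3], [-5, 5, 10]):
    assert original_fn(sample) == transformed_fn(sample)
```

The two versions differ textually but agree on every input tested, which is the property that makes such a rewrite usable as an intermediate step rather than a change in program meaning.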

From the abstract

Code transformation is a foundational capability in the software development process, where its effectiveness relies on constructing a high-quality code representation to characterize the input code semantics and guide the transformation. Existing approaches treat code transformation as an end-to-end learning task, leaving the construction of the representation needed for semantic reasoning implicit in model weights or relying on rigid compiler-level abstractions. We present SemRep, a framework