AI & ML New Capability

WARP provides provable, guaranteed repairs for the inner layers of Transformers, overcoming a limitation of previous methods, which were restricted to the final layer.

April 2, 2026

Original Paper

WARP: Guaranteed Inner-Layer Repair of NLP Transformers

Hsin-Ling Hsu, Min-Yu Chen, Nai-Chia Chen, Yan-Ru Chen, Yi-Ling Chang, Fang Yu

arXiv · 2604.00938

The Takeaway

Model editing and repair are usually 'best-effort': fixes carry no formal guarantee and can be undone by adversarial prompts. By formulating repair as a convex quadratic program, WARP guarantees that the repaired model satisfies specified margin and robustness constraints on the corrected samples.
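To make the idea concrete, here is a minimal sketch (not the paper's implementation) of repair as a convex quadratic program: find the smallest change dW to a single linear layer's weights such that a previously misclassified sample gains a required logit margin. The layer sizes, the sample, and the margin `delta` are all illustrative assumptions.

```python
# Toy constraint-based repair: minimize ||dW||_F^2 subject to a linear
# margin constraint on the repaired logits. Both pieces are convex, so
# this is a QP; we solve it here with SciPy's SLSQP for simplicity.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))          # toy layer: 2 classes, 3 features (assumed)
x = np.array([1.0, -0.5, 2.0])       # the "buggy" input to correct
correct, wrong = 0, 1                # desired class vs. competing class
delta = 0.5                          # required margin after repair

def objective(dw_flat):
    # smallest possible weight change, in Frobenius norm
    return np.sum(dw_flat ** 2)

def margin(dw_flat):
    # linear in dW: logit of the correct class must beat the wrong
    # class by at least delta (constraint value must be >= 0)
    dW = dw_flat.reshape(W.shape)
    logits = (W + dW) @ x
    return logits[correct] - logits[wrong] - delta

res = minimize(objective, np.zeros(W.size), method="SLSQP",
               constraints=[{"type": "ineq", "fun": margin}])
W_repaired = W + res.x.reshape(W.shape)
logits = W_repaired @ x
# the margin constraint is now provably satisfied on this sample
assert logits[correct] - logits[wrong] >= delta - 1e-6
```

Because the objective is quadratic and the constraints are linear in dW, the solver returns a certified minimizer rather than a best-effort gradient update; WARP's contribution is extending this style of guarantee to the inner layers of full Transformers.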

From the abstract

Transformer-based NLP models remain vulnerable to adversarial perturbations, yet existing repair methods face a fundamental trade-off: gradient-based approaches offer flexibility but lack verifiability and often overfit; methods that do provide repair guarantees are restricted to the final layer or small networks, significantly limiting the parameter search space available for repair. We present WARP (Weight-Adjusted Repair with Provability), a constraint-based repair framework that extends repa