The first training-free framework for high-fidelity appearance transfer specifically designed for Diffusion Transformers (DiTs).
March 31, 2026
Original Paper
A training-free framework for high-fidelity appearance transfer via diffusion transformers
arXiv · 2603.26767
The Takeaway
DiTs are replacing U-Nets as the standard architecture for generative models (e.g., Sora), but they are notoriously difficult to control for reference-based editing. This method enables precise material and texture transfer at 1024px resolution without expensive fine-tuning or LoRAs.
From the abstract
Diffusion Transformers (DiTs) excel at generation, but their global self-attention makes controllable, reference-image-based editing a distinct challenge. Unlike U-Nets, naively injecting local appearance into a DiT can disrupt its holistic scene structure. We address this by proposing the first training-free framework specifically designed to tame DiTs for high-fidelity appearance transfer. Our core is a synergistic system that disentangles structure and appearance. We leverage high-fidelity in…
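The excerpt cuts off before the method details, but the general idea of training-free, attention-based appearance injection can be sketched. Below is a minimal, hypothetical PyTorch example of one common mechanism in this family: letting the target image's queries attend to keys and values cached from a reference image's denoising pass, blended with ordinary self-attention so the target's structure is not overwritten. This is an illustrative sketch under stated assumptions, not the paper's actual implementation; the function name, tensor layout, and `blend` parameter are all assumptions.

```python
import torch
import torch.nn.functional as F

def appearance_injected_attention(
    q_target: torch.Tensor,  # (B, heads, N, d) queries from the structure/target image
    k_target: torch.Tensor,  # (B, heads, N, d) keys from the target image
    v_target: torch.Tensor,  # (B, heads, N, d) values from the target image
    k_ref: torch.Tensor,     # (B, heads, N, d) keys cached from the reference pass
    v_ref: torch.Tensor,     # (B, heads, N, d) values cached from the reference pass
    blend: float = 0.8,      # hypothetical knob: how strongly to pull reference appearance
) -> torch.Tensor:
    """Hybrid attention: target queries attend to both reference and target KV.

    Sketch of a generic training-free appearance-transfer step, not the
    paper's method.
    """
    # Attending to the reference carries over its appearance statistics...
    ref_out = F.scaled_dot_product_attention(q_target, k_ref, v_ref)
    # ...while ordinary self-attention preserves the target's scene layout.
    tgt_out = F.scaled_dot_product_attention(q_target, k_target, v_target)
    # A linear blend avoids the structural collapse that a naive full KV swap
    # can cause in a globally-attending DiT.
    return blend * ref_out + (1.0 - blend) * tgt_out
```

In this family of techniques, the blend strength and the choice of which blocks and timesteps receive injection control the structure/appearance trade-off; a full swap in every layer of a globally-attending DiT tends to produce exactly the structural disruption the abstract warns about.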