AI & ML New Capability

DRTriton uses large-scale synthetic data and curriculum RL to automatically generate highly optimized Triton kernels, significantly outperforming top-tier LLMs.

March 24, 2026

Original Paper

DRTriton: Large-Scale Synthetic Data Reinforcement Learning for Triton Kernel Generation

Siqi Guo, Ming Lin, Tianbao Yang

arXiv · 2603.21465

The Takeaway

DRTriton automates the highly specialized task of writing efficient GPU kernels in Triton, achieving speedups on 92% of benchmark cases. This significantly lowers the barrier for engineers to optimize model execution layers without deep manual CUDA or Triton expertise.

From the abstract

Developing efficient CUDA kernels is a fundamental yet challenging task in the generative AI industry. Recent research leverages Large Language Models (LLMs) to automatically convert PyTorch reference implementations to CUDA kernels, significantly reducing engineering effort. State-of-the-art LLMs, such as GPT-5.2 and Claude-Sonnet-4.5, still struggle with this specific task. To address this challenge, we propose DRTriton, a scalable learning framework for training LLMs to convert PyTorch code […]
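To give a sense of what the generated kernels look like: Triton programs process tensors in fixed-size blocks, with each program instance handling one block index and masking out-of-bounds lanes. The sketch below mimics that execution model in pure Python for a vector add, so it runs without a GPU or Triton installed; all names are illustrative and do not come from the paper.

```python
# Conceptual sketch of Triton's block-parallel model for a vector add.
# Each "program" handles one BLOCK_SIZE-wide slice with tail masking,
# mirroring tl.program_id / tl.arange / masked tl.load in real Triton.
# Illustrative only; not code from the DRTriton paper.

BLOCK_SIZE = 4

def add_kernel(x, y, out, pid):
    """One program instance: process block number `pid`."""
    start = pid * BLOCK_SIZE
    for i in range(start, start + BLOCK_SIZE):
        if i < len(x):  # mask: skip out-of-bounds lanes at the tail
            out[i] = x[i] + y[i]

def launch(x, y):
    """Launch a 1-D grid covering all elements, like a Triton grid lambda."""
    out = [0] * len(x)
    num_programs = -(-len(x) // BLOCK_SIZE)  # ceiling division
    for pid in range(num_programs):
        add_kernel(x, y, out, pid)
    return out

print(launch([1, 2, 3, 4, 5], [10, 20, 30, 40, 50]))  # [11, 22, 33, 44, 55]
```

The benchmark task the paper describes is, in effect, generating the real GPU version of such a kernel from a plain PyTorch reference like `x + y`, while preserving correctness under masking and achieving a speedup.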