AI & ML New Capability

A new method for training axis-aligned decision trees using gradient descent and backpropagation, allowing trees to be integrated into end-to-end neural networks.

March 13, 2026

Original Paper

Learning Tree-Based Models with Gradient Descent

Sascha Marton

arXiv · 2603.11117

The Takeaway

Historically, trees were trained via greedy algorithms (CART/XGBoost) and couldn't be optimized with NNs. This differentiable tree approach achieves SOTA results and allows practitioners to use the interpretability of decision trees inside larger deep learning pipelines for multimodal or RL tasks.

From the abstract

Tree-based models are widely recognized for their interpretability and have proven effective in various application domains, particularly in high-stakes domains. However, learning decision trees (DTs) poses a significant challenge due to their combinatorial complexity and discrete, non-differentiable nature. As a result, traditional methods such as CART, which rely on greedy search procedures, remain the most widely used approaches. These methods make locally optimal decisions at each node, cons