A new inference paradigm cuts AI reasoning costs by an order of magnitude while making the models significantly faster at solving complex problems.
April 25, 2026
Original Paper
HARLM: Hierarchical Adaptive Recursive Language Models: Achieving Order-of-Magnitude Cost Reduction and Performance Gains Through Parallel Speculative Execution and Learned Context Routing
SSRN · 6397658
The Takeaway
Hierarchical Adaptive Recursive Language Models (HARLM) use parallel speculative execution to remove the latency bottleneck of recursive inference. The system routes context to sub-queries dynamically and explores candidate answers speculatively rather than one at a time, finding answers faster and more cheaply. The result is an order-of-magnitude reduction in the cost of running a high-reasoning model. Previous recursive architectures were too expensive for wide-scale deployment on complex tasks; this approach aims to make advanced AI reasoning cheap enough for everyday business applications.
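To make the parallelism concrete: the key change over sequential recursive inference is fanning sub-queries out over context chunks concurrently and keeping the first useful result. The sketch below illustrates that idea only; `stub_lm` and the chunk-matching logic are hypothetical stand-ins, not the paper's actual routing or model calls.

```python
from concurrent.futures import ThreadPoolExecutor

def stub_lm(query: str, context: str) -> str:
    # Hypothetical stand-in for a real language-model call:
    # "answers" only when the query term appears in the chunk.
    return f"answer({query})" if query in context else "MISS"

def recursive_answer(query, chunks, lm=stub_lm, max_workers=4):
    """Fan sub-queries out over context chunks in parallel instead of
    sequentially -- the latency fix the abstract describes (sketch only)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # All chunk-level sub-queries run concurrently.
        partials = list(pool.map(lambda chunk: lm(query, chunk), chunks))
    # Keep the first speculative result that is not a miss.
    hits = [p for p in partials if p != "MISS"]
    return hits[0] if hits else "no answer"

chunks = ["alpha beta", "gamma delta", "delta epsilon"]
print(recursive_answer("delta", chunks))  # → answer(delta)
```

With sequential execution, latency grows linearly with the number of chunks; here the wall-clock cost of a fan-out is roughly one model call, which is the source of the speedup the article claims.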
From the abstract
I introduce Hierarchical Adaptive Recursive Language Models (HARLM), a fundamentally reimagined inference paradigm that extends and dramatically improves upon Recursive Language Models (RLMs). While RLMs demonstrated that treating prompts as external environment variables enables processing of arbitrarily long contexts, they suffer from three critical limitations: (1) synchronous sequential execution creating latency bottlenecks, (2) fixed shallow recursion depth limiting expressiveness, and (3)