Memori reduces agent token costs by 20x by replacing raw conversation history with a persistent layer of semantic triples and summaries.
March 23, 2026
Original Paper
Memori: A Persistent Memory Layer for Efficient, Context-Aware LLM Agents
arXiv · 2603.19935
The Takeaway
Long-term context is the primary cost driver for autonomous agents. This system treats memory as a data structuring problem rather than a window size problem, allowing agents to maintain context-aware behavior across many sessions for ~5% of the typical token cost.
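To make the "data structuring" framing concrete, here is a minimal, hypothetical sketch of the idea: store (subject, predicate, object) triples plus a running summary, and inject only the relevant facts into the prompt instead of replaying raw history. All class and method names here are illustrative assumptions, not Memori's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryLayer:
    # Structured memory: semantic triples plus a short summary,
    # standing in for raw conversation history.
    triples: list[tuple[str, str, str]] = field(default_factory=list)
    summary: str = ""

    def remember(self, subject: str, predicate: str, obj: str) -> None:
        self.triples.append((subject, predicate, obj))

    def recall(self, keyword: str) -> list[tuple[str, str, str]]:
        # Naive keyword match stands in for real semantic retrieval.
        return [t for t in self.triples if keyword in " ".join(t)]

    def to_prompt(self, keyword: str) -> str:
        # Only matching facts are injected, so prompt size scales with
        # relevant memory rather than with total conversation length.
        facts = "; ".join(f"{s} {p} {o}" for s, p, o in self.recall(keyword))
        return f"Summary: {self.summary}\nRelevant facts: {facts}"

mem = MemoryLayer(summary="User is planning a trip.")
mem.remember("user", "prefers", "window seats")
mem.remember("user", "departs_from", "Oslo")
print(mem.to_prompt("seats"))
```

The token savings come from the retrieval step: a multi-session transcript can run to tens of thousands of tokens, while the injected summary-plus-facts string stays a few dozen.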
From the abstract
As large language models (LLMs) evolve into autonomous agents, persistent memory at the API layer is essential for enabling context-aware behavior across LLMs and multi-session interactions. Existing approaches force vendor lock-in and rely on injecting large volumes of raw conversation into prompts, leading to high token costs and degraded performance. We introduce Memori, an LLM-agnostic persistent memory layer that treats memory as a data structuring problem. Its Advanced Augmentation pipeline …