Best AI Agent Memory Providers in 2026: Mem0 vs Zep vs Letta vs Cloudflare

TL;DR
A fair, sourced comparison of the memory layers developers reach for in 2026: Mem0's extract-and-retrieve, Zep's temporal knowledge graph, Letta's self-editing agent memory, and Cloudflare's Durable Objects primitive. Architecture, pricing, the benchmark disputes, and which to pick for your agent.
Best for
Developers comparing real tool tradeoffs before choosing a stack.
Covers
Verdict, tradeoffs, pricing signals, workflow fit, and related alternatives.
| Official Sources | |
|---|---|
| Mem0 Docs / Pricing | Extract-and-retrieve memory layer |
| Zep / Graphiti / Pricing | Temporal knowledge graph memory |
| Letta Docs / Pricing | Stateful agents, self-editing memory |
| Cloudflare Agents | Durable Objects state primitive |
| LOCOMO paper / LongMemEval | The benchmarks everyone cites |
Last updated: July 2, 2026
Agents forget. The model that just spent twenty turns learning your codebase, your preferences, and the shape of the task wakes up the next session knowing none of it. A memory layer is the fix, and by 2026 it is a real market: you can bolt on a hosted API in an afternoon, or self-host an open-source core and own the data. The four names that come up most are Mem0, Zep, Letta, and Cloudflare's Agents primitive, and they are not four flavors of the same thing. One extracts facts and retrieves them, one builds a temporal knowledge graph, one lets the agent edit its own memory, and one is not really a memory product at all but the substrate you build memory on.
This is a fair, sourced comparison: what each actually is, how it is priced, the benchmark fights you should not take at face value, and a decision guide by workload. If you want the conceptual grounding first, "AI Agent Memory Patterns" covers the categories, and "Agent Memory Benchmarks Are Not Enough" sets up the skepticism you will need for the numbers below.
Mem0: Extract, Then Retrieve
Mem0 bills itself as a "universal memory layer for AI agents." The core-concepts docs describe a two-phase design: an extract phase that uses an LLM to pull durable facts out of a conversation, deduplicate them, and embed them, and a retrieve phase that fuses parallel scoring passes (semantic, keyword, and entity) to surface the relevant memories before the next model call. State lands in three tiers: a SQL store for facts, a vector DB for embeddings, and an entity store for relationships, with an optional graph-memory variant described in their 2025 paper. Everything is scoped by user_id, agent_id, or run_id.
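The two-phase shape described above can be sketched in a few lines of Python. This is a toy in-memory illustration, not the Mem0 client: `ToyMemoryStore`, `tokenize`, and the fusion weights are hypothetical names and values chosen for the example, the facts arrive pre-extracted (standing in for the LLM extract step), and word overlap stands in for real embeddings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Memory:
    user_id: str
    text: str
    entities: frozenset


def tokenize(text: str) -> set:
    """Lowercase words with surrounding punctuation stripped."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return {w for w in words if w}


class ToyMemoryStore:
    """Hypothetical stand-in for an extract-and-retrieve layer (not Mem0's API)."""

    def __init__(self) -> None:
        self.memories: list = []

    def add(self, user_id: str, facts: list, entities: set) -> int:
        """Extract-phase stand-in: store pre-extracted facts, skipping duplicates."""
        known = {e.lower() for e in entities}
        added = 0
        for fact in facts:
            words = tokenize(fact)
            duplicate = any(
                m.user_id == user_id and tokenize(m.text) == words
                for m in self.memories
            )
            if not duplicate:
                self.memories.append(Memory(user_id, fact, frozenset(known & words)))
                added += 1
        return added

    def search(self, user_id: str, query: str, top_k: int = 3) -> list:
        """Retrieve phase: fuse three scoring passes, scoped to one user_id."""
        q = tokenize(query)
        scored = []
        for m in self.memories:
            if m.user_id != user_id:
                continue  # scoping: never leak another user's memories
            words = tokenize(m.text)
            keyword = len(q & words) / len(q) if q else 0.0
            # Assumption: Jaccard overlap stands in for embedding similarity.
            semantic = len(q & words) / len(q | words) if (q | words) else 0.0
            entity = 1.0 if (m.entities & q) else 0.0
            # Assumption: these weights are arbitrary, chosen for illustration.
            score = 0.5 * semantic + 0.3 * keyword + 0.2 * entity
            scored.append((score, m.text))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for score, text in scored[:top_k] if score > 0]
```

In a real deployment the extract step would be an LLM call and the semantic pass a vector search; the point of the sketch is only the control flow: deduplicate on write, fuse several scores on read, and filter by scope before scoring.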
It is open source under Apache 2.0 (github.com/mem0ai/mem0, roughly 59.9k stars as of this writing) and self-hostable, with a managed platform at app.mem0.ai. Hosted pricing runs from a free Hobby tier (10k memory adds, 1k retrievals monthly) through Starter at $19/mo, Growth at $79/mo, and Pro at $249/mo, which is the tier that unlocks graph memory. Enterprise is custom with on-prem, SSO, and audit.
On benchmarks, be careful to separate two eras. The 2025 paper claimed roughly a 26 percent relative improvement (LLM-as-judge) over OpenAI's memory on LOCOMO, with about 91 percent lower p95 latency and over 90 percent token savings versus stuffing full context. The 2026 research page reports newer figures: LOCOMO 92.5, LongMemEval 94.4, and BEAM scores, under roughly 7,000 tokens per retrieval. Those are different measurements from different harnesses; cite them distinctly rather than as one continuous claim.
The honest read: Mem0 is the fastest of the four to ship, with low-latency vector retrieval and strong episodic recall. Pure vector-plus-extraction is weaker on its own at deep temporal or multi-hop reasoning, which is exactly the gap the next contender targets.
Zep: A Temporal Knowledge Graph
Zep approaches memory as a graph problem. Its open-source engine, Graphiti (Apache 2.0, around 28.2k stars), is a temporally-aware knowledge graph engine, described in the Zep paper. The defining feature is a bi-temporal model: every fact carries both a valid time and a transaction time, and superseded facts are not deleted but marked, so the graph can answer questions about what was true at a given moment. Retrieval is hybrid, combining embeddings, BM25 keyword search, and graph traversal, with provenance tracked through "episodes."
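To make the bi-temporal idea concrete, here is a minimal Python sketch. It is not Graphiti's data model or API: `Fact`, `BiTemporalStore`, and the field names are hypothetical, and a flat list stands in for the graph. What it does show is the two clocks (when a fact was true, and when the system learned it) and the rule that superseded facts are marked rather than deleted.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Fact:
    subject: str
    predicate: str
    obj: str
    valid_from: datetime                      # valid time: when it became true
    valid_to: Optional[datetime]              # None means still true
    recorded_at: datetime                     # transaction time: when we learned it
    superseded_at: Optional[datetime] = None  # when a newer fact replaced it


class BiTemporalStore:
    """Hypothetical flat-list stand-in for a temporal knowledge graph."""

    def __init__(self) -> None:
        self.facts: list = []

    def assert_fact(self, subject, predicate, obj, valid_from, recorded_at) -> None:
        # Close out the currently open fact instead of deleting it.
        for f in self.facts:
            if (f.subject, f.predicate) == (subject, predicate) and f.valid_to is None:
                f.valid_to = valid_from
                f.superseded_at = recorded_at
        self.facts.append(Fact(subject, predicate, obj, valid_from, None, recorded_at))

    def as_of(self, subject, predicate, when, known_at=None):
        """What was true at `when`, optionally as the system believed at `known_at`."""
        for f in self.facts:
            if (f.subject, f.predicate) != (subject, predicate):
                continue
            if known_at is not None and f.recorded_at > known_at:
                continue  # the system had not learned this fact yet
            valid_to = f.valid_to
            if (known_at is not None and f.superseded_at is not None
                    and f.superseded_at > known_at):
                valid_to = None  # the supersession was not yet known
            if f.valid_from <= when and (valid_to is None or when < valid_to):
                return f.obj
        return None
```

Because nothing is deleted, the same store can answer both "what plan was the account on in March?" and "what did we think the plan was on June 2, before the upgrade was recorded?".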
Graphiti self-hosts on Neo4j, FalkorDB, Kuzu, or Amazon Neptune. The managed Zep platform adds governance: attribute-based access control, retention policies, and audit. Pricing starts free ($0, 10k credits monthly, 2 projects), then Flex at roughly $104/mo billed annually, Flex Plus at roughly $312/mo, and custom Enterprise with SOC 2 Type II and HIPAA BAA.
Zep's paper reported 94.8 percent on Deep Memory Retrieval (versus 93.4 for MemGPT) and up to an 18.5 percent accuracy gain on LongMemEval with a 90 percent latency reduction versus full context. The graph approach shines for entity-centric, temporal, and contradiction-resolving questions and multi-hop reasoning. The cost is real: you take on schema and extraction overhead, and self-hosting means running a graph database, which is more operational weight than Mem0's vector store.
Letta: The Agent Edits Its Own Memory
Letta (formerly MemGPT) is less a memory API and more a platform for stateful agents. Its premise, from the core concepts, is that all agent state persists in a database even after it is evicted from the context window. Memory comes in layers: memory blocks are labeled text pinned into context that the agent can edit and share, and archival memory is a searchable database the agent queries on demand. The distinguishing idea is self-editing memory: the agent decides, via tools, what to write, update, or pull into context. This descends directly from the MemGPT paper, "Towards LLMs as Operating Systems," which framed context management as an OS-style virtual memory problem.
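A rough sketch of the layered, self-editing design might look like the following. It is an illustration under assumptions, not Letta's API: `AgentMemory` and its method names are hypothetical, keyword matching stands in for archival search, and in a real agent the methods would be exposed as tools that the model chooses to call.

```python
from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    """Hypothetical model of memory blocks plus archival memory (not Letta's API)."""

    blocks: dict = field(default_factory=dict)     # labeled text pinned into context
    archival: list = field(default_factory=list)   # searched on demand
    edit_log: list = field(default_factory=list)   # every write, for auditability

    def block_replace(self, label: str, old: str, new: str) -> None:
        """Tool: rewrite part of a pinned block."""
        current = self.blocks.get(label, "")
        if old not in current:
            raise ValueError(f"{old!r} not found in block {label!r}")
        self.blocks[label] = current.replace(old, new, 1)
        self.edit_log.append(("block_replace", label, new))

    def archival_insert(self, text: str) -> None:
        """Tool: push a passage out of context into searchable storage."""
        self.archival.append(text)
        self.edit_log.append(("archival_insert", "archival", text))

    def archival_search(self, query: str, top_k: int = 3) -> list:
        """Tool: pull passages back in. Keyword match is a stand-in for real search."""
        terms = query.lower().split()
        hits = [p for p in self.archival if any(t in p.lower() for t in terms)]
        return hits[:top_k]

    def render_context(self) -> str:
        """What gets pinned into the prompt on every turn."""
        return "\n".join(f"[{label}] {value}" for label, value in self.blocks.items())
```

The `edit_log` is the part worth noticing: because every write goes through a tool call, each memory change is an inspectable action rather than a side effect of a retrieval pipeline.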
Letta is Apache 2.0 (github.com/letta-ai/letta, around 23.6k stars) and self-hostable, with Letta Cloud as the hosted option. Pricing offers a free tier (bring-your-own-key across all tiers), Pro at $20/mo, and an API plan at $20/mo base plus $0.10 per active agent per month and a small tool-execution fee, which suits fleets of many long-lived agents.
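For fleet sizing, the API plan arithmetic is simple enough to write down. The sketch below uses only the two figures quoted above ($20 base, $0.10 per active agent per month) and deliberately leaves out the tool-execution fee, whose rate is not given here; the function names are hypothetical. Amounts are kept in integer cents to avoid floating-point rounding.

```python
BASE_CENTS = 2_000       # $20/mo base, as quoted above
PER_AGENT_CENTS = 10     # $0.10 per active agent per month, as quoted above


def letta_api_monthly_cents(active_agents: int) -> int:
    """Monthly API-plan cost in cents, excluding the tool-execution fee."""
    if active_agents < 0:
        raise ValueError("active_agents must be non-negative")
    return BASE_CENTS + PER_AGENT_CENTS * active_agents


def format_usd(cents: int) -> str:
    return f"${cents // 100}.{cents % 100:02d}"


for agents in (50, 1_000, 10_000):
    print(agents, format_usd(letta_api_monthly_cents(agents)))
```

At 1,000 active agents that works out to $120.00 a month before tool-execution fees and model costs, which is why the plan is pitched at large fleets of long-lived agents.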
The tradeoff is latency versus flexibility. Letting the agent manage its own memory through tool calls gives you maximum control and auditability (you can see every memory edit as an action), but the LLM-in-the-loop retrieval adds turns and cost that a direct vector lookup avoids. If you want an agent whose memory is a first-class, inspectable part of its reasoning, Letta is the most opinionated choice here. The idea of memory as an inspectable ledger is one this site has explored in the agent memory context ledger.
Cloudflare: A Substrate, Not a Memory Product
Cloudflare belongs in this comparison with an asterisk. The Agents SDK (MIT, around 5.2k stars) does not give you a memory algorithm; it gives you a place to put state. Each agent is a Durable Object with its own identity, lifecycle, and embedded per-agent SQLite storage. State auto-saves, survives restarts and hibernation, and syncs to connected WebSocket clients, and local this.sql queries are described as effectively zero-latency because there is no network round trip. Vector memory comes from pairing it with Vectorize, and inference from Workers AI. Idle agents hibernate and cost nothing.
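The per-agent storage model can be approximated locally to see why it is attractive. The sketch below is a Python analogy using the standard `sqlite3` module, not the Agents SDK (which is TypeScript and runs on Cloudflare's runtime): `LocalAgentState` and its methods are hypothetical, and one SQLite file per agent ID stands in for a Durable Object's embedded storage.

```python
import sqlite3
import tempfile
from pathlib import Path
from typing import Optional


class LocalAgentState:
    """Local analogy: one private SQLite database per agent identity."""

    def __init__(self, root: Path, agent_id: str) -> None:
        self.db = sqlite3.connect(root / f"{agent_id}.sqlite")
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS memory (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )
        self.db.commit()

    def remember(self, key: str, value: str) -> None:
        # Writes are committed immediately, so state survives a restart.
        self.db.execute(
            "INSERT OR REPLACE INTO memory (key, value) VALUES (?, ?)", (key, value)
        )
        self.db.commit()

    def recall(self, key: str) -> Optional[str]:
        # A local read: no network round trip between the logic and its data.
        row = self.db.execute(
            "SELECT value FROM memory WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else None

    def hibernate(self) -> None:
        """Release the connection; the data stays on disk until the agent wakes."""
        self.db.close()


root = Path(tempfile.mkdtemp())
agent = LocalAgentState(root, "agent-a")
agent.remember("preferred_language", "TypeScript")
agent.hibernate()

woken = LocalAgentState(root, "agent-a")      # same identity, fresh process state
print(woken.recall("preferred_language"))
```

Note what is missing: there is no extraction, ranking, or embedding step anywhere. That gap is the memory logic you would write yourself on this kind of substrate.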
Pricing is Cloudflare's platform model, not a per-memory fee: a Workers Paid plan (from $5/mo, required for production SQLite Durable Objects) plus usage on requests, duration, and SQL rows read and written, per the Durable Objects pricing. There are no benchmark claims to weigh because there is no retrieval algorithm to benchmark; you build the memory logic.
The tradeoff is clear. You get a stateful substrate with excellent local-read latency, hibernation economics, and per-session isolation, but you write the extraction and retrieval yourself. And while the SDK is MIT and your data sits in plain SQLite, the runtime primitives are Cloudflare-only, which is the deepest platform coupling of the four. The Cloudflare agent memory primitive guide goes deeper on wiring it up.
The Head-to-Head
| | Mem0 | Zep (Graphiti) | Letta | Cloudflare Agents |
|---|---|---|---|---|
| Memory model | Extract + retrieve, vector-first | Temporal knowledge graph | Self-editing agent memory | State substrate you build on |
| Core license | Apache 2.0 | Apache 2.0 (Graphiti) | Apache 2.0 | MIT (SDK) |
| Self-host | Yes (vector store) | Yes (needs graph DB) | Yes | No, platform-bound runtime |
| Managed entry price | Free, then $19/mo | Free, then ~$104/mo | Free, then $20/mo | $5/mo Workers Paid + usage |
| Strength | Fast to ship, low-latency recall | Temporal, multi-hop, entity-centric | Auditable, agent-controlled | Zero-latency local state, hibernation |
| Main cost | Weaker deep temporal reasoning alone | Schema + graph ops overhead | LLM-in-loop retrieval latency | You build the memory logic |
| GitHub stars (approx) | 59.9k | 28.2k | 23.6k | 5.2k |
Star counts and prices are point-in-time snapshots; verify against the linked pages before you commit.
About Those Benchmarks
If you take one thing from this post, take this: no single memory benchmark number is comparable across vendors in 2026. Everyone runs the same tests under different configurations, and the results move accordingly.
LOCOMO is the most-cited benchmark, built on very long conversations (around 300 turns, up to 35 sessions) with question types spanning single-hop, multi-hop, temporal, and adversarial. It has documented flaws, including speaker misattribution and ambiguous questions, which is part of why the scores are contested. The clearest example: Zep originally reported around 84 percent on LOCOMO, Mem0's replication scored Zep at 58.44 percent and alleged methodology errors, and Zep rebutted with a 75.14 percent figure of its own. Both sides are interested parties. The GitHub issue trail is the primary record if you want to judge for yourself.
LongMemEval (ICLR 2025) is widely seen as more rigorous, with 500 questions across information extraction, multi-session reasoning, temporal reasoning, knowledge updates, and abstention, and it documents roughly a 30 percent accuracy drop over sustained interaction. Both Mem0 and Zep cite it, again under their own harnesses. Deep Memory Retrieval, from the MemGPT paper, is now considered narrow and largely saturated. The practical move is to benchmark the top two candidates on your own traffic rather than trusting any vendor's leaderboard.
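Benchmarking on your own traffic does not need much machinery to start. The sketch below is one possible shape for such a harness, under stated assumptions: `Retriever`, `recall_at_k`, and `compare` are hypothetical names, each provider is wrapped in an adapter function you would write yourself, and a substring check stands in for the LLM-as-judge scoring the published benchmarks use.

```python
from typing import Callable, Iterable

# Adapter type: takes a question, returns the memories a provider retrieved.
Retriever = Callable[[str], list]


def recall_at_k(retrieve: Retriever, cases: Iterable, k: int = 5) -> float:
    """Fraction of cases whose expected snippet appears in the top-k memories."""
    cases = list(cases)
    if not cases:
        raise ValueError("need at least one test case")
    hits = 0
    for question, expected in cases:
        retrieved = retrieve(question)[:k]
        # Assumption: case-insensitive substring match is a crude proxy for a judge.
        if any(expected.lower() in memory.lower() for memory in retrieved):
            hits += 1
    return hits / len(cases)


def compare(providers: dict, cases: Iterable, k: int = 5) -> dict:
    """Run every provider adapter against the same cases under the same k."""
    cases = list(cases)
    return {name: recall_at_k(fn, cases, k) for name, fn in providers.items()}
```

The important property is that both finalists see identical questions, identical `k`, and identical scoring, which is exactly what the vendor-run numbers above do not guarantee. Latency and token cost per retrieval can be measured in the same loop.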
Which to Pick
Pick Mem0 when you want a memory layer live this week, your workload is conversational recall and personalization, and low retrieval latency matters more than deep temporal reasoning. The free and $19 tiers make prototyping cheap, and the Apache 2.0 core is there if you outgrow the hosted plan.
Pick Zep when your agent must reason over how facts change through time, resolve contradictions, or traverse relationships between entities: think customer histories, evolving account state, or anything where "what was true when" is a real question. Accept the graph-database operational cost as the price of that capability.
Pick Letta when memory should be a first-class, inspectable part of the agent's own behavior, when you are running many long-lived agents, and when auditability of every memory write is worth the extra latency of LLM-in-the-loop retrieval. It is also the most natural home if you already think in the MemGPT model.
Pick Cloudflare Agents when you are building on Cloudflare anyway, want per-session stateful agents with near-zero local-read latency and hibernation economics, and are happy to write your own extraction and retrieval on top of Durable Objects and Vectorize. It is a substrate decision, not a memory-algorithm decision.
On the broader question of self-host versus managed: all three memory products ship Apache 2.0 cores with managed layers, so you can start hosted and move in-house for data residency or to escape per-call fees. Cloudflare is the outlier: the SDK is MIT but the runtime is platform-bound, though your data stays in portable SQLite. Where this fits your larger toolchain is covered in the agentic dev stack for 2026.
The Take
There is no single best memory provider, only the best fit for your access pattern. If your questions are "what did the user tell me," reach for Mem0. If they are "what was true, and when," reach for Zep. If they are "let the agent decide what to remember, and show me every edit," reach for Letta. If they are "I already live on Cloudflare and I will build the memory myself," reach for Durable Objects. And whatever the vendor charts say, run the final two candidates against your own conversations before you wire one in. The benchmarks are a starting point, not a verdict.
FAQ
What is an AI agent memory provider?
It is a system that persists what an agent learns across sessions and retrieves the relevant pieces before each model call, so the agent does not start from zero every time. Approaches range from extract-and-retrieve over a vector store (Mem0), to temporal knowledge graphs (Zep), to agent-managed self-editing memory (Letta), to building your own on a stateful substrate (Cloudflare). See AI agent memory patterns for the categories.
Are Mem0, Zep, and Letta open source?
All three have Apache 2.0 open-source cores. Mem0's library and Letta are directly open source and self-hostable, and Zep's memory engine Graphiti is Apache 2.0 and self-hosts on a graph database. Each also offers a managed hosted platform. Cloudflare's Agents SDK is MIT, but its runtime primitives run only on Cloudflare's platform.
Which agent memory provider is cheapest?
For getting started, all four have a free entry point. Paid tiers begin around $19/mo for Mem0, $20/mo for Letta Pro, roughly $104/mo (billed annually) for Zep's Flex tier, and $5/mo plus usage for Cloudflare's required Workers Paid plan. The cheapest at scale depends heavily on your volume of memory writes, retrievals, and active agents, so model it against your own usage.
When should I use a knowledge-graph memory like Zep instead of vector memory like Mem0?
Use a temporal knowledge graph when your agent needs to reason about how facts change over time, resolve contradictions, or traverse relationships between entities. Use vector-based extract-and-retrieve when the priority is fast recall of conversational facts and personalization. Graphs add power for multi-hop and temporal questions at the cost of more setup and operational overhead.
Are the LOCOMO benchmark scores reliable?
Treat them with caution. LOCOMO has documented issues, and vendors run it under different configurations, which has produced public disputes, most notably between Mem0 and Zep over Zep's LOCOMO score. LongMemEval is generally considered more rigorous, but it too is cited under different harnesses. The reliable approach is to benchmark your finalists on your own data.
What is the difference between Letta and MemGPT?
Letta is the platform built by the team behind MemGPT, and it carries the MemGPT context-management approach forward. The MemGPT paper introduced the idea of treating the context window like an operating system's memory, paging information in and out; Letta productizes that into stateful agents with editable memory blocks and archival memory.
Is Cloudflare Agents a memory provider?
Not in the same sense as the others. It provides a stateful substrate, per-agent Durable Objects with embedded SQLite and near-zero-latency local reads, on top of which you build your own memory logic. There is no built-in extraction or retrieval algorithm, so there are no memory benchmarks to compare. Pair it with Vectorize for semantic search if you need it.
Sources
- Mem0 documentation and how it works
- Mem0 pricing and research
- Mem0 GitHub and 2025 paper (arXiv:2504.19413)
- Zep and pricing
- Graphiti GitHub and Zep paper (arXiv:2501.13956)
- Zep rebuttal on LOCOMO methodology and benchmark issue thread
- Letta documentation and pricing
- Letta GitHub and MemGPT paper (arXiv:2310.08560)
- Cloudflare Agents docs and state API
- Cloudflare Durable Objects pricing
- LOCOMO paper (arXiv:2402.17753) and LongMemEval (arXiv:2410.10813)