Best AI Agent Memory Providers in 2026: Mem0 vs Zep vs Letta vs Cloudflare

Official Sources
Mem0 Docs / Pricing	Extract-and-retrieve memory layer
Zep / Graphiti / Pricing	Temporal knowledge graph memory
Letta Docs / Pricing	Stateful agents, self-editing memory
Cloudflare Agents	Durable Objects state primitive
LOCOMO paper / LongMemEval	The benchmarks everyone cites

Last updated: July 2, 2026

Agents forget. The model that just spent twenty turns learning your codebase, your preferences, and the shape of the task wakes up the next session knowing none of it. A memory layer is the fix, and by 2026 it is a real market: you can bolt on a hosted API in an afternoon, or self-host an open-source core and own the data. The four names that come up most are Mem0, Zep, Letta, and Cloudflare's Agents primitive, and they are not four flavors of the same thing. One extracts facts and retrieves them, one builds a temporal knowledge graph, one lets the agent edit its own memory, and one is not really a memory product at all but the substrate you build memory on.

This is a fair, sourced comparison: what each actually is, how it is priced, the benchmark fights you should not take at face value, and a decision guide by workload. If you want the conceptual grounding first, our guide to AI agent memory patterns covers the categories, and the piece on why agent memory benchmarks are not enough sets up the skepticism you will need for the numbers below.

Mem0: Extract, Then Retrieve

Mem0 bills itself as a "universal memory layer for AI agents." The core-concepts docs describe a two-phase design: an extract phase that uses an LLM to pull durable facts out of a conversation, deduplicate them, and embed them, and a retrieve phase that fuses parallel scoring passes (semantic, keyword, and entity) to surface the relevant memories before the next model call. State lands in three tiers: a SQL store for facts, a vector DB for embeddings, and an entity store for relationships, with an optional graph-memory variant described in their 2025 paper. Everything is scoped by user_id, agent_id, or run_id.
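A minimal TypeScript sketch can make the retrieve phase concrete. It assumes each scoring pass (semantic, keyword, entity) has already produced its own best-first list of memory IDs, and it merges those lists with reciprocal rank fusion (RRF), a common fusion technique swapped in here for illustration. The docs summarized above do not specify Mem0's actual fusion formula, and the types, the `fuseRankings` and `retrieve` helpers, and the RRF constant `k = 60` are all hypothetical, not Mem0's SDK.

```typescript
// Hypothetical sketch: fuse parallel scoring passes with reciprocal rank fusion,
// then filter by scope. Not Mem0's API; the fusion formula is an assumption.

type Scope = { userId?: string; agentId?: string; runId?: string };

interface MemoryRecord {
  id: string;
  text: string;
  scope: Scope;
}

// Each pass returns memory IDs ordered best-first.
type RankedPass = string[];

// A memory matches when every scope field set on the query matches.
function inScope(memory: MemoryRecord, query: Scope): boolean {
  return (
    (query.userId === undefined || memory.scope.userId === query.userId) &&
    (query.agentId === undefined || memory.scope.agentId === query.agentId) &&
    (query.runId === undefined || memory.scope.runId === query.runId)
  );
}

// RRF: score(id) = sum over passes of 1 / (k + rank), with rank starting at 1.
function fuseRankings(passes: RankedPass[], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  for (const pass of passes) {
    pass.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return scores;
}

function retrieve(
  memories: MemoryRecord[],
  passes: RankedPass[],
  scope: Scope,
  topK: number,
): MemoryRecord[] {
  const byId = new Map<string, MemoryRecord>();
  for (const m of memories) byId.set(m.id, m);
  return Array.from(fuseRankings(passes).entries())
    .sort((a, b) => b[1] - a[1]) // highest fused score first
    .map(([id]) => byId.get(id))
    .filter((m): m is MemoryRecord => m !== undefined && inScope(m, scope))
    .slice(0, topK);
}

// Usage: three hypothetical memories across two users, three ranked passes.
const memories: MemoryRecord[] = [
  { id: "m1", text: "Prefers TypeScript", scope: { userId: "u1" } },
  { id: "m2", text: "Owns the billing service", scope: { userId: "u1" } },
  { id: "m3", text: "Prefers Go", scope: { userId: "u2" } },
];
const semanticPass = ["m1", "m3", "m2"];
const keywordPass = ["m2", "m1"];
const entityPass = ["m1"];
const hits = retrieve(memories, [semanticPass, keywordPass, entityPass], { userId: "u1" }, 2);
console.log(hits.map((m) => m.id).join(", ")); // m1, m2
```

RRF is a convenient choice for a sketch because it uses only rank positions, so passes whose raw scores are not comparable (cosine similarity versus keyword hits) can be merged without calibrating them first.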

It is open source under Apache 2.0 (github.com/mem0ai/mem0, roughly 59.9k stars as of this writing) and self-hostable, with a managed platform at app.mem0.ai. Hosted pricing runs from a free Hobby tier (10k memory adds, 1k retrievals monthly) through Starter at $19/mo, Growth at $79/mo, and Pro at $249/mo, which is the tier that unlocks graph memory. Enterprise is custom with on-prem, SSO, and audit.

On benchmarks, be careful to separate two eras. The 2025 paper claimed roughly a 26 percent relative improvement (LLM-as-judge) over OpenAI's memory on LOCOMO, with about 91 percent lower p95 latency and over 90 percent token savings versus stuffing full context. The 2026 research page reports newer figures (LOCOMO 92.5, LongMemEval 94.4, plus BEAM scores) at under roughly 7,000 tokens per retrieval. Those come from different measurements and different harnesses, so cite them distinctly rather than as one continuous claim.
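Relative improvements are easy to misread as percentage points. A quick arithmetic check, using hypothetical judge scores rather than anything either paper published, shows the difference: a 26 percent relative gain over a baseline of 50 is only a 13-point absolute gain.

```typescript
// Relative vs absolute improvement. The scores below are hypothetical,
// not published Mem0 or OpenAI figures.
function relativeImprovementPct(baseline: number, candidate: number): number {
  // Multiply before dividing to keep this example's arithmetic exact.
  return ((candidate - baseline) * 100) / baseline;
}

function absoluteImprovementPts(baseline: number, candidate: number): number {
  return candidate - baseline;
}

const baselineScore = 50;
const candidateScore = 63;
console.log(relativeImprovementPct(baselineScore, candidateScore)); // 26
console.log(absoluteImprovementPts(baselineScore, candidateScore)); // 13
```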

The honest read: Mem0 is the fastest of the four to ship, with low-latency vector retrieval and strong episodic recall. Pure vector-plus-extraction is weaker on its own at deep temporal or multi-hop reasoning, which is exactly the gap the next contender targets.

Zep: A Temporal Knowledge Graph

Zep approaches memory as a graph problem. Its open-source engine, Graphiti (Apache 2.0, around 28.2k stars), builds temporally aware knowledge graphs, as described in the Zep paper. The defining feature is a bi-temporal model: every fact carries both a valid time and a transaction time, and superseded facts are not deleted but marked, so the graph can answer questions about what was true at a given moment. Retrieval is hybrid, combining embeddings, BM25 keyword search, and graph traversal, with provenance tracked through "episodes."
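The bi-temporal idea is easier to see in a toy store. This is a minimal sketch under stated assumptions, not Graphiti's data model: each fact carries a valid-time interval (`validFrom`, `validTo`) and transaction times (`recordedAt`, `invalidatedAt`), and a newer value closes out the older one instead of deleting it. All names are hypothetical.

```typescript
// Hypothetical bi-temporal fact store. Field names are illustrative, not
// Graphiti's schema. Times are plain numbers (e.g. epoch ms) for brevity.

interface Fact {
  subject: string;
  predicate: string;
  value: string;
  validFrom: number; // valid time: when the fact became true in the world
  validTo: number | null; // valid time: when it stopped being true (null = still open)
  recordedAt: number; // transaction time: when the store learned the fact
  invalidatedAt: number | null; // transaction time: when the store learned it had ended
}

class BitemporalStore {
  private readonly facts: Fact[] = [];

  // Add a fact. Open facts for the same subject and predicate that started
  // earlier are closed and marked, never deleted. (Out-of-order updates are
  // not handled in this sketch.)
  record(subject: string, predicate: string, value: string, validFrom: number, recordedAt: number): void {
    for (const f of this.facts) {
      if (f.subject === subject && f.predicate === predicate && f.validTo === null && f.validFrom < validFrom) {
        f.validTo = validFrom;
        f.invalidatedAt = recordedAt;
      }
    }
    this.facts.push({ subject, predicate, value, validFrom, validTo: null, recordedAt, invalidatedAt: null });
  }

  // "Given what the store knew at knownAt, what was true at time `at`?"
  query(subject: string, predicate: string, at: number, knownAt: number): string[] {
    return this.facts
      .filter((f) => f.subject === subject && f.predicate === predicate && f.recordedAt <= knownAt)
      .filter((f) => {
        // Only honor the end of a fact's validity if the store knew about it by knownAt.
        const end = f.invalidatedAt !== null && f.invalidatedAt <= knownAt ? f.validTo : null;
        return f.validFrom <= at && (end === null || at < end);
      })
      .map((f) => f.value);
  }
}

const store = new BitemporalStore();
store.record("acme", "plan", "Starter", 100, 110); // true from t=100, learned at t=110
store.record("acme", "plan", "Pro", 200, 205); // upgraded at t=200, learned at t=205

console.log(store.query("acme", "plan", 150, 300).join(", ")); // Starter
console.log(store.query("acme", "plan", 250, 300).join(", ")); // Pro
console.log(store.query("acme", "plan", 250, 204).join(", ")); // Starter (upgrade not yet recorded)
```

The last query is the one a plain vector store has no way to express: it reconstructs what the system believed at a past moment, before it learned about the upgrade.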

Graphiti self-hosts on Neo4j, FalkorDB, Kuzu, or Amazon Neptune. The managed Zep platform adds governance: attribute-based access control, retention policies, and audit. Pricing starts free ($0, 10k credits monthly, 2 projects), then Flex at roughly $104/mo billed annually, Flex Plus at roughly $312/mo, and custom Enterprise with SOC 2 Type II and HIPAA BAA.

Zep's paper reported 94.8 percent on Deep Memory Retrieval (versus 93.4 for MemGPT) and up to an 18.5 percent accuracy gain on LongMemEval with a 90 percent latency reduction versus full context. The graph approach shines for entity-centric, temporal, and contradiction-resolving questions and multi-hop reasoning. The cost is real: you take on schema and extraction overhead, and self-hosting means running a graph database, which is more operational weight than Mem0's vector store.

Letta: The Agent Edits Its Own Memory

Letta (formerly MemGPT) is less a memory API and more a platform for stateful agents. Its premise, per the core-concepts docs, is that all agent state persists in a database even after it is evicted from the context window. Memory comes in layers: memory blocks are labeled text pinned into context that the agent can edit and share, and archival memory is a searchable database the agent queries on demand. The distinguishing idea is self-editing memory: the agent decides, via tools, what to write, update, or pull into context. This descends directly from the MemGPT paper, "Towards LLMs as Operating Systems," which framed context management as an OS-style virtual memory problem.
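To illustrate the self-editing pattern, here is a hypothetical sketch of pinned memory blocks, an archival store, and the tool-style functions an agent would call to edit them. Letta ships its own built-in memory tools; the class, the method names, the character-limit handling, and the tag format used to render blocks into context are assumptions made for this example, and the archival search is a naive substring match standing in for a real database query.

```typescript
// Hypothetical sketch of self-editing agent memory. Not Letta's API.

interface MemoryBlock {
  label: string;
  value: string;
  limit: number; // max characters the block may occupy in context
}

interface MemoryEdit {
  tool: string;
  target: string;
  detail: string;
}

class AgentMemory {
  private readonly blocks = new Map<string, MemoryBlock>();
  private readonly archival: string[] = [];
  readonly auditLog: MemoryEdit[] = []; // every successful write, in order

  addBlock(label: string, value: string, limit: number): void {
    this.blocks.set(label, { label, value, limit });
  }

  // Tool the model could call to append a line to a pinned block.
  blockAppend(label: string, text: string): string {
    const block = this.blocks.get(label);
    if (!block) return `error: no block "${label}"`;
    return this.commit(block, block.value ? `${block.value}\n${text}` : text, "blockAppend", text);
  }

  // Tool the model could call to rewrite part of a pinned block.
  blockReplace(label: string, oldText: string, newText: string): string {
    const block = this.blocks.get(label);
    if (!block || !block.value.includes(oldText)) return `error: "${oldText}" not found in "${label}"`;
    // A replacer function avoids special "$" patterns in newText.
    const next = block.value.replace(oldText, () => newText);
    return this.commit(block, next, "blockReplace", `${oldText} -> ${newText}`);
  }

  archivalInsert(text: string): string {
    this.archival.push(text);
    this.auditLog.push({ tool: "archivalInsert", target: "archival", detail: text });
    return "ok";
  }

  // Naive case-insensitive substring search over archival memory.
  archivalSearch(query: string): string[] {
    const q = query.toLowerCase();
    return this.archival.filter((t) => t.toLowerCase().includes(q));
  }

  // Render pinned blocks into text for the next model call.
  compileContext(): string {
    return Array.from(this.blocks.values())
      .map((b) => `<${b.label}>\n${b.value}\n</${b.label}>`)
      .join("\n");
  }

  // Enforce the block limit, apply the edit, and log it.
  private commit(block: MemoryBlock, next: string, tool: string, detail: string): string {
    if (next.length > block.limit) return `error: "${block.label}" would exceed ${block.limit} chars`;
    block.value = next;
    this.auditLog.push({ tool, target: block.label, detail });
    return "ok";
  }
}

const mem = new AgentMemory();
mem.addBlock("human", "Name: Sam", 200);
mem.blockAppend("human", "Prefers concise answers");
mem.blockReplace("human", "Name: Sam", "Name: Samantha");
mem.archivalInsert("2026-06-30: migrated billing service to Postgres");
console.log(mem.compileContext());
console.log(mem.auditLog.map((e) => e.tool).join(", ")); // blockAppend, blockReplace, archivalInsert
```

The tools return status strings instead of throwing because, in this pattern, tool results are fed back to the model as text; the audit log is what makes every memory write inspectable after the fact.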

Letta is Apache 2.0 (github.com/letta-ai/letta, around 23.6k stars) and self-hostable, with Letta Cloud as the hosted option. Pricing offers a free tier (bring-your-own-key across all tiers), Pro at $20/mo, and an API plan at $20/mo base plus $0.10 per active agent per month and a small tool-execution fee, which suits fleets of many long-lived agents.
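For fleet sizing, the API plan as described reduces to simple arithmetic. This sketch assumes the prices quoted above; the tool-execution fee rate is not stated here, so it is left as a parameter that defaults to zero, and you should plug in the current figure from Letta's pricing page.

```typescript
// Rough monthly cost for the API plan as quoted in this article:
// $20 base + $0.10 per active agent + a tool-execution fee (rate not stated,
// so it is a parameter). Math is done in cents to avoid floating-point drift.
function lettaApiPlanMonthlyUsd(
  activeAgents: number,
  toolExecutions = 0,
  centsPerToolExecution = 0,
): number {
  const baseCents = 2000;
  const perAgentCents = 10;
  const totalCents = baseCents + activeAgents * perAgentCents + toolExecutions * centsPerToolExecution;
  return totalCents / 100;
}

console.log(lettaApiPlanMonthlyUsd(10)); // 21
console.log(lettaApiPlanMonthlyUsd(1000)); // 120
```

At these rates, 1,000 long-lived agents come to about $120 a month before tool fees, which is why the per-agent pricing suits large fleets.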

The tradeoff is latency versus flexibility. Letting the agent manage its own memory through tool calls gives you maximum control and auditability (you can see every memory edit as an action), but the LLM-in-the-loop retrieval adds turns and cost that a direct vector lookup avoids. If you want an agent whose memory is a first-class, inspectable part of its reasoning, Letta is the most opinionated choice here. This site has explored the idea of memory as an inspectable ledger in its piece on the agent memory context ledger.

Cloudflare: A Substrate, Not a Memory Product

Cloudflare belongs in this comparison with an asterisk. The Agents SDK (MIT, around 5.2k stars) does not give you a memory algorithm; it gives you a place to put state. Each agent is a Durable Object with its own identity, lifecycle, and embedded per-agent SQLite storage. State auto-saves, survives restarts and hibernation, and syncs to connected WebSocket clients, and local this.sql queries are described as effectively zero-latency because there is no network round trip. Vector memory comes from pairing it with Vectorize, and inference from Workers AI. Idle agents hibernate and stop accruing compute charges while they wait.

Pricing is Cloudflare's platform model, not a per-memory fee: a Workers Paid plan (from $5/mo, required for production SQLite Durable Objects) plus usage on requests, duration, and SQL rows read and written, per the Durable Objects pricing. There are no benchmark claims to weigh because there is no retrieval algorithm to benchmark; you build the memory logic.

The tradeoff is clear. You get a stateful substrate with excellent local-read latency, hibernation economics, and per-session isolation, but you write the extraction and retrieval yourself. And while the SDK is MIT and your data sits in plain SQLite, the runtime primitives are Cloudflare-only, which is the deepest platform coupling of the four. The Cloudflare agent memory primitive guide goes deeper on wiring it up.
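Because Cloudflare leaves the memory logic to you, a sketch of the shape you would build may help. This is an in-memory stand-in, not Cloudflare code: the hypothetical `AgentNamespace` mimics addressing one Durable Object per agent name, and `AgentShard` plays the role of that agent's private SQLite. In a real Worker the rows would live in the Durable Object's embedded SQLite (queried through `this.sql`; check the state API docs for exact usage), and semantic recall would go through Vectorize.

```typescript
// Hypothetical in-memory stand-in for per-agent Durable Object storage.
// No Cloudflare APIs are called; all names are illustrative.

interface MemoryRow {
  id: number;
  kind: "fact" | "preference";
  content: string;
  createdAt: number;
}

// Plays the role of one agent's private SQLite database.
class AgentShard {
  private readonly rows: MemoryRow[] = [];
  private nextId = 1;

  // Roughly what an INSERT into a memories table would do.
  remember(kind: MemoryRow["kind"], content: string, createdAt: number): number {
    const id = this.nextId++;
    this.rows.push({ id, kind, content, createdAt });
    return id;
  }

  // Naive keyword recall, newest first; semantic recall would use Vectorize instead.
  recall(keyword: string, limit = 5): MemoryRow[] {
    const needle = keyword.toLowerCase();
    return this.rows
      .filter((r) => r.content.toLowerCase().includes(needle))
      .sort((a, b) => b.createdAt - a.createdAt)
      .slice(0, limit);
  }
}

// Mimics addressing a Durable Object by name: same name, same instance and storage.
class AgentNamespace {
  private readonly shards = new Map<string, AgentShard>();

  get(agentName: string): AgentShard {
    let shard = this.shards.get(agentName);
    if (!shard) {
      shard = new AgentShard();
      this.shards.set(agentName, shard);
    }
    return shard;
  }
}

const agents = new AgentNamespace();
agents.get("user-42").remember("preference", "Deploys with wrangler", 1);
agents.get("user-42").remember("fact", "Uses wrangler dev for local testing", 2);
console.log(agents.get("user-42").recall("wrangler").map((r) => r.content).join(" | "));
console.log(agents.get("user-7").recall("wrangler").length); // 0: separate storage
```

Per-name isolation is the property that makes per-session memory cheap to reason about: one agent's rows can never show up in another agent's recall unless you deliberately share them.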

The Head-to-Head

	Mem0	Zep (Graphiti)	Letta	Cloudflare Agents
Memory model	Extract + retrieve, vector-first	Temporal knowledge graph	Self-editing agent memory	State substrate you build on
Core license	Apache 2.0	Apache 2.0 (Graphiti)	Apache 2.0	MIT (SDK)
Self-host	Yes (vector store)	Yes (needs graph DB)	Yes	No, platform-bound runtime
Managed entry price	Free, then $19/mo	Free, then ~$104/mo	Free, then $20/mo	$5/mo Workers Paid + usage
Strength	Fast to ship, low-latency recall	Temporal, multi-hop, entity-centric	Auditable, agent-controlled	Zero-latency local state, hibernation
Main cost	Weaker deep temporal reasoning alone	Schema + graph ops overhead	LLM-in-loop retrieval latency	You build the memory logic
GitHub stars (approx)	59.9k	28.2k	23.6k	5.2k

Star counts and prices are point-in-time snapshots; verify against the linked pages before you commit.

About Those Benchmarks

If you take one thing from this post, take this: no single memory benchmark number is comparable across vendors in 2026. Everyone runs the same tests under different configurations, and the results move accordingly.

LOCOMO is the most-cited benchmark, built on very long conversations (around 300 turns, up to 35 sessions) with question types spanning single-hop, multi-hop, temporal, and adversarial. It has documented flaws, including speaker misattribution and ambiguous questions, which is part of why the scores are contested. The clearest example: Zep originally reported around 84 percent on LOCOMO, Mem0's replication scored Zep at 58.44 percent and alleged methodology errors, and Zep rebutted with a 75.14 percent figure of its own. Both sides are interested parties. The GitHub issue trail is the primary record if you want to judge for yourself.

LongMemEval (ICLR 2025) is widely seen as more rigorous, with 500 questions across information extraction, multi-session reasoning, temporal reasoning, knowledge updates, and abstention, and it documents roughly a 30 percent accuracy drop for commercial chat assistants and long-context LLMs over sustained interaction. Both Mem0 and Zep cite it, again under their own harnesses. Deep Memory Retrieval, from the MemGPT paper, is now considered narrow and largely saturated. The practical move is to benchmark the top two candidates on your own traffic rather than trusting any vendor's leaderboard.
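Benchmarking finalists on your own traffic does not require much scaffolding. The sketch below assumes a hypothetical `MemoryProvider` interface that you would implement as a thin adapter over each vendor's client, uses normalized exact match as a stand-in for an LLM judge, and reports accuracy per question category. The two toy providers exist only to show that the harness separates a memory that keeps the latest fact from one that keeps the first.

```typescript
// Hypothetical harness for comparing memory providers on your own conversations.
// MemoryProvider, the exact-match judge, and the toy providers are assumptions,
// not any vendor's SDK. Real clients are async; this stays synchronous for brevity.

interface Turn {
  role: "user" | "assistant";
  content: string;
}

interface EvalCase {
  sessions: Turn[][]; // prior conversations to ingest, oldest first
  question: string;
  expected: string;
  category: "single-hop" | "multi-hop" | "temporal" | "abstention"; // illustrative labels
}

interface MemoryProvider {
  label: string;
  reset(): void;
  ingest(session: Turn[]): void;
  answer(question: string): string;
}

// Stand-in for an LLM-as-judge: normalized exact match.
function judge(answer: string, expected: string): boolean {
  const norm = (s: string) => s.trim().toLowerCase();
  return norm(answer) === norm(expected);
}

// Accuracy per category, plus an "overall" entry.
function evaluate(provider: MemoryProvider, cases: EvalCase[]): Map<string, number> {
  const tally = new Map<string, { correct: number; total: number }>();
  for (const c of cases) {
    provider.reset(); // isolate cases so one conversation cannot leak into another
    c.sessions.forEach((s) => provider.ingest(s));
    const ok = judge(provider.answer(c.question), c.expected);
    for (const key of [c.category, "overall"]) {
      const t = tally.get(key) ?? { correct: 0, total: 0 };
      t.total += 1;
      if (ok) t.correct += 1;
      tally.set(key, t);
    }
  }
  const accuracy = new Map<string, number>();
  tally.forEach((t, key) => accuracy.set(key, t.correct / t.total));
  return accuracy;
}

// Toy provider: remembers "key: value" statements from user turns.
function toyProvider(label: string, keepLatest: boolean): MemoryProvider {
  let facts = new Map<string, string>();
  return {
    label,
    reset: () => {
      facts = new Map<string, string>();
    },
    ingest: (session) => {
      for (const turn of session) {
        const sep = turn.content.indexOf(": ");
        if (turn.role !== "user" || sep < 0) continue;
        const key = turn.content.slice(0, sep);
        if (keepLatest || !facts.has(key)) facts.set(key, turn.content.slice(sep + 2));
      }
    },
    answer: (question) => facts.get(question) ?? "unknown",
  };
}

const cases: EvalCase[] = [
  { sessions: [[{ role: "user", content: "city: Lisbon" }]], question: "city", expected: "Lisbon", category: "single-hop" },
  {
    sessions: [[{ role: "user", content: "plan: Starter" }], [{ role: "user", content: "plan: Pro" }]],
    question: "plan",
    expected: "Pro",
    category: "temporal",
  },
];

const latestScores = evaluate(toyProvider("keeps-latest", true), cases);
const firstScores = evaluate(toyProvider("keeps-first", false), cases);
console.log(latestScores.get("overall"), firstScores.get("overall")); // 1 0.5
```

To use it for real, swap the exact-match judge for an LLM-as-judge call and replace the toy cases with sampled conversations from production. Holding the judge and the cases constant across both candidates is what makes the comparison fair.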

Which to Pick

Pick Mem0 when you want a memory layer live this week, your workload is conversational recall and personalization, and low retrieval latency matters more than deep temporal reasoning. The free and $19 tiers make prototyping cheap, and the Apache 2.0 core is there if you outgrow the hosted plan.

Pick Zep when your agent must reason over how facts change through time, resolve contradictions, or traverse relationships between entities: think customer histories, evolving account state, or anything where "what was true when" is a real question. Accept the graph-database operational cost as the price of that capability.

Pick Letta when memory should be a first-class, inspectable part of the agent's own behavior, when you are running many long-lived agents, and when auditability of every memory write is worth the extra latency of LLM-in-the-loop retrieval. It is also the most natural home if you already think in the MemGPT model.

Pick Cloudflare Agents when you are building on Cloudflare anyway, want per-session stateful agents with near-zero local-read latency and hibernation economics, and are happy to write your own extraction and retrieval on top of Durable Objects and Vectorize. It is a substrate decision, not a memory-algorithm decision.

On the broader question of self-host versus managed: all three memory products ship Apache 2.0 cores with managed layers, so you can start hosted and move in-house for data residency or to escape per-call fees. Cloudflare is the outlier: an MIT SDK on a platform-bound runtime, though your data stays in portable SQLite. For where this fits in your larger toolchain, see our overview of the agentic dev stack for 2026.

The Take

There is no single best memory provider, only the best fit for your access pattern. If your questions are "what did the user tell me," reach for Mem0. If they are "what was true, and when," reach for Zep. If they are "let the agent decide what to remember, and show me every edit," reach for Letta. If they are "I already live on Cloudflare and I will build the memory myself," reach for Durable Objects. And whatever the vendor charts say, run the final two candidates against your own conversations before you wire one in. The benchmarks are a starting point, not a verdict.

FAQ

What is an AI agent memory provider?

It is a system that persists what an agent learns across sessions and retrieves the relevant pieces before each model call, so the agent does not start from zero every time. Approaches range from extract-and-retrieve over a vector store (Mem0), to temporal knowledge graphs (Zep), to agent-managed self-editing memory (Letta), to building your own on a stateful substrate (Cloudflare). See the AI agent memory patterns guide for the categories.

Is Mem0, Zep, or Letta open source?

All three have Apache 2.0 open-source cores. Mem0's library and Letta are directly open source and self-hostable, and Zep's memory engine Graphiti is Apache 2.0 and self-hosts on a graph database. Each also offers a managed hosted platform. Cloudflare's Agents SDK is MIT, but its runtime primitives run only on Cloudflare's platform.

Which agent memory provider is cheapest?

For getting started, all four have a free entry point. Paid tiers begin around $19/mo for Mem0, $20/mo for Letta Pro, roughly $104/mo (billed annually) for Zep's Flex tier, and $5/mo plus usage for Cloudflare's required Workers Paid plan. The cheapest at scale depends heavily on your volume of memory writes, retrievals, and active agents, so model it against your own usage.

When should I use a knowledge-graph memory like Zep instead of vector memory like Mem0?

Use a temporal knowledge graph when your agent needs to reason about how facts change over time, resolve contradictions, or traverse relationships between entities. Use vector-based extract-and-retrieve when the priority is fast recall of conversational facts and personalization. Graphs add power for multi-hop and temporal questions at the cost of more setup and operational overhead.

Are the LOCOMO benchmark scores reliable?

Treat them with caution. LOCOMO has documented issues, and vendors run it under different configurations, which has produced public disputes, most notably between Mem0 and Zep over Zep's LOCOMO score. LongMemEval is generally considered more rigorous, but it too is cited under different harnesses. The reliable approach is to benchmark your finalists on your own data.

What is the difference between Letta and MemGPT?

Letta is the platform built by the team behind MemGPT, and it carries the MemGPT context-management approach forward. The MemGPT paper introduced the idea of treating the context window like an operating system's memory, paging information in and out; Letta productizes that into stateful agents with editable memory blocks and archival memory.

Is Cloudflare Agents a memory provider?

Not in the same sense as the others. It provides a stateful substrate (per-agent Durable Objects with embedded SQLite and near-zero-latency local reads) on top of which you build your own memory logic. There is no built-in extraction or retrieval algorithm, so there are no memory benchmarks to compare. Pair it with Vectorize if you need semantic search.

Sources

Mem0 documentation and how it works
Mem0 pricing and research
Mem0 GitHub and 2025 paper (arXiv:2504.19413)
Zep and pricing
Graphiti GitHub and Zep paper (arXiv:2501.13956)
Zep rebuttal on LOCOMO methodology and benchmark issue thread
Letta documentation and pricing
Letta GitHub and MemGPT paper (arXiv:2310.08560)
Cloudflare Agents docs and state API
Cloudflare Durable Objects pricing
LOCOMO paper (arXiv:2402.17753) and LongMemEval (arXiv:2410.10813)
