
Agent Infrastructure
TL;DR
Cloudflare's Agent Memory primitive. What it stores, latency profile, how it compares to mem0, and how to wire it into your stack.
If you have shipped a production agent in the last twelve months, you already know that memory is where the wheels come off. The model is not the bottleneck. The framework is not the bottleneck. The thing that keeps your agent from feeling like a real product is that it forgets everything between sessions and most things within a session.
There are three workable answers to this today. You can run mem0 and pay for the managed service. You can roll your own with a vector store, a summarization pass, and a lot of glue code. Or you can wait for the platform you already deploy on to ship a memory primitive.
Cloudflare just shipped that primitive. The Agent Memory announcement is the most opinionated take on agent state we have seen from a hyperscaler. It is worth comparing to mem0 directly because the two are aimed at exactly the same problem and they make very different tradeoffs.
Cloudflare Agent Memory is a managed key-value-plus-vector store, scoped to an agent and an entity, accessible from Workers, Durable Objects, and the Agents SDK. Three things make it different from "just use D1 plus Vectorize."
It is entity-scoped. Memory is namespaced by agent id and entity id, typically a user, a session, or a tenant. The API does the partitioning for you, and the latency profile is tuned for "fetch this user's memory at the start of every turn."
It is hybrid storage. Each memory item carries a key, a value, optional structured metadata, and an embedding. You can query by exact key, by metadata filter, or by semantic similarity. This collapses the typical "I have a key-value store and a vector store and they are out of sync" problem into a single API.
It is pinned to the Workers runtime. Reads from a Worker in the same region as the memory store target single-digit milliseconds. Reads from the colocated Durable Object are even faster. This matters because agent memory is on the hot path of every turn. If the read costs you fifty milliseconds, you feel it.
The primitive ships with built-in summarization. You can write raw conversation turns and the platform will roll them up into stable summaries on a schedule. This is the piece that makes the long-tail memory problem tractable without you writing a summarization worker yourself.
The Agents SDK exposes memory through a single client. Here is a minimal turn handler that reads relevant memory, runs the model, and writes new memory back.
import { Agent, AgentMemory } from "@cloudflare/agents";
import { generateText } from "ai";
import { workersAI } from "@ai-sdk/workers-ai";

export class SupportAgent extends Agent<Env> {
  async onMessage(message: string, ctx: { userId: string }) {
    // Scope the memory client to this agent and this user.
    const memory = new AgentMemory(this.env.MEMORY, {
      agent: "support",
      entity: ctx.userId,
    });

    // Semantic search over stored preferences, narrowed by metadata filter.
    const relevant = await memory.search({
      query: message,
      limit: 6,
      filters: { kind: "preference" },
    });

    // Plain metadata listing: the ten most recent raw turns.
    const recent = await memory.list({
      kind: "turn",
      limit: 10,
      orderBy: "recent",
    });

    const { text } = await generateText({
      model: workersAI("@cf/meta/llama-3.3-70b"),
      system: this.buildSystem(relevant, recent),
      prompt: message,
    });

    // Persist the turn; `embed` is what makes it reachable by
    // semantic search on later turns.
    await memory.write({
      kind: "turn",
      key: `turn-${Date.now()}`,
      value: { user: message, assistant: text },
      embed: message,
    });

    return text;
  }

  private buildSystem(relevant: any[], recent: any[]) {
    return `Known user preferences:\n${relevant
      .map((r) => `- ${r.value.fact}`)
      .join("\n")}\n\nRecent conversation:\n${recent
      .reverse()
      .map((t) => `User: ${t.value.user}\nAssistant: ${t.value.assistant}`)
      .join("\n")}`;
  }
}
A few things to notice.
The kind field is not magic. It is a metadata column you control. Use it to separate raw turns from extracted preferences from facts the agent has been told. The platform does not enforce a schema. That is your job. The filter syntax makes it cheap to keep them apart.
The embed field at write time is what enables semantic search later. If you skip it, the item is stored but is only retrievable by key or metadata. For most agent workloads you want semantic on at least a subset of writes.
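To make both points concrete, here is a sketch of the two write shapes side by side, using the same client as the handler above. The option names follow this article's running example; the specific keys and values are illustrative.

// A raw turn: retrievable by key, by the kind filter, and, because
// of `embed`, by semantic similarity.
await memory.write({
  kind: "turn",
  key: `turn-${Date.now()}`,
  value: { user: message, assistant: text },
  embed: message,
});

// An extracted preference. Skipping `embed` here would make it
// key/metadata-only, so we embed the fact text to keep "what does
// this user like" style semantic queries working later.
await memory.write({
  kind: "preference",
  key: "pref-dark-mode",
  value: { fact: "prefers dark mode in all UIs" },
  embed: "prefers dark mode in all UIs",
});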
The summarization layer, when enabled, runs as a Cloudflare-side job that walks kind: "turn" items and writes back kind: "summary" items on a configurable cadence. Your hot read path then queries summaries instead of raw turns, which keeps the context window manageable without you running a separate worker.
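The exact configuration surface is not shown in the announcement, so treat the following as an illustration only: every field name under summarize below is hypothetical, standing in for whatever the real options turn out to be.

// Hypothetical configuration shape for the platform-side rollup job.
const memory = new AgentMemory(this.env.MEMORY, {
  agent: "support",
  entity: ctx.userId,
  summarize: {
    source: "turn",    // roll up raw kind: "turn" items
    target: "summary", // write back kind: "summary" items
    cadence: "6h",     // how often the rollup runs
  },
});

// The hot read path then queries summaries instead of raw turns.
const context = await memory.list({ kind: "summary", limit: 3, orderBy: "recent" });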
The free tier is generous but vector queries are billed separately from key reads. If your agent does a hybrid read on every turn, the per-month math is reasonable for a small SaaS but can surprise you at scale. Profile early.
Cross-region replication is eventually consistent. If your user moves between regions mid-conversation, you can briefly see stale memory. The platform converges fast, typically under a second, but a chat UI can render a turn before the write propagates. Plan for this in the UX or pin sessions to a region.
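If you take the pinning route, the standard Workers lever is a Durable Object location hint, which keeps the agent (and its colocated memory reads) in one region for the life of the session. A sketch, assuming a SUPPORT_AGENT Durable Object namespace binding (the binding name and region are illustrative; locationHint is a standard Durable Objects option):

// Pin each session's agent to one region so every turn reads the
// same replica.
export default {
  async fetch(request: Request, env: Env) {
    const sessionId = new URL(request.url).searchParams.get("session") ?? "anon";
    const id = env.SUPPORT_AGENT.idFromName(sessionId);
    // The hint is honored when the object is first created; later
    // calls route to the same instance regardless.
    const stub = env.SUPPORT_AGENT.get(id, { locationHint: "enam" });
    return stub.fetch(request);
  },
};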
The summarization layer is good but not magic. It will compress aggressively for long histories. If your agent depends on exact quotes from twenty turns ago, do not rely on summaries. Keep the raw turn store and pull from it explicitly when needed.
Schema migrations are your problem. The platform does not version your memory shape. If you change what you store under kind: "preference", write a migration that reads old shape and rewrites in new shape. There is no "memory ALTER TABLE."
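A minimal sketch of that pattern, reusing the list and write calls from the example above. It assumes a write to an existing key overwrites it; the old and new value shapes, and the page size, are illustrative.

// One-off migration: preferences used to store a bare string, the
// new shape stores { fact, source }.
async function migratePreferences(memory: AgentMemory) {
  const prefs = await memory.list({ kind: "preference", limit: 1000 });
  for (const item of prefs) {
    if (typeof item.value === "string") {
      await memory.write({
        kind: "preference",
        key: item.key,
        value: { fact: item.value, source: "migration-v2" },
        embed: item.value,
      });
    }
  }
}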
mem0 is the incumbent here, and the comparison is interesting because the two products target the same problem from different sides.
mem0 is a managed service with a strong ergonomic story. The SDK feels designed for people who do not want to think about storage. It runs on its own infrastructure, exposes a clean REST API, and has the most mature schema for "extract preferences from raw turns automatically." If you are running outside Cloudflare and want a memory layer that feels like a finished product, mem0 is still the answer.
Cloudflare Agent Memory is a primitive. It assumes you are already in the Workers runtime and gives you the lowest-latency, most-integrated path to durable agent state on that platform. The semantic retrieval, the summarization, the entity scoping are all present, but the API is closer to "managed database" than "finished memory product." You will write more code, and you will get more control.
The right way to choose is by deploy target. If your agent runs on Workers, Cloudflare Agent Memory is the obvious primitive. The latency advantage on the hot path is large enough to matter. If your agent runs on Vercel, AWS, or Fly, mem0's portability is the clearer win. Mixing the two is possible but probably overkill for an indie team.
Cost-wise, both are reasonable for small to medium scale. At very high write volumes, the Cloudflare model wins on the per-write cost. At very high read volumes with complex semantic queries, the math gets closer.
We have been integrating Agent Memory into AgentFS, our virtual filesystem layer for AI agents that gives them durable, sandboxed working storage across runs. AgentFS started as a thin wrapper over R2 and Durable Objects to give agents a persistent /workspace directory between sessions. The memory primitive lets us layer a richer abstraction on top: the agent's beliefs about the workspace, not just the raw bytes in it.
The pattern is straightforward. AgentFS stores files in R2. Cloudflare Agent Memory stores the agent's notes about those files - what they are, what changed last session, what the user wanted - keyed by file path with semantic search over the notes. The agent walks into a new session, queries memory for "what was I working on in this directory," and gets back the relevant notes without scanning the whole filesystem.
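In code, the pattern looks roughly like this, with AgentFS internals elided. The note shape and the kind: "file-note" convention are our own, not anything the platform prescribes.

// After a session touches a file, record the agent's belief about
// it, keyed by path so it can also be fetched exactly.
await memory.write({
  kind: "file-note",
  key: "/workspace/billing/invoice.ts",
  value: {
    summary: "Invoice generator; user asked for EU VAT support last session",
    lastTouched: Date.now(),
  },
  embed: "invoice generator, EU VAT support requested",
});

// Next session: one semantic query instead of scanning the filesystem.
const notes = await memory.search({
  query: "what was I working on in the billing directory",
  limit: 5,
  filters: { kind: "file-note" },
});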
For observability, Traces reads from the same memory store to render a session timeline. Every memory read and write is a trace event, and the UI shows you which memories influenced the model's output on each turn. This turns out to be the killer debug tool for agent memory. Without it, you are guessing about why the agent forgot something or why it remembered something it should not have. With it, you can scroll the timeline and see exactly which memory keys were in context.
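You can get that signal even without a dedicated UI: a thin wrapper over the memory client that emits an event per read and write is enough to reconstruct which keys were in context on each turn. A sketch, where emit is a stand-in for whatever trace sink you already have (console, Workers Analytics Engine, an external collector):

// Wrap the memory client so every read/write emits a trace event.
function tracedMemory(memory: AgentMemory, emit: (e: object) => void) {
  return {
    async search(opts: any) {
      const items = await memory.search(opts);
      emit({ op: "search", query: opts.query, keys: items.map((i: any) => i.key), at: Date.now() });
      return items;
    },
    async write(opts: any) {
      await memory.write(opts);
      emit({ op: "write", key: opts.key, kind: opts.kind, at: Date.now() });
    },
  };
}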
Wiring memory into a real product is not just about storage. It is about understanding what the agent saw and when. That is a tooling problem as much as an infrastructure problem.
The interesting open questions.
Cross-agent memory. Right now memory is scoped to one agent id. Real product use cases - a customer support agent that needs to know what the sales agent told the user yesterday - require cross-agent memory or a shared memory layer. Cloudflare has hinted at this but not shipped it. mem0 already supports it.
Memory pruning policy. The platform stores everything you write, indefinitely. For GDPR and product hygiene, you need a deletion API and ideally a TTL primitive. Both exist, but the ergonomics are not where they should be yet.
The "memory as a graph" question. The current API is flat: keys, values, embeddings, metadata. Some of the most interesting agent memory research is about graph structures over those memories. Whether platforms ship graph-native memory or push it to userland is the next architectural decision worth tracking.
We are running a deeper hands-on walkthrough on YouTube, building a customer support agent that uses Cloudflare Agent Memory end to end, with debug traces and a side-by-side comparison against mem0. If you are picking your memory layer this quarter, that comparison is the thing to watch. The platforms have converged faster than expected, and the choice is now mostly about deploy target rather than capability.