AI Agent Memory Patterns

Official Sources

Resource	Link
Claude Code Memory & CLAUDE.md	docs.anthropic.com/claude-code/memory
Hugging Face Transformers.js	huggingface.co/docs/transformers.js
RAG Original Paper (Lewis et al.)	arxiv.org/abs/2005.11401
Pinecone Documentation	docs.pinecone.io
Weaviate Documentation	weaviate.io/developers/weaviate
Chroma Documentation	docs.trychroma.com

Every AI agent starts with amnesia. The context window is its entire working memory, and it resets to zero between sessions. Building useful agents means solving this problem.

Here are the memory patterns that work in production.

Pattern 1: File-Based Persistence (CLAUDE.md)

The simplest memory system. Write what matters to a file. Read it at the start of every session.

For the larger agent workflow map, read AI Agents Explained: A TypeScript Developer's Guide and How to Build AI Agents in TypeScript; they give the architecture and implementation context this piece assumes.

import { promises as fs } from "node:fs";

type MemoryEntry = { value: string; timestamp: number };

// Load the memory file, treating a missing file as empty memory
async function loadMemory(): Promise<Record<string, MemoryEntry>> {
  return JSON.parse(await fs.readFile("memory.json", "utf-8").catch(() => "{}"));
}

// Write memory
async function remember(key: string, value: string) {
  const memory = await loadMemory();
  memory[key] = { value, timestamp: Date.now() };
  await fs.writeFile("memory.json", JSON.stringify(memory, null, 2));
}

// Read memory
async function recall(): Promise<Record<string, string>> {
  const memory = await loadMemory();
  return Object.fromEntries(Object.entries(memory).map(([k, v]) => [k, v.value]));
}

Claude Code uses this pattern with CLAUDE.md. Project rules, architecture decisions, coding standards - all persisted as plain text that the model reads at session start.

When to use: Project configuration, coding standards, persistent rules. Anything that does not change often but must be remembered across sessions.

Limitation: File size is limited by the context window. A 50KB CLAUDE.md consumes tokens that could be used for reasoning. Run yours through our token estimator if you are not sure how much headroom you have left.
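To get a rough sense of how much of the window a memory file costs, a character-based estimate is often enough. The sketch below assumes roughly four characters per token for English text, which is a heuristic and not a real tokenizer. The names `estimateTokens` and `memoryShare` are hypothetical helpers, not part of any library.

```typescript
// Rough heuristic: ~4 characters per token for English prose.
// This is an assumption, not a tokenizer; use a real one for exact counts.
const CHARS_PER_TOKEN = 4;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Fraction of the context window the memory file would occupy.
function memoryShare(text: string, contextWindowTokens: number): number {
  return estimateTokens(text) / contextWindowTokens;
}

// A 50KB file (51,200 ASCII characters) in a 200K-token window.
const fiftyKb = "x".repeat(50 * 1024);
console.log(estimateTokens(fiftyKb)); // 12800
console.log(memoryShare(fiftyKb, 200000)); // 0.064
```

Under this heuristic a 50KB file costs about 6% of a 200K-token window on every turn. Code and non-English text usually tokenize less efficiently, so treat the result as a lower bound.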

Pattern 2: RAG (Retrieval-Augmented Generation)

Instead of loading everything into context, index your knowledge and retrieve only what is relevant to the current query.

import { pipeline } from "@huggingface/transformers";

const embedder = await pipeline("feature-extraction", "mixedbread-ai/mxbai-embed-xsmall-v1");

type IndexedDoc = { id: string; text: string; vector: number[] };

// Vectors are L2-normalized by the embedder (normalize: true),
// so the dot product equals the cosine similarity
function cosineSimilarity(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// Index documents
async function index(docs: { id: string; text: string }[]): Promise<IndexedDoc[]> {
  return Promise.all(
    docs.map(async (doc) => {
      const embedding = await embedder(doc.text, { pooling: "mean", normalize: true });
      return { id: doc.id, text: doc.text, vector: embedding.tolist()[0] };
    })
  );
}

// Retrieve relevant docs for a query
async function retrieve(query: string, indexed: IndexedDoc[], topK = 3) {
  const queryVec = (await embedder(query, { pooling: "mean", normalize: true })).tolist()[0];

  return indexed
    .map((doc) => ({ ...doc, score: cosineSimilarity(queryVec, doc.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

The agent gets relevant context without the full knowledge base consuming the window.

When to use: Large knowledge bases (documentation, codebases, conversation history). When the context window cannot hold everything.

Limitation: Retrieval quality depends on embedding model and chunking strategy. Bad retrieval means bad context.
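One common guard against bad retrieval is to drop low-scoring matches instead of always passing the top K into context. The sketch below assumes results shaped like the output of `retrieve` above. The `filterByScore` helper is hypothetical, and the `minScore` default of 0.5 is an arbitrary placeholder to tune against your own data.

```typescript
interface ScoredDoc {
  id: string;
  text: string;
  score: number; // cosine similarity, higher means more similar
}

// Keep only matches at or above the threshold. The result may be empty,
// in which case the agent should proceed without retrieved context.
function filterByScore(results: ScoredDoc[], minScore = 0.5): ScoredDoc[] {
  return results.filter((doc) => doc.score >= minScore);
}

const results: ScoredDoc[] = [
  { id: "a", text: "Deploy with the CLI", score: 0.82 },
  { id: "b", text: "Billing FAQ", score: 0.31 },
];
const kept = filterByScore(results);
console.log(kept.map((doc) => doc.id)); // only "a" passes the threshold
```

Returning nothing is often better than returning a weak match, because the model tends to treat whatever lands in context as relevant.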

Pattern 3: Conversation Summarization

Long conversations overflow the context window. Instead of dropping old messages, summarize them.

import { generateText } from "ai";
import { anthropic } from "@ai-sdk/anthropic";

// Minimal message shape assumed by this example
type Message = { role: "system" | "user" | "assistant"; content: string };

async function summarizeHistory(messages: Message[]): Promise<string> {
  if (messages.length < 20) return ""; // No need to summarize short conversations

  const oldMessages = messages.slice(0, -10); // Keep last 10 intact
  const { text } = await generateText({
    model: anthropic("claude-haiku-4-5"),
    prompt: `Summarize this conversation history in 3-5 bullet points. Focus on decisions made, tasks completed, and current state:\n\n${oldMessages.map((m) => `${m.role}: ${m.content}`).join("\n")}`,
  });

  return text;
}

// Use in agent loop
const summary = await summarizeHistory(messages);
const contextMessages: Message[] = summary
  ? [
      { role: "system", content: `Previous conversation summary:\n${summary}` },
      ...messages.slice(-10), // Recent messages in full
    ]
  : messages; // Short conversation: nothing was summarized, so keep every message

The agent retains awareness of the full conversation without the token cost.

When to use: Long-running agent sessions. Customer support agents. Multi-turn development sessions.
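A fixed message count can be a blunt trigger, since a single long tool output may outweigh many short replies. One alternative is to split the history by an approximate token budget. This is a sketch under stated assumptions: roughly four characters per token (a heuristic, not a tokenizer), and hypothetical names `ChatMessage`, `approxTokens`, and `splitByBudget`.

```typescript
interface ChatMessage {
  role: string;
  content: string;
}

// Assumption: ~4 characters per token; swap in a real tokenizer for accuracy.
const approxTokens = (text: string): number => Math.ceil(text.length / 4);

// Walk backwards from the newest message, keeping messages until the budget
// is spent. Everything older is returned as the part to summarize.
// If the newest message alone exceeds the budget, `recent` is empty.
function splitByBudget(messages: ChatMessage[], recentBudget: number) {
  let used = 0;
  let cut = messages.length;
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = approxTokens(messages[i].content);
    if (used + cost > recentBudget) break;
    used += cost;
    cut = i;
  }
  return { toSummarize: messages.slice(0, cut), recent: messages.slice(cut) };
}

const history: ChatMessage[] = [
  { role: "user", content: "a".repeat(40) }, // ~10 tokens
  { role: "assistant", content: "b".repeat(40) }, // ~10 tokens
  { role: "user", content: "c".repeat(8) }, // ~2 tokens
];
const split = splitByBudget(history, 12);
console.log(split.toSummarize.length, split.recent.length); // 1 2
```

The older slice can then be passed to a summarizer while the recent slice goes into context untouched.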

Pattern 4: Structured State

Track agent state as a typed object, not free text. Serialize between sessions.

import { promises as fs } from "node:fs";

interface AgentState {
  task: string;
  status: "planning" | "executing" | "reviewing" | "done";
  filesModified: string[];
  testsRun: { file: string; passed: boolean }[];
  decisions: { what: string; why: string; timestamp: number }[];
  blockers: string[];
}

const initialState: AgentState = {
  task: "",
  status: "planning",
  filesModified: [],
  testsRun: [],
  decisions: [],
  blockers: [],
};

// Persist between steps
async function saveState(state: AgentState) {
  await fs.writeFile(".agent-state.json", JSON.stringify(state, null, 2));
}

async function loadState(): Promise<AgentState> {
  return JSON.parse(
    await fs.readFile(".agent-state.json", "utf-8").catch(() => JSON.stringify(initialState))
  );
}

The agent can resume exactly where it left off. Every decision is logged with reasoning.

When to use: Multi-step workflows that may be interrupted. CI/CD pipelines. Long-running automation.
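Logging a decision can be a small pure function that returns a new state object, which keeps the update easy to test and safe to persist afterwards. The sketch below uses a trimmed-down `WorkflowState` holding only the fields it needs. Both that type and the `recordDecision` helper are hypothetical, and the example decision is made up for illustration.

```typescript
interface Decision {
  what: string;
  why: string;
  timestamp: number;
}

interface WorkflowState {
  status: "planning" | "executing" | "reviewing" | "done";
  decisions: Decision[];
}

// Return a new state rather than mutating the old one, so a failed step
// cannot leave a half-updated object behind before it is saved.
function recordDecision(
  state: WorkflowState,
  what: string,
  why: string,
  now: number = Date.now()
): WorkflowState {
  return { ...state, decisions: [...state.decisions, { what, why, timestamp: now }] };
}

const before: WorkflowState = { status: "executing", decisions: [] };
const after = recordDecision(before, "Use pgvector", "Already running Postgres", 1700000000000);
console.log(after.decisions.length, before.decisions.length); // 1 0
```

Passing the timestamp as a parameter is a design choice that makes the function deterministic in tests. In the agent loop the default `Date.now()` is enough.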

Pattern 5: Tiered Memory

Different types of information need different retention strategies.

Working Memory (context window)
  - Current task, recent messages, active file contents
  - Lifetime: current session only

Short-Term Memory (session state file)
  - Files modified, tests run, decisions made
  - Lifetime: current task

Long-Term Memory (CLAUDE.md / RAG index)
  - Project rules, architecture, coding standards
  - Lifetime: permanent, updated occasionally

Episodic Memory (conversation logs)
  - Past conversations summarized
  - Lifetime: retained as summaries, raw logs archived

Each tier has different storage, retrieval, and eviction strategies.
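One way to combine the tiers at prompt time is to give each its own size cap and concatenate them in priority order. This is a minimal sketch with assumptions throughout: the `Tier` shape, the `assembleContext` helper, and the character caps are all hypothetical, and truncating the tail is the simplest possible eviction rule, not a recommended one.

```typescript
interface Tier {
  name: string;
  content: string;
  maxChars: number; // hypothetical per-tier cap
}

// Concatenate non-empty tiers in the order given, truncating each to its
// own cap so that one tier cannot crowd out the others.
function assembleContext(tiers: Tier[]): string {
  return tiers
    .filter((tier) => tier.content.length > 0)
    .map((tier) => `## ${tier.name}\n${tier.content.slice(0, tier.maxChars)}`)
    .join("\n\n");
}

const context = assembleContext([
  { name: "Long-term", content: "Use TypeScript strict mode.", maxChars: 2000 },
  { name: "Episodic", content: "", maxChars: 1000 },
  { name: "Short-term", content: "Modified: src/index.ts", maxChars: 1000 },
]);
console.log(context);
```

Here the empty episodic tier is skipped, and the other two appear under their own headings. A real system would likely evict by recency or relevance instead of cutting off the end of the text.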

Which Pattern to Use

Scenario	Pattern
Project configuration	File-based (CLAUDE.md)
Large documentation	RAG
Long conversations	Summarization
Multi-step workflows	Structured state
Production agents	Tiered (all of the above)

Most production agents use a combination: CLAUDE.md for rules, RAG for knowledge, summarization for history, and structured state for workflow tracking. When you would rather buy the memory layer than build it, compare the options in best AI agent memory providers in 2026: Mem0 vs Zep vs Letta vs Cloudflare.

Frequently Asked Questions

Does Claude Code have built-in memory?

Yes. CLAUDE.md files at project, user, and global levels provide persistent memory. Claude Code also has auto-memory that saves important context automatically. But these are file-based - not RAG or semantic search.

How much context should I reserve for memory vs reasoning?

Keep memory under 30% of the context window. A 200K token window should use at most 60K for memory/context, leaving 140K for reasoning and tool outputs.
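The arithmetic behind that split is easy to encode as a helper. The `contextBudget` function below is hypothetical, and its 0.3 default simply mirrors the 30% rule of thumb. It is a guideline, not a limit enforced by any API.

```typescript
// Split a context window into a memory budget and the remainder left for
// reasoning and tool outputs.
function contextBudget(windowTokens: number, memoryFraction = 0.3) {
  const memory = Math.round(windowTokens * memoryFraction);
  return { memory, reasoning: windowTokens - memory };
}

const budget = contextBudget(200000);
console.log(budget.memory, budget.reasoning); // 60000 140000
```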

Can I use a vector database for agent memory?

Yes. Pinecone, Weaviate, Chroma, and pgvector all work. For browser-based agents, Transformers.js can compute embeddings client-side. The key is matching the retrieval strategy to your query patterns.

What is the best chunking strategy for RAG?

For code: chunk by function/class. For documentation: chunk by section (h2 headings). For conversations: chunk by topic shift. Overlapping chunks (50-100 token overlap) improve retrieval accuracy at the boundaries.
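Overlap is straightforward to implement with a sliding window. The sketch below splits on whitespace and counts words as a stand-in for tokens, which is an approximation. The `chunkWithOverlap` helper is hypothetical, and the tiny sizes in the usage line are only there to make the overlap visible.

```typescript
// Split text into word-based chunks where each chunk repeats the last
// `overlap` words of the previous one. Words approximate tokens here.
function chunkWithOverlap(text: string, size: number, overlap: number): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("require size > 0 and 0 <= overlap < size");
  }
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  const step = size - overlap; // how far the window advances each time
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + size).join(" "));
    if (start + size >= words.length) break; // this chunk reached the end
  }
  return chunks;
}

const chunks = chunkWithOverlap("a b c d e f g h", 4, 2);
console.log(chunks); // "a b c d", "c d e f", "e f g h"
```

For section-based or function-based chunking, the same window can be applied inside each section so that chunks never cross a heading or function boundary.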
