
AI Agents Deep Dive
TL;DR
Agents forget everything between sessions. Here are the patterns that fix that: CLAUDE.md persistence, RAG retrieval, context compression, and conversation summarization.
Every AI agent starts with amnesia. The context window is its entire working memory, and it resets to zero between sessions. Building useful agents means solving this problem.
Here are the memory patterns that work in production.
The simplest memory system: write what matters to a file, and read it back at the start of every session.
import fs from "node:fs/promises";

// Write memory
async function remember(key: string, value: string) {
  const memory = JSON.parse(await fs.readFile("memory.json", "utf-8").catch(() => "{}"));
  memory[key] = { value, timestamp: Date.now() };
  await fs.writeFile("memory.json", JSON.stringify(memory, null, 2));
}

// Read memory
async function recall(): Promise<Record<string, string>> {
  const memory = JSON.parse(await fs.readFile("memory.json", "utf-8").catch(() => "{}"));
  return Object.fromEntries(
    Object.entries(memory).map(([k, v]: [string, any]) => [k, v.value])
  );
}
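At session start, the recalled map gets folded into the system prompt. `formatMemoryPrompt` below is a hypothetical helper, not part of any SDK - a minimal sketch of how the injection might look:

```typescript
// Hypothetical helper: render recalled key/value memory as a system-prompt block.
function formatMemoryPrompt(memory: Record<string, string>): string {
  const lines = Object.entries(memory).map(([key, value]) => `- ${key}: ${value}`);
  return lines.length ? `Persistent memory:\n${lines.join("\n")}` : "";
}

// Usage: prepend formatMemoryPrompt(await recall()) to the system prompt.
```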
Claude Code uses this pattern with CLAUDE.md. Project rules, architecture decisions, coding standards - all persisted as plain text that the model reads at session start.
When to use: Project configuration, coding standards, persistent rules. Anything that does not change often but must be remembered across sessions.
Limitation: File size is limited by the context window. A 50KB CLAUDE.md consumes tokens that could be used for reasoning.
Instead of loading everything into context, index your knowledge and retrieve only what is relevant to the current query.
import { pipeline } from "@huggingface/transformers";

const embedder = await pipeline("feature-extraction", "mixedbread-ai/mxbai-embed-xsmall-v1");

// Cosine similarity between two equal-length vectors
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Index documents
async function index(docs: { id: string; text: string }[]) {
  const vectors = await Promise.all(
    docs.map(async (doc) => {
      const embedding = await embedder(doc.text, { pooling: "mean", normalize: true });
      return { id: doc.id, text: doc.text, vector: embedding.tolist()[0] };
    })
  );
  return vectors;
}

// Retrieve relevant docs for a query
async function retrieve(query: string, index: any[], topK = 3) {
  const queryVec = (await embedder(query, { pooling: "mean", normalize: true })).tolist()[0];
  return index
    .map((doc) => ({
      ...doc,
      score: cosineSimilarity(queryVec, doc.vector),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
The agent gets relevant context without the full knowledge base consuming the window.
When to use: Large knowledge bases (documentation, codebases, conversation history). When the context window cannot hold everything.
Limitation: Retrieval quality depends on embedding model and chunking strategy. Bad retrieval means bad context.
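One mitigation is to discard retrieved chunks whose similarity score is too low, rather than always trusting the top-K. A minimal sketch - the 0.5 cutoff is an assumption to calibrate against your own embedding model:

```typescript
// Drop retrieved results below a minimum similarity score. minScore is an
// assumed default; tune it against your embedding model and corpus.
function filterByScore<T extends { score: number }>(results: T[], minScore = 0.5): T[] {
  return results.filter((r) => r.score >= minScore);
}
```

An empty result then signals "no relevant context found," which is often better than padding the prompt with near-random chunks.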
Long conversations overflow the context window. Instead of dropping old messages, summarize them.
import { generateText } from "ai";
import { anthropic } from "@ai-sdk/anthropic";

interface Message {
  role: "user" | "assistant";
  content: string;
}

async function summarizeHistory(messages: Message[]): Promise<string> {
  if (messages.length < 20) return ""; // No need to summarize short conversations
  const oldMessages = messages.slice(0, -10); // Keep last 10 intact
  const { text } = await generateText({
    model: anthropic("claude-haiku-4-5"),
    prompt: `Summarize this conversation history in 3-5 bullet points. Focus on decisions made, tasks completed, and current state:\n\n${oldMessages.map((m) => `${m.role}: ${m.content}`).join("\n")}`,
  });
  return text;
}

// Use in agent loop
const summary = await summarizeHistory(messages);
const contextMessages = [
  // Only inject the summary message when there is something to summarize
  ...(summary ? [{ role: "system", content: `Previous conversation summary:\n${summary}` }] : []),
  ...messages.slice(-10), // Recent messages in full
];
The agent retains awareness of the full conversation without the token cost.
When to use: Long-running agent sessions. Customer support agents. Multi-turn development sessions.
Track agent state as a typed object, not free text. Serialize between sessions.
import fs from "node:fs/promises";

interface AgentState {
  task: string;
  status: "planning" | "executing" | "reviewing" | "done";
  filesModified: string[];
  testsRun: { file: string; passed: boolean }[];
  decisions: { what: string; why: string; timestamp: number }[];
  blockers: string[];
}

const initialState: AgentState = {
  task: "",
  status: "planning",
  filesModified: [],
  testsRun: [],
  decisions: [],
  blockers: [],
};

// Persist between steps
async function saveState(state: AgentState) {
  await fs.writeFile(".agent-state.json", JSON.stringify(state, null, 2));
}

async function loadState(): Promise<AgentState> {
  return JSON.parse(
    await fs.readFile(".agent-state.json", "utf-8").catch(() => JSON.stringify(initialState))
  );
}
The agent can resume exactly where it left off. Every decision is logged with reasoning.
When to use: Multi-step workflows that may be interrupted. CI/CD pipelines. Long-running automation.
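Resuming then reduces to loading the state and dispatching on it. A sketch of that dispatch as a pure function - the action names and the blockers-first rule are illustrative, not part of any framework:

```typescript
// Hypothetical dispatcher over the status field of the persisted state.
type Status = "planning" | "executing" | "reviewing" | "done";

const actions: Record<Status, string> = {
  planning: "create-plan",
  executing: "run-next-step",
  reviewing: "review-changes",
  done: "report",
};

// Blockers take priority over the normal status-driven flow.
function nextAction(status: Status, blockers: string[]): string {
  return blockers.length > 0 ? "resolve-blockers" : actions[status];
}
```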
Different types of information need different retention strategies.
Working Memory (context window)
- Current task, recent messages, active file contents
- Lifetime: current session only
Short-Term Memory (session state file)
- Files modified, tests run, decisions made
- Lifetime: current task
Long-Term Memory (CLAUDE.md / RAG index)
- Project rules, architecture, coding standards
- Lifetime: permanent, updated occasionally
Episodic Memory (conversation logs)
- Past conversations summarized
- Lifetime: retained as summaries, raw logs archived
Each tier has different storage, retrieval, and eviction strategies.
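How the tiers come together at prompt-assembly time can be sketched as a single function. The field names here are illustrative, not from any specific framework:

```typescript
// Illustrative only: one string per persistent tier; working memory is the
// prompt being assembled, so it has no field here.
interface MemoryTiers {
  longTerm: string;  // CLAUDE.md rules, retrieved RAG snippets
  episodic: string;  // summaries of past conversations
  shortTerm: string; // serialized session state
}

function buildSystemPrompt(tiers: MemoryTiers): string {
  return [
    tiers.longTerm && `Project rules:\n${tiers.longTerm}`,
    tiers.episodic && `Previous sessions:\n${tiers.episodic}`,
    tiers.shortTerm && `Current task state:\n${tiers.shortTerm}`,
  ]
    .filter(Boolean)
    .join("\n\n");
}
```

Empty tiers drop out cleanly, so a fresh project with no episodic history simply yields a shorter prompt.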
| Scenario | Pattern |
|---|---|
| Project configuration | File-based (CLAUDE.md) |
| Large documentation | RAG |
| Long conversations | Summarization |
| Multi-step workflows | Structured state |
| Production agents | Tiered (all of the above) |
Most production agents use a combination. CLAUDE.md for rules + RAG for knowledge + summarization for history + structured state for workflow tracking.
Does Claude Code have built-in memory?
Yes. CLAUDE.md files at project, user, and global levels provide persistent memory. Claude Code also has auto-memory that saves important context automatically. But these are file-based - not RAG or semantic search.
How much of the context window should memory consume?
Keep memory under 30% of the context window. A 200K token window should use at most 60K for memory/context, leaving 140K for reasoning and tool outputs.
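That budget check can be sketched in a few lines. The ~4 characters/token heuristic is a rough approximation; use the model provider's tokenizer for exact counts:

```typescript
// Rough token budgeting with the ~4 chars/token heuristic (approximation only).
const CONTEXT_WINDOW = 200_000;
const MEMORY_BUDGET = Math.floor(CONTEXT_WINDOW * 0.3); // 60,000 tokens

function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function withinMemoryBudget(memoryText: string): boolean {
  return estimateTokens(memoryText) <= MEMORY_BUDGET;
}
```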
Can I use a vector database for agent memory?
Yes. Pinecone, Weaviate, Chroma, and pgvector all work. For browser-based agents, Transformers.js can compute embeddings client-side. The key is matching the retrieval strategy to your query patterns.
How should I chunk documents for RAG?
For code: chunk by function/class. For documentation: chunk by section (h2 headings). For conversations: chunk by topic shift. Overlapping chunks (50-100 token overlap) improve retrieval accuracy at the boundaries.
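A minimal sliding-window chunker with overlap, as a sketch - it splits on whitespace as a rough token proxy, so swap in a real tokenizer for production:

```typescript
// Sliding-window chunker: each chunk shares `overlap` words with the previous
// one so that content near chunk boundaries stays retrievable.
function chunkText(text: string, chunkSize = 400, overlap = 75): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += chunkSize - overlap) {
    chunks.push(words.slice(start, start + chunkSize).join(" "));
    if (start + chunkSize >= words.length) break; // final window reached
  }
  return chunks;
}
```

The defaults (400-word chunks, 75-word overlap) are assumptions in the spirit of the 50-100 token guideline; tune them per corpus.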