
TL;DR
AI agents fail in ways traditional debugging cannot catch. Here are the tools and patterns for finding and fixing broken agent loops, tool failures, and context issues.
Traditional debugging is about finding where code breaks. Agent debugging is about finding where reasoning breaks. The code runs fine. The model just made the wrong decision. If you are still designing the loop itself, start with how to build AI agents in TypeScript and the agent architecture guide.
Here are the patterns that actually work.
You need visibility into three things:
Without all three, you are guessing. This is also why long-running agent harnesses and DD Traces for local OpenTelemetry matter: they turn agent behavior into something you can inspect after the run.
Log every tool call with structured data. Not just "tool called" - the full input, output, and timing.
interface ToolLog {
  tool: string;
  input: Record<string, unknown>;
  output: unknown;
  durationMs: number;
  timestamp: number;
  step: number;
}

// Wraps a tool function so every call returns a structured log entry with its result.
function wrapTool<T>(name: string, fn: (input: T) => Promise<unknown>) {
  return async (input: T, step: number): Promise<{ result: unknown; log: ToolLog }> => {
    const start = Date.now();
    try {
      const result = await fn(input);
      const log: ToolLog = {
        tool: name,
        input: input as Record<string, unknown>,
        output: result,
        durationMs: Date.now() - start,
        timestamp: start,
        step,
      };
      return { result, log };
    } catch (error) {
      // Failures are logged rather than rethrown, so the trace stays complete.
      const log: ToolLog = {
        tool: name,
        input: input as Record<string, unknown>,
        output: { error: String(error) },
        durationMs: Date.now() - start,
        timestamp: start,
        step,
      };
      return { result: null, log };
    }
  };
}
When an agent goes wrong, you can trace the exact sequence: step 3 called search_files with the wrong query, got no results, then hallucinated the file content.
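With logs in this shape, a first pass over a bad run can be a plain filter. The following is a minimal sketch, assuming the logs have been collected into an in-memory array; `findSuspectCalls` and the 5-second default are illustrative choices, not part of any library.

```typescript
// Same shape as the ToolLog interface above, repeated so the sketch is self-contained.
interface ToolLog {
  tool: string;
  input: Record<string, unknown>;
  output: unknown;
  durationMs: number;
  timestamp: number;
  step: number;
}

// Hypothetical helper: returns calls that errored, returned nothing, or ran slowly.
function findSuspectCalls(logs: ToolLog[], slowMs = 5_000): ToolLog[] {
  return logs.filter((log) => {
    const out = log.output;
    // Matches the { error: string } output written by a logging wrapper on failure.
    const errored = typeof out === "object" && out !== null && "error" in out;
    const empty =
      out === null || out === undefined || (Array.isArray(out) && out.length === 0);
    return errored || empty || log.durationMs > slowMs;
  });
}
```

Empty results are flagged on purpose: a search that returns nothing is often the step right before the agent starts guessing.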
The most common agent failure is context overflow. The agent loses important information because the context window filled up with tool outputs.
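// Assumed shapes; adapt them to your SDK's message type.
interface Message { role: string; content: string }
interface ContextSnapshot {
  totalTokens: number;
  maxTokens: number;
  utilization: number;
  breakdown: { role: string; tokens: number; preview: string }[];
  warning: string | null;
}
const MAX_TOKENS = 200_000;
// Rough estimate (about 4 characters per token); use a real tokenizer for accuracy.
function estimateTokens(messages: Message[]): number {
  return Math.ceil(messages.reduce((n, m) => n + m.content.length, 0) / 4);
}
function trackContext(messages: Message[]): ContextSnapshot {
  const totalTokens = estimateTokens(messages);
  const breakdown = messages.map((m) => ({
    role: m.role,
    tokens: estimateTokens([m]),
    preview: m.content.slice(0, 100),
  }));
  return {
    totalTokens,
    maxTokens: MAX_TOKENS,
    utilization: totalTokens / MAX_TOKENS,
    breakdown,
    warning: totalTokens > 0.75 * MAX_TOKENS ? "Context 75%+ full" : null,
  };
}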
If your agent starts failing after 10+ steps, it is almost always context overflow. The fix: summarize intermediate results instead of keeping raw tool outputs.
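As a rough illustration of that fix, the sketch below keeps the most recent tool outputs intact and shrinks the older ones. It uses plain truncation as a stand-in for a real summarizer (which would typically be another model call), and the `Message` shape, `compactToolOutputs`, and the default limits are all assumptions made for the example.

```typescript
// Assumed message shape; adapt to whatever your agent loop stores.
interface Message {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

// Hypothetical helper: leaves the last `keepRecent` tool outputs untouched and
// truncates older ones to `maxChars`. Returns a new array; the input is not mutated.
function compactToolOutputs(
  messages: Message[],
  keepRecent = 3,
  maxChars = 200,
): Message[] {
  const toolIndexes = messages
    .map((m, i) => (m.role === "tool" ? i : -1))
    .filter((i) => i >= 0);
  // slice(-0) would return the whole array, so handle keepRecent = 0 explicitly.
  const recent = new Set(keepRecent > 0 ? toolIndexes.slice(-keepRecent) : []);

  return messages.map((m, i) => {
    if (m.role !== "tool" || recent.has(i) || m.content.length <= maxChars) {
      return m;
    }
    const omitted = m.content.length - maxChars;
    return {
      ...m,
      content: `${m.content.slice(0, maxChars)} [truncated ${omitted} chars]`,
    };
  });
}
```

Running this before each model call keeps the task description and recent results in view while old search dumps and file listings shrink to a short stub.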
Before each action, ask the agent to explain its reasoning in structured form.
import { z } from "zod";

const decisionSchema = z.object({
  observation: z.string().describe("What I see in the current state"),
  reasoning: z.string().describe("Why I chose this action"),
  action: z.string().describe("What I will do next"),
  confidence: z.number().min(0).max(1).describe("How confident I am"),
  alternatives: z.array(z.string()).describe("Other actions I considered"),
});
When confidence drops below 0.5, you know exactly where the agent got uncertain. This is where human review adds the most value.
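Once decisions are recorded per step, locating that point is a one-line scan. This is a small sketch, assuming the decisions have already been parsed into plain objects matching the schema above; `firstUncertainStep` is a hypothetical helper name, and the 0.5 default mirrors the threshold mentioned in the text.

```typescript
// Plain-object form of the decision record, assumed to be validated already.
interface Decision {
  observation: string;
  reasoning: string;
  action: string;
  confidence: number; // 0 to 1, self-reported by the model
  alternatives: string[];
}

// Hypothetical helper: index of the first decision below the threshold, or -1 if none.
function firstUncertainStep(decisions: Decision[], threshold = 0.5): number {
  return decisions.findIndex((d) => d.confidence < threshold);
}
```

Self-reported confidence is only a signal, so treat the returned step as a place to start reading the trace rather than as proof of where the bug is.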
Save the full agent trajectory so you can replay it.
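import { mkdir, writeFile } from "node:fs/promises";

interface AgentTrajectory {
  task: string;
  steps: {
    thought: string;
    action: string;
    toolInput: unknown;
    toolOutput: unknown;
    contextTokens: number;
  }[];
  outcome: "success" | "failure" | "timeout";
  totalSteps: number;
  totalDurationMs: number;
}

// Save the trajectory as JSON so a run can be replayed or diffed later.
async function saveTrajectory(trajectory: AgentTrajectory) {
  // Keep only filename-safe characters from the task text.
  const slug = trajectory.task.slice(0, 30).replace(/[^a-zA-Z0-9_-]+/g, "-");
  const id = `${Date.now()}-${slug}`;
  // Create the output directory on first use.
  await mkdir("./traces", { recursive: true });
  await writeFile(
    `./traces/${id}.json`,
    JSON.stringify(trajectory, null, 2)
  );
}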
When a similar task fails, diff the successful trajectory against the failing one. The divergence point is usually the bug.
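A simple way to find that divergence point is to walk both runs step by step and stop at the first mismatch. The sketch below assumes each step carries an `action` and a `toolInput`, as in the trajectory shape above; `findDivergence` is a hypothetical helper, and it compares inputs by their JSON text, so objects with the same keys in a different order count as different.

```typescript
// Minimal step shape needed for comparison.
interface Step {
  action: string;
  toolInput: unknown;
}

// Serializes the parts of a step that should match between two runs.
function stepKey(step: Step | undefined): string {
  return JSON.stringify([step?.action, step?.toolInput]);
}

// Hypothetical helper: index of the first step where the two runs differ,
// or -1 if they took identical actions throughout.
function findDivergence(good: Step[], bad: Step[]): number {
  const shared = Math.min(good.length, bad.length);
  for (let i = 0; i < shared; i++) {
    if (stepKey(good[i]) !== stepKey(bad[i])) return i;
  }
  // One run is a prefix of the other: they diverge where the shorter one ends.
  return good.length === bad.length ? -1 : shared;
}
```

Tool outputs are left out of the comparison on purpose, since the same call can legitimately return different data between runs; the interesting question is where the agent chose differently.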
If you are using Claude Code, hooks give you deterministic debugging points. The companion Claude Code hooks guide explains the lifecycle events, and Hookyard covers the packaged workflow for teams that want reusable hook installs.
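{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": ".*",
        "hooks": [
          {
            "type": "command",
            "command": "jq -r '\"Tool: \\(.tool_name)\"' >> /tmp/claude-debug.log"
          }
        ]
      }
    ],
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "echo \"Session ended at $(date)\" >> /tmp/claude-debug.log"
          }
        ]
      }
    ]
  }
}
Claude Code passes each hook its event as JSON on stdin, so this example uses jq (which must be installed) to pull out the tool name; check the hooks documentation for your version for the exact payload fields. Every tool call gets logged. Every session end gets recorded. Review the log when something goes wrong.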
Infinite loops. The agent keeps retrying the same action. Fix: add a step counter and bail after N attempts.
Tool misuse. The agent calls a tool with the wrong arguments. Fix: improve tool descriptions and add input validation.
Context poisoning. A large tool output fills the context with irrelevant data. Fix: truncate or summarize tool outputs before adding to context.
Premature termination. The agent thinks it is done but it is not. Fix: add verification steps that check the actual result against the original task.
Wrong tool selection. The agent picks the wrong tool for the job. Fix: make tool descriptions more specific about when to use each tool.
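The infinite-loop fix is the easiest of these to show in code. Here is a minimal sketch that combines a step cap with a repeat counter keyed on tool name plus input; `createLoopGuard` and its defaults are illustrative assumptions, and it counts repeats over the whole run rather than over a sliding window.

```typescript
// Hypothetical guard: call the returned function once per tool call.
// It returns a reason to stop, or null if the agent may continue.
function createLoopGuard(maxSteps = 25, maxRepeats = 3) {
  const counts = new Map<string, number>();
  let steps = 0;

  return function check(tool: string, input: unknown): string | null {
    steps += 1;
    if (steps > maxSteps) {
      return `step limit of ${maxSteps} exceeded`;
    }
    // Identical tool + input pairs share a key, so retries are counted together.
    const key = `${tool}:${JSON.stringify(input)}`;
    const seen = (counts.get(key) ?? 0) + 1;
    counts.set(key, seen);
    return seen >= maxRepeats
      ? `${tool} called ${seen} times with the same input`
      : null;
  };
}
```

When the guard returns a reason, the loop can stop and hand that reason to a human instead of letting the agent retry again.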
Not every agent failure needs code fixes. Sometimes the right answer is human review at critical points:
The best agent systems are not fully autonomous. They are autonomous for the easy parts and interactive for the hard parts.
Context overflow. After enough tool calls, the context window fills with intermediate results and the agent loses track of the original task. The fix is summarizing intermediate results and managing context deliberately.
Use hooks to log every tool call. Add a PostToolUse hook that records the tool name, input, and exit code. Review the log file to trace the exact decision sequence. The /transcript command also helps.
Structured tool logs (JSON with tool name, input, output, duration, step number) are essential. You can filter, query, and diff them. Plain text logs are almost useless for multi-step agent debugging.
Add a max step counter and a loop detector. Track the last N actions - if the same tool+input combination appears 3 times, break the loop and ask for human input.
Bring in a human before destructive actions, when the agent's confidence is low, after consecutive failures, and before declaring a task complete. The goal is not to remove the human - it is to minimize unnecessary interruptions while keeping critical checkpoints.
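Those four checkpoints can be written down as a small policy function that the agent loop consults before each action. This is only a sketch: the `CheckpointState` fields, the `reviewReasons` name, and the default thresholds are assumptions chosen for illustration, and how "destructive" is detected is left to the caller.

```typescript
// Assumed snapshot of the agent's state just before it acts.
interface CheckpointState {
  destructive: boolean; // e.g. deleting files or pushing to production
  confidence: number; // 0 to 1, self-reported by the agent
  consecutiveFailures: number;
  declaringDone: boolean; // the agent is about to report the task as complete
}

// Hypothetical policy: returns the reasons a human should review this step.
// An empty array means the agent can proceed on its own.
function reviewReasons(
  state: CheckpointState,
  minConfidence = 0.5,
  maxFailures = 2,
): string[] {
  const reasons: string[] = [];
  if (state.destructive) reasons.push("destructive action");
  if (state.confidence < minConfidence) reasons.push("low confidence");
  if (state.consecutiveFailures >= maxFailures) reasons.push("consecutive failures");
  if (state.declaringDone) reasons.push("completion claim");
  return reasons;
}
```

Returning reasons instead of a boolean keeps the interruption explainable: the reviewer sees why the agent paused, not just that it did.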