TL;DR
AI agent work needs local observability. OpenTelemetry, OTLP, Vercel AI SDK telemetry, and lightweight trace viewers give developers receipts for model calls, tool use, latency, errors, and cost before anything goes to production.
Last updated: June 24, 2026
AI agent work needs receipts.
When a coding agent edits a repo, calls tools, queries docs, runs tests, and spends model tokens, the final chat message is not enough. You need to know what happened, how long it took, which model calls were expensive, which tools failed, and where the session slowed down.
That is why local OpenTelemetry matters. It gives AI developers a standard way to collect traces while building, without turning every experiment into a cloud observability project.
DD Traces fits into that lane as a local-first OTLP viewer: point your development app or agent harness at a local endpoint, inspect traces in a browser, and keep prompts, tool calls, and debug context on your machine while you are iterating. The broader point goes beyond one tool: local traces are becoming part of the AI development workflow.
Google Trends access was rate-limited while this piece was researched, so trend data serves only as query framing here. The useful cluster is AI agent observability, OpenTelemetry AI SDK, Vercel AI SDK telemetry, local OpenTelemetry viewer, LLM tracing, Langfuse vs LangSmith, and agent traces.
That cluster is decision intent. Developers are no longer only asking how to call a model. They are asking how to debug model calls, tool calls, latency, cost, and failure paths.
For the workflow layer around this, pair the trace loop with debugging AI agent workflows, AI chat fatigue as a workflow bug, and the Vercel AI SDK guide.
AI agents are not single function calls. A real session can include:
If the work fails, the question is rarely "did the model answer?" The real questions are:
That is the same argument behind agent eval receipts and agent replays. If there is no structured trace, you are debugging from memory.
OpenTelemetry is not an AI-specific framework. That is exactly why it is useful.
The OpenTelemetry project defines common instrumentation, exporters, collectors, and the OTLP protocol. OTLP/HTTP defaults to port 4318, and many vendors can ingest the same basic shape of traces. That means AI traces can sit beside ordinary application traces instead of living in a separate black box.
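To make "the same basic shape" concrete, here is a minimal sketch of what an OTLP/HTTP exporter sends: one span encoded as JSON and posted to /v1/traces on port 4318. It is hand-built with no SDK, assumes the receiver accepts the JSON encoding (exporters often send protobuf instead), and uses a hypothetical scope name and attribute key (agent.tool.name).

```typescript
import { randomBytes } from "node:crypto";

// One span in the OTLP/HTTP JSON encoding: trace/span IDs are hex strings,
// 64-bit nanosecond timestamps are decimal strings.
interface OtlpSpan {
  traceId: string;
  spanId: string;
  name: string;
  kind: number; // 1 = SPAN_KIND_INTERNAL
  startTimeUnixNano: string;
  endTimeUnixNano: string;
  attributes: { key: string; value: { stringValue: string } }[];
}

// Epoch milliseconds -> nanoseconds as a string, exact for integer ms
// (avoids float precision loss above 2^53).
const msToNanos = (ms: number): string => `${Math.trunc(ms)}000000`;

function buildOtlpPayload(
  serviceName: string,
  name: string,
  startMs: number,
  endMs: number,
  attrs: Record<string, string>,
) {
  const span: OtlpSpan = {
    traceId: randomBytes(16).toString("hex"), // 32 hex chars
    spanId: randomBytes(8).toString("hex"), // 16 hex chars
    name,
    kind: 1,
    startTimeUnixNano: msToNanos(startMs),
    endTimeUnixNano: msToNanos(endMs),
    attributes: Object.keys(attrs).map((key) => ({ key, value: { stringValue: attrs[key] } })),
  };
  return {
    resourceSpans: [
      {
        resource: { attributes: [{ key: "service.name", value: { stringValue: serviceName } }] },
        scopeSpans: [{ scope: { name: "manual-sketch" }, spans: [span] }],
      },
    ],
  };
}

// POST to a local OTLP/HTTP receiver (default port 4318, traces path /v1/traces).
async function sendToLocalReceiver(payload: unknown, url = "http://localhost:4318/v1/traces"): Promise<void> {
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
  if (!res.ok) throw new Error(`OTLP export failed: HTTP ${res.status}`);
}
```

In practice an SDK exporter builds and batches this for you; the sketch only shows why any OTLP-compatible viewer or vendor can read the same data.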
For AI development, that gives you a better baseline:
The point is not that every developer should become an observability engineer. The point is that trace-shaped evidence is a better artifact than screenshots of chat output.
The Vercel AI SDK has built-in telemetry based on OpenTelemetry. Its docs describe an experimental_telemetry setting for functions like generateText and streamText, with spans that can include function IDs, metadata, and AI call details.
That changes the local development loop. Instead of manually logging every model call, you can instrument the AI boundary once and inspect the result in an OTLP-compatible destination.
A minimal mental model:
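```typescript
import { streamText } from "ai";
import { openai } from "@ai-sdk/openai"; // provider is illustrative; any AI SDK provider works

const model = openai("gpt-4o-mini"); // assumed model ID
const messages = [{ role: "user" as const, content: "Summarize the last test run." }];

const result = await streamText({
  model,
  messages,
  experimental_telemetry: {
    isEnabled: true, // telemetry stays off unless explicitly enabled
    functionId: "chat", // identifies this call site in the spans
    metadata: {
      route: "/api/chat", // custom metadata attached to the spans
    },
  },
});

// Consume the stream so the call, and its spans, run to completion.
for await (const chunk of result.textStream) process.stdout.write(chunk);
```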
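Then configure your OpenTelemetry exporter to send traces to a local endpoint while developing. The SDK only emits spans; your app still needs a registered OpenTelemetry tracer provider and exporter. A local viewer such as DD Traces can receive OTLP/HTTP on the standard traces path (/v1/traces) and show the trace before you send anything to a hosted platform.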
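OpenTelemetry SDK exporters generally read the standard OTEL_EXPORTER_OTLP_* environment variables, so pointing development traffic at a local viewer is often an environment change rather than a code change. The function below is a simplified sketch of the specification's resolution rule for OTLP/HTTP traces, not any SDK's actual code.

```typescript
// Simplified sketch of how OTLP/HTTP trace exporters choose their URL.
function resolveTracesEndpoint(env: Record<string, string | undefined>): string {
  // A signal-specific endpoint is used exactly as given.
  const signal = env.OTEL_EXPORTER_OTLP_TRACES_ENDPOINT;
  if (signal) return signal;

  // A generic base endpoint gets the per-signal path appended.
  // Default base for OTLP/HTTP is http://localhost:4318.
  const base = env.OTEL_EXPORTER_OTLP_ENDPOINT || "http://localhost:4318";
  return base.endsWith("/") ? `${base}v1/traces` : `${base}/v1/traces`;
}
```

For example, starting a dev server with OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 sends traces to http://localhost:4318/v1/traces, assuming the local viewer listens on the default port. Switching to a hosted backend later means changing the variable, not the instrumentation.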
A useful trace is not just a list of model calls. It should show the whole run.
At minimum:
For a coding agent, the best trace also connects to git state: branch, commit, files touched, tests run, and whether the final diff was accepted. That is why permissions, logs, and rollback belong in the same conversation as tracing.
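One way to connect a run to git state is to read it once per run and attach it as attributes on the root span. A minimal sketch: the agent.git.* and agent.* attribute keys are hypothetical (OpenTelemetry's semantic conventions may define VCS keys worth adopting instead), and joining lists into strings is a simplification since OTLP attributes can also hold string arrays.

```typescript
import { execFileSync } from "node:child_process";

interface GitState {
  branch: string;
  commit: string;
  filesTouched: string[];
}

// Reads branch, commit, and touched files from the repo at `cwd`.
function readGitState(cwd: string): GitState {
  const git = (...args: string[]) => execFileSync("git", args, { cwd, encoding: "utf8" }).trim();
  return {
    branch: git("rev-parse", "--abbrev-ref", "HEAD"),
    commit: git("rev-parse", "HEAD"),
    // Tracked files changed relative to HEAD (staged or unstaged; untracked files are not listed).
    filesTouched: git("diff", "--name-only", "HEAD").split("\n").filter(Boolean),
  };
}

// Flattens git state plus run outcome into span attributes (hypothetical keys).
function gitAttributes(
  state: GitState,
  testsRun: string[],
  diffAccepted: boolean,
): Record<string, string | number | boolean> {
  return {
    "agent.git.branch": state.branch,
    "agent.git.commit": state.commit,
    "agent.git.files_touched": state.filesTouched.join(","),
    "agent.git.files_touched_count": state.filesTouched.length,
    "agent.tests_run": testsRun.join(","),
    "agent.diff_accepted": diffAccepted,
  };
}
```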
Cloud observability tools are useful. LangSmith, Langfuse, Pydantic Logfire, PostHog, New Relic, and others all have credible AI tracing stories. Langfuse documents a Vercel AI SDK integration built around OpenTelemetry, and LangSmith supports OpenTelemetry tracing as well.
But local-first tracing solves a different problem.
Use local-first traces when:
Use cloud-first traces when:
The healthiest path is often both: local OTLP during development, then a real hosted observability stack when the workflow becomes production infrastructure.
DD Traces is useful when you want a local trace viewer without standing up a full observability stack. Run a local viewer, point OTLP traffic at it, and inspect model/tool spans as you develop.
The right expectation is modest:
That is a feature, not a weakness. Most agent debugging starts as local confusion: "Why did this request take eight seconds?" or "Which tool call returned junk?" or "Did the agent retry the expensive step twice?" A local viewer should answer those questions quickly.
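Each of those three questions maps to a simple query over spans. A minimal sketch, assuming spans have been flattened into a hypothetical SpanRecord shape and that tool spans use a "tool." name prefix (a naming convention chosen for this example). A tool that returns junk only shows up here if its span records an error.

```typescript
interface SpanRecord {
  name: string; // e.g. "ai.streamText" or "tool.search"
  durationMs: number;
  error: boolean;
  args?: string; // sanitized, serialized arguments
}

// "Why did this request take eight seconds?" -> the slowest spans.
function slowest(spans: SpanRecord[], n = 3): SpanRecord[] {
  return spans.slice().sort((a, b) => b.durationMs - a.durationMs).slice(0, n);
}

// "Which tool call returned junk?" -> tool spans marked as errors.
function failedTools(spans: SpanRecord[]): SpanRecord[] {
  return spans.filter((s) => s.name.startsWith("tool.") && s.error);
}

// "Did the agent retry the expensive step twice?" -> name+args pairs seen more than once.
function repeatedSteps(spans: SpanRecord[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const s of spans) {
    const key = `${s.name} ${s.args ?? ""}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return new Map(Array.from(counts).filter(([, count]) => count > 1));
}
```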
For recurring agent work, combine local traces with long-running agent harnesses and Codex resource budgets. A trace tells you what happened; a harness decides when the agent should stop.
Do not instrument everything on day one. Start with the boundaries that explain failures.
Capture provider, model, function ID, latency, tokens, finish reason, and errors. This tells you whether the slow or expensive part was the model.
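The AI SDK's telemetry records much of this when enabled. For call paths it does not cover, a small wrapper can capture the same fields. A minimal sketch: ModelResult, its token field names, and the sink callback are hypothetical stand-ins for your SDK's result type and your exporter.

```typescript
// Hypothetical result shape; adapt to what your SDK actually returns.
interface ModelResult {
  text: string;
  usage: { inputTokens: number; outputTokens: number };
  finishReason: string;
}

interface ModelCallRecord {
  provider: string;
  model: string;
  functionId: string;
  latencyMs: number;
  inputTokens?: number;
  outputTokens?: number;
  finishReason?: string;
  error?: string;
}

// Times one model call and records the fields above, including failures.
async function recordModelCall(
  meta: { provider: string; model: string; functionId: string },
  call: () => Promise<ModelResult>,
  sink: (record: ModelCallRecord) => void,
): Promise<ModelResult> {
  const start = performance.now();
  try {
    const result = await call();
    sink({
      ...meta,
      latencyMs: performance.now() - start,
      inputTokens: result.usage.inputTokens,
      outputTokens: result.usage.outputTokens,
      finishReason: result.finishReason,
    });
    return result;
  } catch (err) {
    // Record the failure, then rethrow so callers still see the error.
    sink({ ...meta, latencyMs: performance.now() - start, error: err instanceof Error ? err.message : String(err) });
    throw err;
  }
}
```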
Capture tool name, duration, success/failure, and sanitized arguments. Tool calls are where many agent failures begin.
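A tool wrapper can record those four fields and sanitize arguments before they leave the process. A minimal sketch: the sensitive-key pattern, the 200-character cap, and the ToolSpan shape are assumptions to adjust per tool, and sanitization here is shallow (nested objects pass through unchanged).

```typescript
// Assumed list of argument keys whose values should never be recorded.
const SENSITIVE_KEY = /token|secret|password|api[_-]?key|authorization/i;
const MAX_VALUE_CHARS = 200;

// Shallow sanitization: redact sensitive keys, truncate long strings.
function sanitizeArgs(args: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of Object.keys(args)) {
    const value = args[key];
    if (SENSITIVE_KEY.test(key)) out[key] = "[REDACTED]";
    else if (typeof value === "string" && value.length > MAX_VALUE_CHARS)
      out[key] = `${value.slice(0, MAX_VALUE_CHARS)}...[${value.length} chars]`;
    else out[key] = value;
  }
  return out;
}

interface ToolSpan {
  tool: string;
  durationMs: number;
  ok: boolean;
  args: Record<string, unknown>;
  error?: string;
}

// Runs a tool, recording name, duration, success/failure, and sanitized args.
async function traceTool<T>(
  tool: string,
  args: Record<string, unknown>,
  run: () => Promise<T>,
  sink: (span: ToolSpan) => void,
): Promise<T> {
  const start = performance.now();
  const safeArgs = sanitizeArgs(args);
  try {
    const out = await run();
    sink({ tool, durationMs: performance.now() - start, ok: true, args: safeArgs });
    return out;
  } catch (err) {
    sink({ tool, durationMs: performance.now() - start, ok: false, args: safeArgs, error: String(err) });
    throw err;
  }
}
```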
If the agent uses docs, vector search, repo search, or MCP tools, trace those calls. Bad context creates bad edits.
Typecheck, tests, linters, and browser checks should show up as spans or linked logs. An agent run with no verification evidence proves very little.
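A verification step can be recorded the same way as any other span: command, exit code, duration, and a tail of output to link as a log. A minimal sketch; the CheckRecord shape and the 2,000-character output tail are choices made for this example.

```typescript
import { spawnSync } from "node:child_process";

interface CheckRecord {
  name: string;
  command: string;
  exitCode: number | null; // null if the process failed to start or was killed by a signal
  durationMs: number;
  passed: boolean;
  outputTail: string; // last 2,000 chars of stdout+stderr, for a linked log
}

// Runs one check (typecheck, tests, lint) and records it as a span-like record.
function runCheck(name: string, command: string, args: string[]): CheckRecord {
  const start = performance.now();
  const res = spawnSync(command, args, { encoding: "utf8" });
  const output = `${res.stdout ?? ""}${res.stderr ?? ""}`;
  return {
    name,
    command: [command, ...args].join(" "),
    exitCode: res.status,
    durationMs: performance.now() - start,
    passed: res.status === 0, // a missing binary yields status null, so it counts as failed
    outputTail: output.slice(-2000),
  };
}
```

For example, runCheck("typecheck", "npx", ["tsc", "--noEmit"]) produces one record per check, which a final report can cite instead of claiming "tests passed".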
The final response should summarize the trace in human terms: changed files, commands run, checks passed, checks skipped, blockers, and remaining risk.
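The receipt can be rendered mechanically from trace data so nothing depends on the model's own summary. A minimal sketch with a hypothetical RunReceipt shape; empty sections print "none" so a skipped check or an open blocker cannot silently disappear.

```typescript
interface RunReceipt {
  changedFiles: string[];
  commandsRun: string[];
  checksPassed: string[];
  checksSkipped: string[];
  blockers: string[];
  remainingRisk: string[];
}

// Renders each section as a titled list; empty sections say "none" explicitly.
function renderReceipt(r: RunReceipt): string {
  const section = (title: string, items: string[]) =>
    `${title}:\n${items.length ? items.map((i) => `- ${i}`).join("\n") : "- none"}`;
  return [
    section("Changed files", r.changedFiles),
    section("Commands run", r.commandsRun),
    section("Checks passed", r.checksPassed),
    section("Checks skipped", r.checksSkipped),
    section("Blockers", r.blockers),
    section("Remaining risk", r.remainingRisk),
  ].join("\n\n");
}
```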
If the trace exposes repeated retries, token spikes, or runaway tool loops, connect it back to Claude Code token burn observability and agent reliability as a systems problem.
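Both patterns are cheap to detect from step records. A minimal sketch, assuming a hypothetical StepRecord per agent step; the defaults (three identical consecutive calls, five times the median token count) are arbitrary starting points, not recommendations.

```typescript
interface StepRecord {
  tool: string;
  args: string; // serialized, sanitized arguments
  totalTokens: number;
}

// Runaway loop: the same tool+args repeated back to back at least `limit` times.
function findRunawayLoop(steps: StepRecord[], limit = 3): StepRecord | undefined {
  let run = 1;
  for (let i = 1; i < steps.length; i++) {
    const same = steps[i].tool === steps[i - 1].tool && steps[i].args === steps[i - 1].args;
    run = same ? run + 1 : 1;
    if (run >= limit) return steps[i];
  }
  return undefined;
}

// Token spike: steps using more than `factor` times the median step's tokens.
function tokenSpikes(steps: StepRecord[], factor = 5): StepRecord[] {
  if (steps.length === 0) return [];
  const sorted = steps.map((s) => s.totalTokens).sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const median = sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  return steps.filter((s) => s.totalTokens > factor * median);
}
```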
AI traces can contain prompts, completions, filenames, user data, stack traces, and secrets. Turn on content capture deliberately. Redact or avoid recording sensitive fields.
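Deliberate capture can be enforced in code: content is recorded only behind an explicit flag, and it passes through a redaction step first. A minimal sketch; the patterns below are assumed examples of common secret and PII shapes, not a complete or vetted list.

```typescript
// Assumed example patterns; extend and test them against your own data.
const SECRET_PATTERNS: [RegExp, string][] = [
  [/\bsk-[A-Za-z0-9_-]{16,}/g, "[REDACTED_API_KEY]"],
  [/\bAKIA[0-9A-Z]{16}\b/g, "[REDACTED_AWS_KEY]"],
  [/\bBearer\s+[A-Za-z0-9._~+/-]+=*/g, "Bearer [REDACTED]"],
  [/\b[\w.+-]+@[\w-]+\.[\w.-]+\b/g, "[REDACTED_EMAIL]"],
];

// Applies every pattern in order; global regexes replace all matches.
function redact(text: string): string {
  return SECRET_PATTERNS.reduce((acc, [re, replacement]) => acc.replace(re, replacement), text);
}

// Prompt or completion text is recorded only when capture is explicitly on.
function contentAttribute(text: string, captureContent: boolean): string | undefined {
  return captureContent ? redact(text) : undefined;
}
```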
Token and cost estimates are useful for debugging, but provider billing rules, cache behavior, and pricing can change. Use traces for directional decisions, then check provider billing before you set cost policy.
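A directional estimate is simple arithmetic over token counts from spans. In this sketch prices are inputs rather than constants, since they change; the per-million-token units are an assumption, and cache discounts and other billing rules are deliberately ignored.

```typescript
// USD per 1M tokens; pass in current prices rather than hard-coding them.
interface Pricing {
  inputPerMillion: number;
  outputPerMillion: number;
}

// Directional estimate only: ignores cached-token discounts and other billing rules.
function estimateCostUsd(inputTokens: number, outputTokens: number, p: Pricing): number {
  return (inputTokens / 1_000_000) * p.inputPerMillion + (outputTokens / 1_000_000) * p.outputPerMillion;
}
```

With hypothetical prices of $3 per million input tokens and $15 per million output tokens, a call with 12,000 input and 800 output tokens estimates to 0.036 + 0.012 = $0.048. Summed across a run, that is enough to spot a retry loop; it is not a billing figure.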
AI traces should not live forever apart from app traces. If the agent calls a database or an API, the model span and system span should eventually appear in the same story.
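What puts a model span and a downstream API span in the same trace is context propagation, usually the W3C Trace Context traceparent header. OpenTelemetry HTTP instrumentation normally injects and extracts it for you; this sketch hand-rolls the header only to show what crosses the wire.

```typescript
import { randomBytes } from "node:crypto";

// Fresh hex IDs for a root span.
function newRootIds(): { traceId: string; spanId: string } {
  return { traceId: randomBytes(16).toString("hex"), spanId: randomBytes(8).toString("hex") };
}

// traceparent format: version-traceId-parentSpanId-flags (version 00).
function makeTraceparent(traceId: string, spanId: string, sampled = true): string {
  return `00-${traceId}-${spanId}-${sampled ? "01" : "00"}`;
}

const TRACEPARENT = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Returns null for malformed headers or all-zero IDs, which the spec treats as invalid.
function parseTraceparent(header: string): { traceId: string; parentSpanId: string; sampled: boolean } | null {
  const m = TRACEPARENT.exec(header.trim());
  if (!m || /^0+$/.test(m[1]) || /^0+$/.test(m[2])) return null;
  return { traceId: m[1], parentSpanId: m[2], sampled: (parseInt(m[3], 16) & 1) === 1 };
}
```

If the agent sends makeTraceparent(traceId, modelSpanId) as a traceparent header on an outgoing request, a service that parses it can parent its server span under the model span, so both appear in one trace.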
Production dashboards help after deployment. Local traces help while the agent is still changing code. You need both feedback loops.
Local traces are becoming part of the developer workstation.
Not because every local experiment deserves a dashboard, but because agent work is too opaque without structured evidence. The model can sound confident while a tool quietly failed. The final answer can say tests passed while the trace shows only a narrow command ran. The cost can look fine until one retry loop burns the budget.
OpenTelemetry gives developers a standard shape for that evidence. Local viewers make it usable during development. Hosted platforms make it useful for teams. The winning agent workflows will use both.
If you are building or operating AI agents, start collecting traces before you need them. The first time an agent fails in a way you cannot explain, you will want receipts.
Local OpenTelemetry means collecting traces from AI calls, tool calls, HTTP requests, databases, and verification steps on your development machine. It gives you structured evidence for what an agent did before you send data to a hosted observability system.
OpenTelemetry is a standard. It can connect model spans, application spans, database spans, and HTTP spans in one trace. A custom log may be faster to start, but it usually becomes a silo.
Yes. The AI SDK documentation describes telemetry based on OpenTelemetry through experimental_telemetry. Because it is marked experimental, verify the current API before standardizing production instrumentation.
No. DD Traces fits local development and quick inspection. LangSmith, Langfuse, and similar platforms are stronger for team workflows, retention, evals, dashboards, and production monitoring.
Only deliberately. Prompt and response capture can be useful for debugging, but it can also expose sensitive source code, user data, secrets, or business context. Redact aggressively and keep local traces local unless the team has approved retention rules.
At minimum: model calls, tool calls, slow spans, errors, retries, commands run, checks passed, checks skipped, changed files, and remaining risk. A final report should be a receipt, not just a summary.