
TL;DR
Five managed-agent providers, five pricing models, zero unified cost attribution. If you're running agents overnight, you need FinOps you don't have yet.
I woke up to a $437 bill from an agent I asked to write a TypeScript refactor.
The task was not small but it was not $437 big. Port a mid-sized service from a bespoke event emitter to a typed pub/sub primitive, update the tests, open a PR. I spun it up around 11pm, watched the first few steps, saw it start editing the right files, and went to bed. When I opened my laptop at 7am there was no PR. There was a still-running session, a loop counter past 200, and a billing dashboard that made me close the tab and open it again to make sure I was reading it right.
What actually happened was mundane. The agent hit a failing test, tried to fix it, broke a neighboring test, tried to fix that, went back to the first one, and kept oscillating. Each pass retried a web search. Each pass re-read the file tree. There was no per-session cap. There was no cross-provider ceiling. There was no kill switch attached to dollars. There was only the assumption, inherited from a decade of SaaS, that the bill at the end of the month would be roughly the bill I expected at the start of it. Managed agents broke that assumption and nobody has rebuilt it yet.
The managed-agent category just crystallized. Anthropic launched Claude Managed Agents on April 9, 2026 with a pricing surface that combines model tokens, a session-hour charge of $0.08/hour, and tool surcharges on top (web search is billed at $10 per 1,000 calls). Six days later, OpenAI shipped its Agents SDK update with a different shape: standard API tokens plus $0.03/GB for hosted sandbox sessions, with sandbox providers ranging across Blaxel, Cloudflare, Daytona, E2B, Modal, Runloop, and Vercel. That is two frontier providers, two pricing models, already incompatible before you leave the first tier.
Past that tier it fractures more. Devin prices in ACUs, where one ACU is roughly 15 agent-minutes and plans scale from $20 pay-as-you-go to $500 for 250 ACUs. Cursor Background Agents ride on a Pro subscription with metered Max-mode overflow, and field reports are landing around $4.63 per "easy" PR. GitHub Copilot charges per-seat with 300 premium requests bundled and $0.04 for every overflow request. Replit has effort-based credits with Economy, Power, and Turbo tiers. Jules is free up to 15 tasks a day, then tiered. Factory is token-based with no per-seat floor.
Five pricing models (tokens, session-hour, per-task/ACU, per-seat with overflow, outcome-based) layered with infinite hybrids. No unified attribution. Every provider shows you their own dashboard, their own units, their own refresh cadence. When an overnight run goes sideways, piecing together what actually happened requires opening five browser tabs, correlating timestamps by hand, and trusting that each dashboard updated before you looked at it. The category grew up faster than its observability did, and FinOps is the thing that is missing.
In my own logs and in enough war stories from other teams to call it a pattern, three specific failure modes account for almost every overnight blowup.
1. Pathological loops. The agent retries the same broken test two hundred times. This is what happened to me. It usually starts with a test that fails for a reason the agent cannot see (an environment variable, a flaky external call, a test depending on order), and the agent interprets the red bar as a code problem. It edits, reruns, edits, reruns. On Claude Managed Agents at $0.08/session-hour plus Opus 4.6 tokens at $5 input and $25 output per million, eight hours of this can clear $200 on tokens alone before you count the session fee. Prevention: every agent run gets a max-iterations cap in the harness, every test failure that recurs more than three times triggers an escalation to a human, and every session gets a hard dollar ceiling that kills the process rather than "warns" the user.
2. Tool call explosion. The agent decides it needs context and calls web search in a hot loop. Claude Managed Agents bills web search at $10 per 1,000 queries. It takes one broken retry pattern to rack up 3,000 searches across a single task, which is $30 in tool calls on top of whatever the model tokens cost. I have seen a Cursor Background Agent run up 400 MCP calls in a session that should have needed six. Prevention: set per-tool quotas at the harness level (max 50 web searches per session, max 100 file reads, max 10 shell executions), log every tool call with its cost, and treat tool-call count as a first-class alert metric.
3. Context swapping. The agent keeps re-reading the same file tree. This one is quiet. Each pass through a task, the agent reloads the project structure, rereads five or ten files it has already seen, and pushes them into a fresh context window. On a 1M-context model like GPT-5.4, this is cheap in wall time but expensive in tokens because you are sending 300K input tokens per iteration and compaction does not always kick in the way you want. Ten iterations at 300K input tokens each is 3M tokens per session, and on GPT-5.3-Codex at $1.75 input per million that is $5 per run, per agent, before you add output. Run five agents in parallel overnight and you spent $25 on file rereads. Prevention: force the harness to use compaction aggressively, cache file contents with hash keys across iterations, and instrument input-token growth per step so that a reread spike is visible before the bill is.
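The three prevention lists above share one shape: counters that the harness checks before the agent's next step, and a hard failure instead of a warning when one trips. A minimal sketch in Python — the class name, the quota numbers, and the method surface are illustrative, not a real library:

```python
import hashlib

class BudgetExceeded(Exception):
    """Raised when a run crosses its iteration, quota, or dollar ceiling."""

class HarnessGuard:
    def __init__(self, max_iterations=50, dollar_ceiling=10.0, tool_quotas=None):
        self.max_iterations = max_iterations
        self.dollar_ceiling = dollar_ceiling
        # Per-session tool quotas; the defaults mirror the numbers above.
        self.tool_quotas = tool_quotas or {
            "web_search": 50, "file_read": 100, "shell_exec": 10,
        }
        self.iterations = 0
        self.spend = 0.0
        self.tool_counts = {}
        self.failure_counts = {}  # test name -> recurring-failure count
        self.file_hashes = {}     # path -> last content hash seen

    def start_iteration(self):
        self.iterations += 1
        if self.iterations > self.max_iterations:
            raise BudgetExceeded(f"iteration cap {self.max_iterations} hit")

    def record_tool_call(self, tool, cost=0.0):
        n = self.tool_counts.get(tool, 0) + 1
        self.tool_counts[tool] = n
        quota = self.tool_quotas.get(tool)
        if quota is not None and n > quota:
            raise BudgetExceeded(f"{tool} quota {quota} exceeded")
        self.add_spend(cost)

    def add_spend(self, dollars):
        self.spend += dollars
        if self.spend > self.dollar_ceiling:
            raise BudgetExceeded(f"${self.spend:.2f} over ceiling")

    def record_test_failure(self, test_name, escalate_after=3):
        """Returns True when a recurring failure should go to a human."""
        n = self.failure_counts.get(test_name, 0) + 1
        self.failure_counts[test_name] = n
        return n >= escalate_after

    def file_changed(self, path, text):
        """Track content hashes so unchanged files are not re-sent
        into the context window on the next iteration."""
        key = hashlib.sha256(text.encode()).hexdigest()
        changed = self.file_hashes.get(path) != key
        self.file_hashes[path] = key
        return changed
```

The harness calls start_iteration() at the top of each loop and record_tool_call() before dispatching a tool. Everything raises rather than warns, which is the whole point: a warning at 3am protects nobody.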
No one has cross-provider cost attribution. Say that again with the weight it deserves. You can go to Anthropic's dashboard and see tokens and session hours. You can go to OpenAI's dashboard and see tokens and sandbox storage. You can go to Devin and see ACUs. You can go to Cursor and see Max-mode overflow. You cannot go to any dashboard today and see a single task called "refactor the event emitter" with the total cost across the two frontier providers, the one sandbox vendor, and the one web-search tool it touched. That span does not exist.
What you need is straightforward to describe, and for eighteen months the market has declined to build it. You need unified spans tagged per-agent, per-user, per-task, with model tokens, session time, tool calls, and dollar cost attached to each span. You need parent-child span relationships so that a task like "run the test suite" groups its twenty tool calls under a single parent, and a larger task like "ship this PR" groups the test-suite span under itself. You need the ability to filter by provider, by model, by user, and by task and get the exact dollar number out.
This is the OpenTelemetry trace model. OTel already has the vocabulary: traces, spans, resource attributes, semantic conventions. The GenAI semantic conventions already define gen_ai.system, gen_ai.request.model, gen_ai.usage.input_tokens, and gen_ai.usage.output_tokens. What is missing is adoption by managed-agent providers, a coherent cost enrichment layer sitting on top of those spans, and a receiver that holds the data close enough to you that you can ask questions across providers without waiting for each vendor's BI team to ship a feature.
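Concretely, a task trace under those conventions could look like the sketch below — plain dicts in roughly the shape spans take on the wire. The gen_ai.* attribute names are from the published GenAI conventions; the task.id and user.id names are assumptions, since per-task attribution conventions are exactly the part that does not exist yet:

```python
import uuid

def make_span(name, parent=None, **attributes):
    """Build a minimal span record: same trace_id as the parent,
    fresh span_id, parent link, and a flat attribute map."""
    return {
        "trace_id": parent["trace_id"] if parent else uuid.uuid4().hex,
        "span_id": uuid.uuid4().hex[:16],
        "parent_span_id": parent["span_id"] if parent else None,
        "name": name,
        "attributes": attributes,
    }

# "ship this PR" is the parent task; the test-suite run nests under it,
# and the model call nests under that, carrying the GenAI conventions.
pr_task = make_span("ship-this-pr",
                    **{"task.id": "refactor-event-emitter", "user.id": "dev-1"})
test_run = make_span("run-test-suite", parent=pr_task)
llm_call = make_span("llm-call", parent=test_run, **{
    "gen_ai.system": "anthropic",
    "gen_ai.request.model": "claude-opus-4",
    "gen_ai.usage.input_tokens": 300_000,
    "gen_ai.usage.output_tokens": 1_200,
})

# One trace id ties the whole task together across every nested call.
assert llm_call["trace_id"] == pr_task["trace_id"]
```

Filtering by provider, model, user, or task is then just a predicate over attribute maps — the hard part is getting every provider to emit spans in the first place.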
DD Traces was built for exactly this gap. Today it does local OTel trace collection. The minimal OTLP receiver shipped last week and already ingests spans from Claude Code, Cursor, Codex, Augment, Gemini, and Kimi sessions running on your machine. You point an OTLP exporter at localhost:4318, you run your agents, and you get a unified view of what ran where.
Cost attribution is the next step and I am not going to pretend it is done. The piece that exists is the trace plumbing and the cross-tool schema. The piece that is TODO is the enrichment layer that reads a span's provider and model, looks up the current pricing, multiplies the tokens and tool calls, and attaches a dollar figure as a span attribute. That layer is in the roadmap, not in main, and the honest framing is: we have the pipes, we have the schema, we do not yet have the pricing table plugged in or the budget-ceiling kill switch wired up.
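For illustration, here is roughly what that enrichment layer would do. The dollar figures are the ones quoted earlier in this post; the table keys and span shape are assumptions, and none of this is shipped code:

```python
# Illustrative pricing table; these are the rates quoted in this post
# and a real version would need to refresh as providers change them.
PRICING = {
    ("anthropic", "claude-opus-4.6"): {
        "input_per_m": 5.00,        # $ per million input tokens
        "output_per_m": 25.00,      # $ per million output tokens
        "session_per_hour": 0.08,   # managed-agent session fee
        "web_search_per_k": 10.00,  # $ per 1,000 web searches
    },
}

def enrich_span(span):
    """Look up the span's provider and model, multiply out tokens,
    session time, and tool calls, and attach dollars_spent."""
    a = span["attributes"]
    key = (a.get("gen_ai.system"), a.get("gen_ai.request.model"))
    prices = PRICING.get(key)
    if prices is None:
        return span  # unknown provider/model: leave the span unenriched
    dollars = (
        a.get("gen_ai.usage.input_tokens", 0) / 1e6 * prices["input_per_m"]
        + a.get("gen_ai.usage.output_tokens", 0) / 1e6 * prices["output_per_m"]
        + a.get("session.hours", 0) * prices["session_per_hour"]
        + a.get("tool.web_search.calls", 0) / 1000 * prices["web_search_per_k"]
    )
    a["dollars_spent"] = round(dollars, 4)
    return span
```

Run that at ingest time and every span carries its own price tag; roll the attribute up the parent-child tree and a task has a total.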
Where we are thinking next: a pricing table for Claude Managed Agents, OpenAI Agents SDK, Devin, Cursor, and Copilot that updates weekly. A dollars_spent attribute enriched onto every span at ingest time. A dashboard view that filters by provider, model, user, and task. A budget alert that fires when a single trace crosses a ceiling, with a follow-on kill hook that can signal the harness to stop. That is what FinOps for managed agents looks like and it is what DD Traces is building toward.
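The budget-alert piece is then a fold over those same spans: group by trace, sum dollars_spent, and call the kill hook for any trace over its ceiling. A sketch, again against an assumed dict-span shape rather than anything in main:

```python
def check_trace_budget(spans, ceiling, kill_hook):
    """Sum dollars_spent per trace and signal the harness to stop
    any trace whose total crosses the ceiling."""
    totals = {}
    for span in spans:
        trace_id = span["trace_id"]
        cost = span["attributes"].get("dollars_spent", 0.0)
        totals[trace_id] = totals.get(trace_id, 0.0) + cost
    tripped = []
    for trace_id, total in totals.items():
        if total > ceiling:
            kill_hook(trace_id, total)  # e.g. POST a stop signal to the harness
            tripped.append(trace_id)
    return tripped
```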
If you want to watch that happen, the project is at traces.developersdigest.tech and the receiver is MIT licensed. Opinions on semantic conventions, pricing sources, and kill-hook design are welcome and frankly necessary.
You do not need DD Traces to protect yourself this week. Three concrete things, in order.
1. Set hard dollar caps in each provider's UI. Anthropic, OpenAI, Cursor, Devin, Copilot, and Replit all have usage limits under billing settings. Set them now, before the next overnight run. A cap that lives in the provider dashboard will not save you from bad code but it will save you from the worst version of "forgot to set a cap." Set it per-workspace, set it per-key, set it aggressive (2x your expected monthly spend is fine; 10x is not).
2. Run agents with OTel tracing turned on. Every major agent framework now supports OTLP export: Claude Agent SDK, OpenAI Agents SDK, LangGraph, Mastra, Vercel AI SDK. Set OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 and send spans to any collector (DD Traces, Langfuse local, a plain file sink). Even without cost enrichment, having the traces on disk means you can reconstruct what happened when the bill arrives.
3. Write a $10 sentinel loop. The top action, the one that matters most if you do nothing else tonight: a shell script or cron job that polls each provider's usage API every 5 minutes, sums the spend since session start, and kills the agent process if it crosses a threshold you set. Ten dollars is a good first number because it is high enough to let real work finish and low enough that a runaway loop gets caught in the first hour. It is ugly and it works and it has saved me from another $437 morning more than once.
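A sketch of that sentinel in Python — each provider's usage API has its own shape, so the fetch functions here are stand-ins you would write per provider, and the kill action is injected so the same loop works whether you signal a local process or call a provider's stop endpoint:

```python
import os
import signal
import time

def sentinel(fetch_spend_fns, kill, ceiling=10.0, interval=300,
             sleep=time.sleep):
    """Poll total spend every `interval` seconds; fire `kill` and stop
    once the sum crosses `ceiling` dollars.

    fetch_spend_fns: one callable per provider, each returning dollars
    spent since session start -- stand-ins for real usage APIs.
    """
    while True:
        total = sum(fn() for fn in fetch_spend_fns)
        if total >= ceiling:
            kill()
            return total
        sleep(interval)

# Wiring it to a local agent process might look like:
#   sentinel(provider_fns, kill=lambda: os.kill(agent_pid, signal.SIGTERM))
```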
The managed-agent category is two weeks old in its current shape and the FinOps layer under it is weeks behind. Cost attribution across Claude, OpenAI, Devin, and Cursor is the next thing that has to exist for any serious team to run agents overnight without a finance conversation in the morning. DD Traces is one attempt at it. If you are running agents and want the trace layer today, grab it at traces.developersdigest.tech. If you are picking which managed agent to run in the first place, the comparison lives at agenthub.developersdigest.tech.
The $437 bill was a cheap lesson. The next one will not be.