Coordinating an Agent Fleet for a Day: The Operating Model That Actually Held

We rebuilt and replatformed this site in a single day by running a fleet of AI agents in parallel. The design story lives in its own post: why we retired the old cream-and-pink system for a hard-edged neutral contract, and what we chose. This post is about the other half, the part that is harder to see and much easier to get wrong: the orchestration. How do you point dozens of agent runs at one codebase in one day and end up with a coherent site instead of a pile of conflicting commits?

The short version is that the model is not clever. It is boring, and boring is the point. Coordination fails in exciting ways and succeeds in dull ones. What follows is the operating model that held, the guardrails that made it safe, and the specific failure modes we hit that day, each with the fix it produced. None of it is theoretical. All of it cost us something to learn.

If you want the framework-level vocabulary first - fan-out, pipeline, hierarchical delegation, blackboard - read the definitive guide to coordinating multiple AI agents. This post assumes you already know the patterns and want the field notes.

The operating model that worked

Seven rules did most of the work. Each one exists because the alternative bit us at some point, either that day or before it.

1. Single-owner file scopes

The first rule is the one everything else rests on: never two writers per file. Every agent gets a scope, and scopes do not overlap at the file level. One agent owns the homepage. Another owns the blog templates. A third owns the global tokens. When two agents both need to touch a shared file, that is a signal to serialize them, not to let them both edit and merge later.

This sounds obvious and is constantly violated in practice, because the natural decomposition of a task ("redesign the site") does not respect file boundaries ("both the nav and the footer import the same tokens file"). The discipline is to decompose along ownership lines, not feature lines. If a change spans a shared file, one agent lands the shared change first, and the others build on top of it.
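The no-two-writers rule can be checked mechanically before any agent starts. What follows is a minimal Python sketch, assuming each scope is a plain set of file paths. The agent names, the paths, and the `find_scope_conflicts` helper are hypothetical, chosen for illustration only.

```python
def find_scope_conflicts(scopes: dict) -> dict:
    """Map each file claimed by more than one agent to its claimants."""
    owners = {}
    for agent, files in scopes.items():
        for path in files:
            owners.setdefault(path, []).append(agent)
    # A file with two or more owners violates the single-writer rule.
    return {path: sorted(agents) for path, agents in owners.items() if len(agents) > 1}


# Hypothetical scope assignment: two agents both claim the tokens file.
scopes = {
    "homepage-agent": {"app/page.tsx", "styles/tokens.css"},
    "blog-agent": {"app/blog/layout.tsx"},
    "tokens-agent": {"styles/tokens.css"},
}

conflicts = find_scope_conflicts(scopes)
print(conflicts)  # {'styles/tokens.css': ['homepage-agent', 'tokens-agent']}
```

A non-empty result is the signal to serialize: the shared file goes to one owner, and the other agent waits for that change to land.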

2. Serialized dependency installs

Package management is a shared-file problem with extra teeth. Two agents running pnpm add at the same time race on the lockfile and package.json, and the loser's install silently vanishes or corrupts the tree. So one agent owns package.json at a time. Dependency installs are serialized through a single owner, full stop. Component-library installs are the same: one agent runs the install, adapts the component to the design contract, commits, and only then does downstream work start.
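As a rough sketch of what "serialized through a single owner" can look like, the Python below funnels every install through one lock. It assumes all agents are dispatched from a single orchestrator process. `InstallQueue` and the `run_install` callback are hypothetical names; in a real setup the callback might shell out to `pnpm add`.

```python
import threading


class InstallQueue:
    """Lets only one install run at a time, so the lockfile has one writer."""

    def __init__(self, run_install):
        self._lock = threading.Lock()
        self._run_install = run_install  # hypothetical callback doing the real install
        self.log = []

    def install(self, agent: str, package: str) -> None:
        # Every caller blocks here until the previous install has finished.
        with self._lock:
            self.log.append(("start", agent, package))
            self._run_install(package)
            self.log.append(("done", agent, package))


# Stand-in callback so the sketch runs without touching a real project.
queue = InstallQueue(run_install=lambda package: None)
queue.install("tokens-agent", "some-package")
print(queue.log)
```

A `threading.Lock` only serializes callers inside one process. If agents run as separate processes, the same idea needs a cross-process mechanism such as a lock file.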

3. A verification gate on every handoff

This is the load-bearing rule. The orchestrator runs the same gate on every single handoff, not once at the end of the day. The gate is four checks: typecheck, style check, build, and one more step that catches the failure the other three miss.

That extra step is the committed-tree-in-isolated-worktree trick. Agents leave in-progress files in the working tree. A commit can pass every local check while importing a file that was never staged, because the file exists on disk but not in the commit. Local tooling sees the file; the CI runner, which only has the commit, does not. So the gate checks out the actual commit into a throwaway worktree and typechecks that, in isolation from the working tree. If the commit imports something it did not include, this catches it before it reaches the deploy pipeline. None of the other three checks does.
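The isolated-worktree check can be sketched in a few lines of Python. This is a hedged illustration, not the gate itself. It assumes a pnpm project typechecked with `tsc`, and the function names and the injectable `runner` parameter are hypothetical. The git commands used (`git worktree add --detach` and `git worktree remove --force`) are standard.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def gate_steps(commit: str, worktree: str, repo: str = "."):
    """List the (cwd, argv) steps that check one commit in isolation."""
    return [
        # Materialize exactly what the commit contains, nothing from the working tree.
        (repo, ["git", "worktree", "add", "--detach", worktree, commit]),
        # Assumed toolchain: pnpm install, then a tsc typecheck with no output files.
        (worktree, ["pnpm", "install", "--frozen-lockfile"]),
        (worktree, ["pnpm", "exec", "tsc", "--noEmit"]),
    ]


def run_gate(commit: str, repo: str = ".", runner=subprocess.run) -> bool:
    """Return True only if every step exits 0; always clean up the worktree."""
    parent = Path(tempfile.mkdtemp(prefix="gate-"))
    worktree = str(parent / "tree")  # fresh path for the throwaway checkout
    try:
        for cwd, argv in gate_steps(commit, worktree, repo):
            if runner(argv, cwd=cwd).returncode != 0:
                return False
        return True
    finally:
        # Cleanup runs on success and failure alike; its exit code is ignored.
        runner(["git", "worktree", "remove", "--force", worktree], cwd=repo)
        shutil.rmtree(parent, ignore_errors=True)


# Usage (needs a real repository): run_gate("HEAD") checks the latest commit.
```

The `runner` parameter exists so the sequence can be exercised without git or pnpm installed. In normal use it stays at its default, `subprocess.run`.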

The principle generalizes past our specific stack: verify the artifact you are about to ship, not the environment you built it in. The working directory lies. The commit does not.

4. Draft-first for anything externally visible

Anything that leaves the building starts as a draft for review. Content, public copy, anything a reader or a customer would see. The agent produces it, a human or a review pass approves it, and only then does it ship. This is not about distrust of the model. It is that the cost of a bad externally-visible change is asymmetric, and the cost of a review pass is small. When the downside is public and hard to reverse, you pay the small tax every time.

5. Standing constraints broadcast to all agents

Some rules are not task-specific; they apply to every agent regardless of scope. Banned topics. The design contract - square corners, hairline borders, no gradients, no em dashes. These are broadcast to every agent as standing constraints, and they are codified in the project instructions so they are inherited, not remembered. A constraint you have to remember is a constraint you will eventually break. A constraint the codebase and the brief both enforce stays enforced. This is what let agents working on different pages produce work that looked like it came from one hand.

6. Fail-closed defaults for anything that spends money

Any action that spends money or touches a live external system defaults to off. If an agent is unsure whether it is authorized to make a paid call, provision infrastructure, or hit a production endpoint, the default is to stop and ask, not to proceed and apologize. Fail-closed is the only safe default for irreversible or costly actions, because the failure mode of asking is a few seconds of latency and the failure mode of proceeding is a bill or an outage.
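A fail-closed default is easy to state and easy to get backwards in code, so here is a small Python sketch of the decision. The `Action` type, the `decide` function, and the idea of an explicit approval set are illustrative assumptions. The property that matters is that "not sure" is handled the same way as "risky".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Action:
    name: str
    # None means the agent is not sure, which is treated the same as True.
    spends_money: Optional[bool] = None
    touches_production: Optional[bool] = None


def decide(action: Action, approved: frozenset) -> str:
    """Proceed only when the action is known-safe or explicitly approved."""
    known_safe = action.spends_money is False and action.touches_production is False
    if known_safe or action.name in approved:
        return "proceed"
    return "stop_and_ask"


print(decide(Action("format-code", False, False), frozenset()))   # proceed
print(decide(Action("call-paid-api", True, False), frozenset()))  # stop_and_ask
print(decide(Action("mystery-tool"), frozenset()))                # stop_and_ask
```

The check tests for `is False` rather than general falsiness, so an unset flag can never slip through as safe.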

7. Continuous shipping, never batch a day's work

The last rule is a rhythm: verify, commit, push, per increment. Never let a day of parallel work pile up into one giant unreviewed merge. Each increment goes through the gate and ships on its own. Batching feels efficient and is a trap: it hides which change broke what, it makes the verification gate slower and scarier, and it turns a small revert into a large one. Small, continuous, verified increments keep the blast radius of any single mistake tiny.

The failure modes we hit, honestly

Rules read as clean in a list. They were not clean to learn. Here are the actual failures from the day and the guardrail each one produced. This is the part worth reading twice, because the failures are more transferable than the successes.

Agents assuming unshipped sibling exports. An agent imported a function it expected a sibling agent to have exported, because the plan said that function would exist. But the sibling had not shipped it yet, or had named it differently. The code looked correct in isolation and broke at integration. Guardrail: agents build against what is committed, not against what is promised. If an export does not exist in the tree yet, you do not import it; you serialize behind the agent that owns it.

Mid-write files breaking global CSS. An agent was partway through editing the global stylesheet when a downstream build picked up the half-written file, and the broken CSS cascaded across every page at once. A shared global file in a mid-write state is a site-wide outage waiting to happen. Guardrail: shared global files get a single owner who lands complete, verified changes, and downstream work does not build against a global file that is mid-edit.

Silent idles with no reports. An agent went quiet. Not failed, not finished, just idle, and it produced no report, so the orchestrator did not know whether it was working, stuck, or done. Silence is ambiguous and ambiguity stalls the whole fleet. Guardrail: every agent reports on handoff. A run that goes silent without a report is treated as stalled and gets checked, not assumed to be making progress.
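Treating silence as stalled requires some definition of "silent". The Python sketch below assumes the orchestrator records a timestamp for each agent's last report and uses a fixed silence threshold. Both the threshold and the `find_stalled` helper are assumptions made for illustration.

```python
from typing import Optional


def find_stalled(last_report: dict, now: float, max_silence_s: float) -> list:
    """Return agents with no report at all, or none within the threshold."""
    stalled = []
    for agent, reported_at in last_report.items():
        never_reported = reported_at is None
        if never_reported or now - reported_at > max_silence_s:
            stalled.append(agent)
    return sorted(stalled)


# Hypothetical timestamps in seconds; None means no report has arrived yet.
last_report = {
    "homepage-agent": 1000.0,
    "blog-agent": 400.0,
    "tokens-agent": None,
}

print(find_stalled(last_report, now=1060.0, max_silence_s=300.0))
# ['blog-agent', 'tokens-agent']
```

Anything on the returned list gets checked by the orchestrator rather than being assumed to be making progress.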

Env files clobbered by a tool. A tool overwrote an environment file, wiping configuration that other work depended on. Guardrail: treat env and other shared config files as owned, single-writer surfaces exactly like source files, and never let a tool rewrite them as a side effect without that write going through the same ownership and verification path as any other change.
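One way to notice a side-effect write to an owned config file is to fingerprint it before a tool runs and compare afterwards. The Python sketch below assumes a content hash is a good enough fingerprint. The `snapshot` and `clobbered` helpers are hypothetical, and this only detects a clobber, it does not prevent one.

```python
import hashlib
from pathlib import Path


def snapshot(paths) -> dict:
    """Record a SHA-256 digest per file, or None if the file is missing."""
    result = {}
    for p in paths:
        path = Path(p)
        digest = hashlib.sha256(path.read_bytes()).hexdigest() if path.is_file() else None
        result[str(path)] = digest
    return result


def clobbered(before: dict) -> list:
    """Return the paths whose content changed or vanished since the snapshot."""
    after = snapshot(before)
    return sorted(p for p in before if before[p] != after[p])


# Usage: take snapshot([".env"]) before invoking a tool, then call
# clobbered(...) on the result afterwards and fail the handoff if it is non-empty.
```

Wired into the handoff gate, a non-empty result means the write did not go through the file's owner and should be reverted or reviewed.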

Deploy pipeline broken by a package-manager default. The deploy failed on a package-manager default we did not know had changed: the newer major version hard-blocks dependency build scripts unless they are explicitly approved, and the config key that holds that approval moved between versions. Installs that worked locally failed in CI. Guardrail: test dependency changes with a clean, frozen-lockfile install that mirrors CI, not just the warm local install, because the warm local environment hides exactly the failures the cold CI environment will hit.

The through-line across every one of these: the failure was never the model being dumb. It was two pieces of work making incompatible assumptions about shared state - a file, an export, an env var, a lockfile, a build default - and the fix was always the same shape. Make the shared thing owned, verify the real artifact, and never assume a sibling's promise is a sibling's commit.

A starter checklist

If you are about to point a fleet of agents at your own codebase, start here. This is the shortest version of what took us a day of mistakes to internalize.

Assign single-owner file scopes. No file has two writers. Decompose along ownership lines, not feature lines.
Serialize dependency installs through one owner. One agent holds package.json at a time.
Run a verification gate on every handoff: typecheck, style check, build.
Add the isolated-worktree check to that gate. Verify the committed tree, not the working directory, so a commit that imports an unstaged file gets caught before CI.
Draft-first everything externally visible. Human or review-pass approval before anything public ships.
Broadcast standing constraints to all agents, and codify them in project instructions so they are inherited, not remembered.
Default to fail-closed on anything that spends money or touches production. Unsure means stop and ask.
Ship continuously: verify, commit, push per increment. Never batch a day of work into one merge.
Require a report on every handoff. Treat silence as stalled, not as progress.
Test dependency and config changes against a clean, CI-like install before pushing, not just the warm local environment.

None of these are exotic. That is the lesson. Coordinating a fleet of agents is mostly the same discipline as coordinating a team of people: clear ownership, honest verification, small reversible increments, and safe defaults when the stakes are high. The agents move faster, so the cost of skipping the discipline arrives faster too. Get the operating model right and the speed is a gift. Skip it and the speed just multiplies your surface area for silent breakage.

If you want the next layer down, the orchestration patterns guide covers the mechanics of each coordination shape, and managing a fleet of Claude agents goes deeper on the day-to-day of running one. To go from patterns to a durable skill set, follow a learning path or browse the agents library.

Frequently Asked Questions

How do you stop parallel agents from overwriting each other's work?

Single-owner file scopes. Every agent gets a scope, and no two scopes touch the same file. Decompose the work along ownership lines rather than feature lines, because features naturally span shared files while ownership does not. When a change genuinely needs a shared file, one agent lands that change first and the others build on top of it. Dependency installs and env files are the highest-risk shared surfaces, so they always go through a single owner.

What is the verification gate you run on every handoff?

Four checks: a typecheck, a style check for banned patterns, a full build, and an isolated-worktree check. That last one checks out the actual commit into a throwaway worktree and typechecks it in isolation from the working directory. It catches the case where a commit imports a file that exists on disk but was never staged, which passes every local check and then fails in CI. The rule is to verify the artifact you are shipping, not the environment you built it in.

Do you need a framework to coordinate agents like this?

No. The operating model here is about discipline, not tooling: ownership boundaries, a verification gate, draft-first review, fail-closed defaults, and continuous shipping. Those apply regardless of whether you use a framework or raw orchestration. Frameworks help once you need explicit loops or shared state, which the coordination guide covers, but the guardrails that keep a fleet safe are process, not library choice.