
TL;DR
I told an agent to improve the site every 10 minutes and went to sleep. Here is what 12 new repos, 60 PRs, and a night's worth of goofs taught me about overnight orchestration.
At 11:47pm on April 28 I typed eight words into Claude Code:
/loop 10m improve the Developers Digest website overnight
Then I closed the laptop and went to bed.
By 1:53am the orchestrator session had spawned dozens of subagents, opened 59 pull requests across 21 repositories, scaffolded 12 new private repos, drafted 6 blog tutorials, written 2 video scripts, generated 8 distribution packages, and shipped a 5-PR backend migration end to end. By the time I woke up, the morning brief sitting in my repo was the longest tally I have ever seen from a single prompt.
This is not a Claude Code ad. The system did real work, but it also got things wrong, hit billing walls I had not budgeted for, and at one point fabricated a PR number that did not exist. The interesting story is the mix. So this is the candid version: what worked, what broke, and three lessons for anyone considering doing the same.
The shape of the run was simple enough to describe in a paragraph and complicated enough that I am still untangling it.
A parent orchestrator session held the loop. Every 10 minutes a cron-style tick fired off a planning step. The planner read the current state of the empire (24 apps under developersdigest, plus my standing rules), picked a batch of independent goals, and fanned out subagents in parallel to execute them. Some agents wrote code. Some scaffolded new repos. Some drafted blog posts. Some audited cross-repo consistency and filed reports. Each agent worked on its own branch in its own repo, opened a PR, tagged @devin-ai-integration for review, and exited.
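To make the shape concrete, here is a minimal sketch of one tick. The `plan_goals` and `execute_goal` callables are hypothetical stand-ins for the real planner and subagents; none of these names come from Claude Code itself.

```python
import concurrent.futures

def run_tick(plan_goals, execute_goal, max_agents=8):
    """One orchestrator tick: plan a batch of independent goals,
    then fan out one worker per goal in parallel.
    plan_goals() -> list of goal dicts; execute_goal(goal) -> result."""
    goals = plan_goals()
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_agents) as pool:
        futures = {pool.submit(execute_goal, g): g for g in goals}
        results = []
        for fut in concurrent.futures.as_completed(futures):
            results.append((futures[fut], fut.result()))
    return results
```

In the real run, each `execute_goal` would clone one repo, branch, do the work, and open a PR; the parent only collects the reports.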
The parent never merged. That was deliberate. My standing rule is: branch, PR, tag Devin, never direct-push to main. The overnight session inherited that rule and held to it across 60 PRs without exception.
The other rule it inherited was equally non-negotiable: nothing public on GitHub without my explicit say-so. Every one of the 12 new repos was created with --visibility private. I checked all of them in the morning. None had slipped.
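The morning check is easy to script. A small sketch, assuming you feed it the parsed output of `gh repo list <org> --json name,visibility` (gh reports visibility in upper case):

```python
def public_leaks(repos):
    """Return the names of any repos that are not private.
    repos: parsed JSON from `gh repo list <org> --json name,visibility`."""
    return [r["name"] for r in repos if r.get("visibility", "").upper() != "PRIVATE"]
```

An empty list means nothing slipped.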
Parallel fan-out scales further than I expected. A single tick would routinely have 5 to 8 subagents running concurrently. One cycle scaffolded mcp-lens, tracetrail, and cost-tape in parallel while a separate group of agents added Sentry observability to four production apps and a third group enriched 817 detail pages with generateMetadata and JSON-LD across four directory sites. The bottleneck was never compute. It was always coordination, and most coordination was avoided by keeping each agent's blast radius tight: one repo, one branch, one PR.
The dogfood loop closed itself. Three separate moments stood out. dd-content-engine PR #5 shipped a real Markdown to X / LinkedIn / newsletter fanout. By the next cycle, distribution agents were drafting their packages with the same fanout. tracetrail, scaffolded around 02:30 UTC, was wired into overnight-agents PR #4 within the hour as an "Open in TraceTrail" button on the runs page. repo-postcard, also new tonight, generated the 12 card PNGs that landed in developers-digest-site PR #47 for the new /apps entries. The system was building tools and using them in the same session.
Voice rules held under load. My DevDigest voice rules are explicit: no em dashes, no emojis, no superlatives, no gradients, no "blazing fast." Across 6 blog drafts, 8 distribution packages, and 2 video scripts, the consistency was genuinely strong. I spot-checked 14 markdown files this morning and found zero em dashes and zero emojis. Whatever is in the system prompt for tone is sticking.
Reports were honest. I asked for cross-repo audits across the empire and got four written deliverables, not code: PRODUCT-IDEAS-2026-04-28.md, agent-ecosystem-2026-04-28.md, APPS-TIGHTEN-STATUS-clerk-neon-2026-04-28-v2.md, GA-IDEAS-2026-04-28.md. The GA audit caught 18 apps hardcoding the same Google Analytics ID, which in turn seeded the dd-ga repo to fix it. That is the loop I want from this kind of session: audit produces report, report seeds product, product fixes audit.
The Convex to Neon migration shipped end to end. This was the most ambitious unit of work. Five sequential PRs in dd-clipper (#4 jobs storage, #6 apiKeys, #7 apiCredits, #8 apiUsageLog, #9 clips) walking the schema across one table at a time. The agent that owned this thread held the dependency order, rebased when it hit conflicts on #4, and produced a docs: convex surface + neon migration plan companion PR (#5) so the next person could audit the cutover. That sequence is documented separately in PR #49.
The GitHub Actions billing wall. Sometime around 03:15 UTC, every CI run on every open PR started failing with "The job was not started because recent account payments have failed or your spending limit needs to be increased." The org card had a billing failure. I did not know about it until I woke up. The agent kept opening PRs anyway because that was the right move, but it meant Devin had no CI signal to review against. Every one of the 60 PRs is currently red for a reason that has nothing to do with the code. Lesson: the orchestrator needs a billing check at the top of every loop, the same way it checks for gh auth status.
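A minimal version of that check, written as a pure function over the parsed output of `gh run list --json status,conclusion` so the loop can call it per repo. The `startup_failure` heuristic is my assumption about how the billing wall surfaces, not a documented guarantee:

```python
def billing_blocked(runs):
    """True if every recent run failed to start or is stuck queued,
    which is how a billing failure tends to look from the outside.
    runs: parsed `gh run list --json status,conclusion` output."""
    return bool(runs) and all(
        r.get("conclusion") == "startup_failure" or r.get("status") == "queued"
        for r in runs)
```

One genuine CI failure among otherwise-green runs will not trip it; five consecutive runs that never started will.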
The gradient violation. One agent, drafting a redesigned /pro waitlist landing in PR #37, introduced a hero with a gradient background. My rules say no gradients, full stop. A subsequent QA agent on a later cycle caught it, opened a follow-up commit on the same branch, and replaced the gradient with the solid bg-cream and a pink offset card. The system self-corrected, but only because I happen to run a recurring QA agent. Without that, the violation would have shipped to review and waited on Devin to flag.
The fake PR number. This one is the most uncomfortable. Mid-run, an agent reported that it had opened "PR #51" against developers-digest-site for a sitemap improvement. The morning brief picked up the report. When I went to look, there was no PR #51. There was a branch with the work on it, but no PR had ever been opened from it. The agent had described an outcome that had not happened yet, the parent had taken the report at face value, and the brief had repeated it. I caught it because the PR table in the brief was sorted by number and #51 was missing between #50 and #52. The actual PR was opened by hand once I confirmed the branch existed. I do not know yet whether the agent hallucinated the action or whether the gh pr create call failed silently and was misreported. Either way: trust nothing the orchestrator says about a PR number until you have seen it in gh pr list.
The rebase cascade. dd-clipper PR #4 hit conflicts because two earlier cycles had touched the same convex/schema.ts region. The owning agent flagged it, but the rebase took a separate agent and a full cycle to resolve. During that window the four downstream PRs (#6, #7, #8, #9) were blocked. Sequential migrations and fan-out parallelism do not mix as cleanly as I thought.
1. Decompose for independence, not for parallelism. The work that parallelized cleanly was work that touched separate repos or separate files. The work that did not (sequential migrations, schema changes, anything with implicit ordering) created queues and rebases. Before a loop starts, ask the planner to draw the dependency graph, then only fan out the leaves.
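A sketch of what "only fan out the leaves" means in code, using the dd-clipper table order as an illustrative graph (the dependency sets here are my reconstruction, not the planner's actual output):

```python
def leaves(deps):
    """deps maps each task to the set of tasks it depends on.
    Only tasks with no unmet dependencies are safe to run in parallel."""
    return sorted(t for t, d in deps.items() if not d)

def mark_done(deps, task):
    """Drop a finished task from the graph so new leaves surface."""
    return {t: d - {task} for t, d in deps.items() if t != task}

deps = {"jobs": set(), "apiKeys": {"jobs"}, "apiCredits": {"jobs"}, "clips": {"apiKeys"}}
```

Starting out, only `jobs` is a leaf; once it merges, `apiKeys` and `apiCredits` become parallel leaves. Everything downstream waits, which is exactly why the #4 rebase stalled #6 through #9.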
2. Verify every claim against the system of record. Agents will report what they meant to do, what they think they did, and what they actually did, and these three are not always the same. Run a reconciliation pass at the end of every cycle: gh pr list --json number,title --limit 100 and diff against the agent's claims. The fake PR #51 would have been caught instantly by this.
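The reconciliation itself is a set difference. A sketch, with `claimed` coming from the agents' reports and `actual` from `gh pr list --json number`:

```python
def reconcile(claimed, actual):
    """Compare PR numbers agents claim against the system of record.
    Returns (phantom, unreported): claimed but missing from GitHub,
    and open on GitHub but never claimed by any agent."""
    claimed, actual = set(claimed), set(actual)
    return sorted(claimed - actual), sorted(actual - claimed)
```

`reconcile([50, 51, 52], [50, 52])` flags 51 as a phantom, which is exactly the case above.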
3. Pre-flight your invariants. Billing was the one I missed. Others to check before starting an overnight run: disk space on the host, gh rate limit budget, model context budget, any required secrets for the tasks the planner might pick, and whether main is already broken on any repo (one of mine, dd-cron, had a pre-existing /api/health build failure that masked a perfectly good favicon PR). If any invariant is red, the loop should pause and tell me, not push through.
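A pre-flight can be as simple as a dict of named checks the loop runs before the first tick. A sketch, with only the disk check real and the rest left as stubs you would wire to gh:

```python
import shutil

def preflight(checks):
    """Run every invariant check; return the names of the red ones.
    A non-empty result means pause and report, not push through."""
    return [name for name, check in checks.items() if not check()]

checks = {
    # Real check: at least 5 GiB free on the host.
    "disk": lambda: shutil.disk_usage("/").free > 5 * 2**30,
    # Stubs: wire these to `gh api rate_limit`, secret lookups, and a
    # per-repo main-branch CI status before trusting the loop overnight.
    "rate_limit": lambda: True,
    "secrets": lambda: True,
}
```

If `preflight(checks)` comes back non-empty, the loop writes the list to the brief and stops instead of ticking.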
The output is real. 12 private repos, each with a working scaffold and a README. 60 open PRs, each branched, tagged, and reviewable. 6 blog drafts at draft: true so I can edit before publishing. 817 newly-enriched SEO pages across the directory sites. A backend migration shipped in a single night that I had been dragging my feet on for two weeks. If I had to do this with my hands it would have taken a working week.
The cost is not just dollars (I will know the dollar figure when the bill lands). The cost is the morning I am spending right now reconciling what was claimed against what is real, fixing the billing block, merging the boring PRs first, deciding which of the 12 new repos are worth keeping versus archiving. The agent did the producing. I have to do the curating, and curating 60 PRs is its own non-trivial day.
The cost is also trust calibration. After tonight I trust the system more on bounded tasks (one repo, one PR, clearly scoped) and less on multi-step claims about its own outputs. I will run another loop next week, but with a reconciliation step inside the loop and a billing pre-flight at the top.
If you want to see what came out of it, the /apps page lists the 12 new tools as coming-soon entries, the comparison hub was reorganized in PR #36, the 10 tools announcement draft sits behind PR #42, and the Convex to Neon migration war story draft sits behind PR #49.
For anyone trying this themselves: the loop works. It works better when you treat the agent like a junior engineer who is genuinely fast, occasionally wrong, and structurally incapable of admitting which is which without help. Build the help in. Then go to bed.