
TL;DR
Goal, loop, routine. Three verbs, two tools, one hard part. A complete field guide to running agentic loops in Claude Code and Codex: the real commands, the patterns people actually run, and the two failure modes that burn money.
Last updated: June 20, 2026. Loop features in Claude Code and Codex move fast. Treat this as a working field guide and confirm exact command behavior in the official docs linked below before you leave anything running unattended.
The shift everyone keeps circling is real, and it is simpler than the discourse makes it sound. Stop being the thing in the loop. Write the goal, the loop, or the routine, give it a budget and a way to check itself, and go decide what to build next. That is loop engineering. Everything else is plumbing.
This guide covers the whole thing: the one distinction almost everyone gets wrong, the real commands in both Claude Code and Codex, the patterns people are actually running in production, and the two failure modes that turn a clever loop into a money fire.
| Topic | Official source |
|---|---|
| Claude Code /loop and /schedule | code.claude.com/docs |
| Claude Code Routines (web scheduled tasks) | Web scheduled tasks docs |
| Codex CLI | developers.openai.com/codex/cli |
| Codex Automations | Codex app automations |
| Codex exec (non-interactive) | Non-interactive mode |
| The Codex agent loop, unrolled | OpenAI: unrolling the Codex agent loop |
| Loop engineering, the term | Addy Osmani: Loop Engineering |
| Forward Future Loop Library | signals.forwardfuture.ai/loop-library |
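The cleanest framing going around separates three things people constantly collapse into one word:

- A goal runs until a condition is verified: until done.
- A loop repeats inside your open session: while I watch.
- A routine runs on a schedule without you: while I am gone.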
Get the verb right and every pattern in this guide falls into place. Get it wrong and you either leave a hands-on loop running into an empty room, or you point a "while I sleep" routine at a task that needed you watching. The verb is the design decision. The command is just syntax.
Here is how each verb maps to a real command in each tool.
In Claude Code the goal verb is /goal, shipped in v2.1.139.
/goal all tests in test/auth pass and the lint step is clean
The important part is not that it loops. The important part is who decides it is done. After every turn, Claude Code sends the condition plus the transcript to a separate, small, fast model (Haiku by default) that acts as a judge. That judge returns yes or no with a reason. On no, Claude reads the reason and takes another turn. On yes, the goal auto-clears and the run stops.
This is the single most important idea in the whole field, so it is worth saying plainly: the worker does not grade its own homework. A separate model does. We will come back to why that matters more than anything else.
Check status any time with a bare /goal. Clear an active goal with /goal clear. Bound it with a turn cap baked into the condition itself, for example "or stop after 20 turns." You can also run it headless:
claude -p "/goal CHANGELOG.md has an entry for every PR merged this week"
Codex shipped its own goal-style controls in CLI v0.128.0, with set, pause, resume, and clear. The shape is the same: a verifiable end state, a checker, and a stop.
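The worker and judge split described above can be sketched in a few lines of Python. This is an illustration of the shape only, not how either tool implements it: `run_goal`, `Verdict`, and the `worker` and `judge` callables are hypothetical names, and in practice each callable would wrap a model call.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Verdict:
    done: bool    # the judge's yes or no
    reason: str   # explanation, fed back to the worker on "no"


def run_goal(
    condition: str,
    worker: Callable[[str, Optional[str]], str],
    judge: Callable[[str, List[str]], Verdict],
    max_turns: int = 20,
) -> Tuple[str, int, Optional[str]]:
    """Take worker turns until the judge confirms the condition or the cap hits."""
    transcript: List[str] = []
    feedback: Optional[str] = None
    for turn in range(1, max_turns + 1):
        transcript.append(worker(condition, feedback))
        # The judge is a separate callable: the worker never grades itself.
        verdict = judge(condition, transcript)
        if verdict.done:
            return ("done", turn, verdict.reason)
        feedback = verdict.reason
    return ("cap_hit", max_turns, feedback)
```

The turn cap is a parameter here rather than part of the condition text, which is a design choice of the sketch: a cap enforced by the loop holds even if the judge never says yes.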
In Claude Code the loop verb is /loop, and it lives only inside your open session.
/loop 5m check the deploy
Two modes. With an interval it runs on a timer (5m, 30m, 2h), rounding odd values to the nearest cron boundary. Without an interval it self-paces: Claude watches the output and picks the next delay itself, anywhere from one minute to an hour, printing the reason for each wait. Press Esc to cancel while it waits.
A bare /loop with no prompt runs a built-in maintenance pass (git and PR triage plus cleanup). You can override that default by dropping a loop.md into .claude/ for the project or ~/.claude/ globally.
Three things to remember about /loop: it is session-scoped and dies when you close the session, it only fires when the session is idle, and forgotten loops auto-expire after seven days. It is the tool for watching something, hands on, right now. We have a deeper walkthrough in Claude Code Loops: recurring prompts that actually run.
Codex has no /loop command yet. Its equivalent is codex exec wrapped in a shell loop, or a minute-interval thread automation in the Codex app.
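A timer-style wrapper around a non-interactive command might look like the sketch below. It uses Python rather than a shell loop so the run count is capped explicitly. `run_on_interval` is a hypothetical helper, and the command in the usage comment assumes `codex exec` takes the prompt as a positional argument; confirm that against the non-interactive mode docs before relying on it.

```python
import subprocess
import time
from typing import Callable, List, Sequence


def run_on_interval(
    cmd: Sequence[str],
    interval_s: float,
    max_runs: int,
    run: Callable[..., "subprocess.CompletedProcess"] = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
) -> List[int]:
    """Run cmd every interval_s seconds, at most max_runs times; return exit codes."""
    codes: List[int] = []
    for i in range(max_runs):
        result = run(list(cmd), capture_output=True, text=True)
        codes.append(result.returncode)
        if i < max_runs - 1:
            sleep(interval_s)  # no sleep after the final run
    return codes


# Assumed usage: check a deploy every five minutes for at most one hour.
# run_on_interval(["codex", "exec", "check the deploy"], interval_s=300, max_runs=12)
```

The `max_runs` argument is the important part: unlike a bare `while true`, this loop cannot outlive its budget.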
In Claude Code the routine verb is /schedule, and it runs on Anthropic's cloud, not your laptop.
/schedule daily PR review at 9am
A routine is durable. It survives your machine being closed, clones a fresh copy of your repo each run, and works on claude/-prefixed branches. It can be triggered three ways: on a cron schedule (minimum one-hour interval), on a GitHub webhook (PR opened, labeled, released, and so on), or via a dedicated API endpoint. Manage them with /schedule list, /schedule update, and /schedule run. Full transcripts land at claude.ai/code/routines.
Codex's equivalent is Automations in the Codex app: standalone, project, or thread automations on daily, weekly, or custom cron schedules, with results landing in a Triage inbox.
One trap that comes up constantly: there is no /routine command in either tool. In Claude Code the scheduler is /schedule; in Codex it is Automations. We compare where this work should actually live in Claude Code Routines vs Managed Agents schedules.
| | /goal | /loop | /schedule |
|---|---|---|---|
| Verb | until done | while I watch | while I am gone |
| Runs on | your machine | your machine | Anthropic cloud |
| Needs open session | yes | yes | no |
| Stops itself | yes, on verified condition | optional (self-paced) | on schedule end |
| Min interval | n/a | 1 minute | 1 hour |
| Best for | fix until tests pass | watch a deploy | nightly PR sweep |
The commands are easy. The interesting part is what people point them at. Here are the loop shapes that keep surfacing across X, Reddit, GitHub, and the demo circuit, grouped by verb, rewritten as something you can paste tonight.
The build-test-fix pair is the most-demoed loop of all. Two roles: a builder that writes code and a checker that runs the tests, types, and lint, then reports exactly what broke. They pass work back and forth until it is clean. The whole pitch is the pain it kills, because a one-shot agent ships its own bugs without ever noticing.
/loop build the next item on the plan, then run tests, typecheck, and
lint. Feed every failure back as the next instruction and fix it. Stop
when the build is green and the checker has nothing left to report.
This is the pattern Boris Cherny, who built Claude Code, keeps describing. In a widely shared interview he put it bluntly: "I don't prompt Claude anymore. I have loops running that prompt Claude and figuring out what to do. My job is to write loops." The loop he describes runs the coding agent plus an advanced model plus a verifier, feeds it tasks, and removes bottlenecks as you go. The verifier is the part everyone skips, and without it you are just trusting the agent. Addy Osmani named the broader practice loop engineering, building on Peter Steinberger's line that you should be designing loops that prompt your agents rather than prompting them yourself.
/loop work the task list. After each task, have a separate verifier model
check the result against the spec and the tests. Only move on when it
passes. Surface anything the verifier rejects twice.
We unpack his workflow in Codex Loops: what Boris Cherny gets right about managing agent work.
Peter Steinberger runs a version of this on a tight timer while he works. Every five minutes the agent does one small, verified piece of upkeep. Crucially, what to clean is the agent's call, not a hardcoded script. That decision is the entire point.
/loop 5m make one small verified repository improvement: a flaky test, a
stale comment, a missing type. One change, one commit, tests green. Never
touch anything risky.
The bounded version that fixes the runaway problem cold: plan, generate, verify, fix, repeat, with state saved to files and a hard iteration cap. You only read the final version. The cap is what makes it safe to walk away from.
/goal plan the task, implement it, verify against the tests, and fix what
failed. Save state to files each pass. Max 5 iterations. Stop at the first
clean pass or when the cap is hit, and tell me which.
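The bounded shape above (a hard cap, state on disk, a report of which exit was taken) can be sketched as follows. `bounded_loop` and the `step` callable are hypothetical; `step` stands in for one full plan, implement, verify, fix pass and returns True on a clean pass.

```python
import json
from pathlib import Path
from typing import Callable, Dict


def bounded_loop(
    step: Callable[[Dict], bool],
    state_path: Path,
    max_iterations: int = 5,
) -> str:
    """Call step(state) until it returns True or the cap is hit, persisting state."""
    state: Dict = {"iteration": 0, "notes": []}
    if state_path.exists():
        state = json.loads(state_path.read_text())  # resume after a crash
    while state["iteration"] < max_iterations:
        state["iteration"] += 1
        passed = step(state)  # one pass; step may append to state["notes"]
        state_path.write_text(json.dumps(state))  # save state every pass
        if passed:
            return f"clean pass on iteration {state['iteration']}"
    return f"cap of {max_iterations} hit without a clean pass"
```

Because the iteration counter lives in the state file, a restarted run continues counting from where it stopped instead of getting a fresh allowance.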
The one that respects how flaky "it works" really is. It does not stop at the first green run. It tests realistic scenarios and only declares victory after a streak passes in a row. One green run is luck. A streak is reliability.
/goal run the full product test suite against realistic scenarios. Fix
whatever fails, then run again. A new failure resets the count. Done only
after 10 consecutive clean passes.
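The streak rule is easy to get subtly wrong, so here is a minimal sketch of the counting logic. `run_until_streak`, `run_suite`, and `fix` are hypothetical names; `max_runs` is an added safety cap that the prompt above does not mention.

```python
from typing import Callable


def run_until_streak(
    run_suite: Callable[[], bool],
    fix: Callable[[], None],
    required_streak: int = 10,
    max_runs: int = 100,
) -> bool:
    """Return True once run_suite passes required_streak times in a row."""
    streak = 0
    for _ in range(max_runs):
        if run_suite():
            streak += 1
            if streak >= required_streak:
                return True
        else:
            streak = 0  # a new failure resets the count
            fix()
    return False  # cap hit before a full streak
```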
High utility, and the value is entirely in the triage. It reads your production logs, separates real actionable errors from noise, fixes the actionable ones with a regression test, and opens a PR. Tell it what "actionable" means or it chases ghosts.
/goal review the last 24h of production errors. For each one that is
actionable and reproducible, write a fix with a regression test and open
a PR. Ignore transient and third-party noise. Done when the actionable
list is clear.
A class of open-source tools now installs a git hook so every commit triggers a background review, then feeds the findings straight into an agentic fix loop while the context is still warm. It is the installable version of the one thing this whole guide argues is the hard part: a verifier living inside the loop. The pattern plugs into Claude Code, Codex, and Gemini CLI.
init # adds a post-commit hook: every commit triggers a review
fix # the agentic loop that fixes the surfaced findings
The shape behind "I do not write code anymore, I write loops, and they write the code while I sleep." A scheduled routine that watches your open PRs and lands the fixable ones overnight, leaving anything ambiguous for you.
/schedule every night, watch my open PRs. Auto-fix build failures, answer
review comments in a fresh worktree, and rebase what is stale. Leave
anything ambiguous for me. State in git so a crash loses nothing.
More on this in Overnight agents: the workflow and Claude Code autonomous hours.
The rare community post that ships a whole production loop instead of a demo: an email agent that loops an inbox on a schedule, classifies and drafts replies for the routine cases, and escalates only what needs a human. The guardrail is hardcoded.
/schedule every 15 minutes, pull new emails, classify each, and draft a
reply for the routine ones. Queue anything sensitive for me and log every
decision. Never auto-send a refund or a booking change.
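A hardcoded guardrail means the rule lives in code, outside the model's judgment. A minimal sketch, assuming the classifier emits a category string and a sensitivity flag; the category names and `route_email` are illustrative, not taken from any real agent.

```python
from typing import Dict, List

# Hardcoded: these categories are never handled automatically, whatever the model says.
NEVER_AUTO_SEND = {"refund", "booking_change"}


def route_email(category: str, sensitive: bool, log: List[Dict]) -> str:
    """Decide what happens to one classified email and log the decision."""
    if category in NEVER_AUTO_SEND or sensitive:
        action = "queue_for_human"
    elif category == "routine":
        action = "draft_reply"
    else:
        action = "queue_for_human"  # unknown categories escalate by default
    log.append({"category": category, "sensitive": sensitive, "action": action})
    return action
```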
The most practical no-code pattern. The workflow runs, then pauses and pings you with approve, revise, or skip. Same loop shape as build-test-fix, but the stop condition is your approval instead of a passing test.
/loop run the task, then pause and send me approve / revise / skip before
anything ships. On approve, continue. On revise, take my note and redo. On
skip, move to the next item.
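The approve, revise, or skip gate can be sketched as a loop whose stop condition is a human decision. `human_gate`, `run_task`, and `ask` are hypothetical; `ask` would wrap whatever channel pings you, and `max_revisions` is an added cap so a revise cycle cannot run forever.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def human_gate(
    items: Iterable[str],
    run_task: Callable[[str, Optional[str]], str],
    ask: Callable[[str, str], Tuple[str, Optional[str]]],
    max_revisions: int = 3,
) -> List[str]:
    """Run each item, then ship it only after an explicit human approval."""
    shipped: List[str] = []
    for item in items:
        note: Optional[str] = None
        for _ in range(max_revisions + 1):
            result = run_task(item, note)
            decision, note = ask(item, result)
            if decision == "approve":
                shipped.append(result)
                break
            if decision == "skip":
                break
            if decision != "revise":
                raise ValueError(f"unknown decision: {decision!r}")
        # Running out of revisions without approval leaves the item unshipped.
    return shipped
```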
The strongest reliability pattern is adversarial review: have one model family review another model family's pull requests before merge, so two independent sets of weights have to agree before code lands. Cap the argument so it cannot spin forever.
/goal implement the task, then have a second, different model review the
diff against the spec. Iterate up to 5 rounds. Only pass work both models
agree is correct. Report any disagreement you could not resolve.
The single highest-leverage habit is not a fancy loop. It is rewriting your request into a rigorous goal before any work starts. Most agents are not dumb; the instructions are just vague. So make the agent specify the result, how it will verify it, what not to touch, and when to stop, before it does anything.
/goal before doing anything, rewrite my request into a precise goal: the
exact end state, how you will verify it, what you must not touch, and the
stop condition. Confirm that goal, then execute against it.
A good /goal condition has four parts: one measurable end state (a test exit code, an empty queue, a clean git status), the proof the agent must show that it got there, the constraints that must hold, and an optional turn or time bound. The condition field caps at 4,000 characters, which is more than enough for a precise contract and acts as a deliberate forcing function against vagueness.
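One way to keep those four parts honest is to build the condition from named fields instead of free text. The sketch below is an assumption-laden illustration: `GoalSpec` and its field names are hypothetical, the rendered wording is arbitrary, and the only number taken from the text is the 4,000-character cap.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_CONDITION_CHARS = 4000  # cap on the condition field, as stated above


@dataclass
class GoalSpec:
    end_state: str                      # one measurable end state
    proof: str                          # how the agent demonstrates it
    constraints: List[str] = field(default_factory=list)
    max_turns: Optional[int] = None     # optional bound

    def render(self) -> str:
        """Join the parts into one condition string, enforcing the length cap."""
        parts = [self.end_state, f"Prove it by: {self.proof}"]
        if self.constraints:
            parts.append("Constraints: " + "; ".join(self.constraints))
        if self.max_turns is not None:
            parts.append(f"Or stop after {self.max_turns} turns")
        text = ". ".join(parts) + "."
        if len(text) > MAX_CONDITION_CHARS:
            raise ValueError(f"condition is {len(text)} chars, cap is {MAX_CONDITION_CHARS}")
        return text
```

The rendered string is what you would paste after /goal; a spec with an empty `proof` field is a visible gap rather than a vague sentence.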
Across every platform, the same two warnings come back. They are the whole reason most loops fail, and they are funnier in the community's words than in any vendor's docs.
The romantic version of loops is "a thousand agents build my company overnight." The production version is a bill. The community is full of these stories. Enterprises have started capping per-engineer, per-tool monthly spend after burning through annual AI budgets in a single quarter. Individual developers have posted about torching thousands of dollars overnight with one command. The figures vary and many are hard to verify independently, but the direction is consistent and the mechanism is obvious: a loop with no ceiling will happily spend until your tokens run out.
The best one-line summary of the whole movement is a developer joke written as code:
while (you have tokens):
burn them in a loop
So every goal gets a budget and every loop gets a cap. Goal conditions can carry "or stop after N turns." Routines run with a daily ceiling. Set the ceiling before you walk away, not after the email arrives. We wrote the full incident-driven version of this in The $400 overnight bill: why managed agents need FinOps now.
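For loops you drive yourself, a ceiling takes only a few lines. This sketch assumes each turn can report how many tokens it spent; `run_with_budget` and the `turn` callable are hypothetical, and real token accounting depends on what your tool exposes.

```python
from typing import Callable, Tuple


def run_with_budget(
    turn: Callable[[], Tuple[int, bool]],
    max_tokens: int,
    max_turns: int,
) -> Tuple[str, int, int]:
    """Run turns until done, the token ceiling, or the turn cap, whichever is first.

    turn() returns (tokens_spent, done). Result is (stop_reason, turns, tokens_used).
    """
    used = 0
    for n in range(1, max_turns + 1):
        tokens, done = turn()
        used += tokens
        if done:
            return ("done", n, used)
        if used >= max_tokens:
            return ("token_ceiling", n, used)
    return ("turn_cap", max_turns, used)
```

The check runs after each turn, so a run can overshoot the ceiling by at most one turn's spend. Set `max_tokens` with that margin in mind.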
A loop that cannot tell good output from bad does not save you work. It produces wrong answers faster. This is the single most important sentence in loop engineering: writing the loop is easy; the verifier inside it is the hard part.
An open loop, a loop with no verifier, fails in predictable, expensive ways: compounding errors, hallucinated progress, silent failures, goal drift, and the doom loop where the agent retries the same broken approach with cosmetic variations. A verifier closes the loop. It is anything that checks a result against an expected outcome: a compiler, a test runner, a linter, a diff check, a second model as judge, or a human approval gate. The useful rule of thumb is that the verifier should be cheaper and more reliable than the action it checks.
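The doom loop in particular is cheap to detect. A minimal sketch, assuming the loop records one failure message per attempt: normalize away cosmetic differences, then stop when the same failure recurs. `normalize` and `detect_doom_loop` are hypothetical helpers, and the normalization rules here are deliberately crude.

```python
import re
from collections import Counter
from typing import Iterable, Optional


def normalize(failure: str) -> str:
    """Collapse cosmetic differences (case, numbers, spacing) in a failure message."""
    return re.sub(r"\s+", " ", re.sub(r"\d+", "N", failure.lower())).strip()


def detect_doom_loop(failures: Iterable[str], threshold: int = 3) -> Optional[str]:
    """Return the normalized failure once it has been seen threshold times, else None."""
    counts: Counter = Counter()
    for failure in failures:
        key = normalize(failure)
        counts[key] += 1
        if counts[key] >= threshold:
            return key
    return None
```

A non-None result is a signal to halt and surface the failure to a human instead of paying for another retry.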
This is exactly why /goal runs a separate small model as judge instead of letting the worker grade its own work, and why every strong loop in this guide (the verifier loop, the build-test-fix pair, the adversarial review) puts a second, independent set of eyes inside the loop. An agent grading itself will delete the failing test and call it done. The skeptics who keep pointing this out are right, and listening to them is what keeps a loop honest.
The skeptics also have a fair point about the scheduling layer: yes, the cron part really is just cron. But cron never had a decision-maker in the body that reads the state, acts, checks whether it worked, and decides whether to keep going. That decision is the genuinely new thing. Everything else is plumbing. We traced where this idea came from in The loopy era: Karpathy, Codex, and agentic engineering.
The fastest way to get good loops without designing them yourself is a curated catalog. Matthew Berman's Forward Future Loop Library is the one worth raiding: dozens of copy-paste loops across engineering, evaluation, operations, content, and design, each with a clear goal, a bounded action, a fixed check, and explicit stop conditions. The catalog is agent-native, with llms.txt and catalog.json surfaces, an open repo, and an installable skill:
npx skills add Forward-Future/loop-library --skill loop-library -g
The signal there is the vetting, not a like count. When you want a loop running tonight without designing the plumbing, start from a vetted one and adapt it.
You do not need every pattern above. The whole field keeps converging on three moves, one of each verb:
- /loop so something measurably improves while you watch.
- /loop while you work on something else.
- /schedule so you wake up to finished work.

Give each one a budget and a verifier. That is a working loop stack by tomorrow morning. Then graduate the watched loops to /goal conditions once you trust the verifier, and move the durable ones to /schedule once you trust the budget.
The shift is real and it is not complicated. Write the goal, the loop, or the routine. Give it a budget and a way to check itself. Then go decide what to build next.