The Definitive Guide to Loop Engineering in Claude Code and Codex

Last updated: June 20, 2026. Loop features in Claude Code and Codex move fast. Treat this as a working field guide and confirm exact command behavior in the official docs linked below before you leave anything running unattended.

The shift everyone keeps circling is real, and it is simpler than the discourse makes it sound. Stop being the thing in the loop. Write the goal, the loop, or the routine, give it a budget and a way to check itself, and go decide what to build next. That is loop engineering. Everything else is plumbing.

This guide covers the whole thing: the one distinction almost everyone gets wrong, the real commands in both Claude Code and Codex, the patterns people are actually running in production, and the two failure modes that turn a clever loop into a money fire.

Official Sources

Topic	Official source
Claude Code `/loop` and `/schedule`	code.claude.com/docs
Claude Code Routines (web scheduled tasks)	Web scheduled tasks docs
Codex CLI	developers.openai.com/codex/cli
Codex Automations	Codex app automations
Codex `exec` (non-interactive)	Non-interactive mode
The Codex agent loop, unrolled	OpenAI: unrolling the Codex agent loop
Loop engineering, the term	Addy Osmani: Loop Engineering
Forward Future Loop Library	signals.forwardfuture.ai/loop-library

First, the three verbs (this is where everyone trips)

The cleanest framing going around separates three things people constantly collapse into one word:

Goal: keep working until the outcome is achieved, then stop.
Loop: keep repeating a task while I am here, hands on.
Routine: keep working while I am gone.

Get the verb right and every pattern in this guide falls into place. Get it wrong and you either leave a hands-on loop running into an empty room, or you point a "while I sleep" routine at a task that needed you watching. The verb is the design decision. The command is just syntax.

Here is how each verb maps to a real command in each tool.

Goal: run until a condition is true, then stop

In Claude Code this is /goal, shipped in v2.1.139.

/goal all tests in test/auth pass and the lint step is clean

The important part is not that it loops. The important part is who decides it is done. After every turn, Claude Code sends the condition plus the transcript to a separate, small, fast model (Haiku by default) that acts as a judge. That judge returns yes or no with a reason. On no, Claude reads the reason and takes another turn. On yes, the goal auto-clears and the run stops.

This is the single most important idea in the whole field, so it is worth saying plainly: the worker does not grade its own homework. A separate model does. We will come back to why that matters more than anything else.
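
The worker-and-judge split above can be sketched in a few lines. This is a minimal illustration, not Claude Code's internals: `run_goal`, `Worker`, `Judge`, and `GoalResult` are hypothetical names, and the worker and judge are plain callables standing in for two separate models.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical signatures, not Claude Code internals:
# a worker takes the judge's last reason and returns a transcript entry;
# a judge takes the condition and the transcript and returns (done, reason).
Worker = Callable[[str], str]
Judge = Callable[[str, List[str]], Tuple[bool, str]]


@dataclass
class GoalResult:
    done: bool
    turns: int
    reason: str


def run_goal(condition: str, worker: Worker, judge: Judge,
             max_turns: int = 20) -> GoalResult:
    """Run the worker until a separate judge says the condition holds."""
    transcript: List[str] = []
    reason = "not started"
    for turn in range(1, max_turns + 1):
        transcript.append(worker(reason))            # worker acts on the last verdict
        done, reason = judge(condition, transcript)  # a different callable grades it
        if done:
            return GoalResult(True, turn, reason)
    return GoalResult(False, max_turns, f"turn cap reached: {reason}")
```

The design choice that matters is that `worker` never returns a "done" flag of its own. Only the judge can end the run, and the turn cap ends it if the judge never says yes.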

Check status any time with a bare /goal. Clear an active goal with /goal clear. Bound it with a turn cap baked into the condition itself, for example "or stop after 20 turns." You can also run it headless:

claude -p "/goal CHANGELOG.md has an entry for every PR merged this week"

Codex shipped its own goal-style controls in CLI v0.128.0, with set, pause, resume, and clear. The shape is the same: a verifiable end state, a checker, and a stop.

Loop: repeat while you watch

In Claude Code this is /loop, and it lives only inside your open session.

/loop 5m check the deploy

Two modes. With an interval it runs on a timer (5m, 30m, 2h), rounding odd values to the nearest cron boundary. Without an interval it self-paces: Claude watches the output and picks the next delay itself, anywhere from one minute to an hour, printing the reason for each wait. Press Esc to cancel while it waits.
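
To make the two modes concrete, here is a rough sketch of the arithmetic involved, assuming intervals are written as a number followed by `m` or `h`. The helper names are hypothetical, and the sketch does not reproduce the real parser or its cron-boundary rounding.

```python
import re

MIN_DELAY_S = 60        # one minute, the lower bound described above
MAX_DELAY_S = 60 * 60   # one hour, the upper bound described above
_UNITS = {"m": 60, "h": 3600}


def parse_interval(text: str) -> int:
    """Turn '5m' or '2h' into seconds; raise ValueError on anything else."""
    match = re.fullmatch(r"(\d+)([mh])", text.strip())
    if not match:
        raise ValueError(f"unrecognized interval: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def clamp_self_paced(delay_s: float) -> float:
    """Keep a self-chosen delay inside the one-minute-to-one-hour window."""
    return max(MIN_DELAY_S, min(MAX_DELAY_S, delay_s))
```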

A bare /loop with no prompt runs a built-in maintenance pass (git and PR triage plus cleanup). You can override that default by dropping a loop.md into .claude/ for the project or ~/.claude/ globally.

Three things to remember about /loop: it is session-scoped and dies when you close the session, it only fires when the session is idle, and forgotten loops auto-expire after seven days. It is the tool for watching something, hands on, right now. We have a deeper walkthrough in Claude Code Loops: recurring prompts that actually run.

Codex has no /loop command yet. Its equivalent is codex exec wrapped in a shell loop, or a minute-interval thread automation in the Codex app.
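
A wrapper of that kind might look like the sketch below, written in Python rather than shell so the cap and the interval are explicit. It assumes `codex exec` accepts the prompt as a single positional argument; check the non-interactive mode docs before relying on that. `watch`, `run_command`, and `Runner` are hypothetical names.

```python
import subprocess
import time
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def run_command(argv: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(argv)).returncode


def watch(prompt: str, interval_s: float, max_runs: int,
          runner: Runner = run_command,
          sleep: Callable[[float], None] = time.sleep) -> int:
    """Invoke `codex exec <prompt>` up to max_runs times; return runs made."""
    runs = 0
    while runs < max_runs:
        code = runner(["codex", "exec", prompt])  # assumed argument shape
        runs += 1
        if code != 0:           # stop on the first failed invocation
            break
        if runs < max_runs:
            sleep(interval_s)   # wait only between runs, not after the last
    return runs
```

Something like `watch("check the deploy", 300, max_runs=12)` would then cover an hour at five-minute spacing and stop on its own, which a bare `while true` in a shell would not.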

Routine: run while you are gone

In Claude Code this is /schedule, and it runs on Anthropic's cloud, not your laptop.

/schedule daily PR review at 9am

A routine is durable. It survives your machine being closed, clones a fresh copy of your repo each run, and works on claude/-prefixed branches. It can be triggered three ways: on a cron schedule (minimum one-hour interval), on a GitHub webhook (PR opened, labeled, released, and so on), or via a dedicated API endpoint. Manage them with /schedule list, /schedule update, and /schedule run. Full transcripts land at claude.ai/code/routines.

Codex's equivalent is Automations in the Codex app: standalone, project, or thread automations on daily, weekly, or custom cron schedules, with results landing in a Triage inbox.

One trap that comes up constantly: there is no /routine command in either tool. In Claude Code the scheduler is /schedule; in Codex it is Automations. We compare where this work should actually live in Claude Code Routines vs Managed Agents schedules.

The one table to remember

	`/goal`	`/loop`	`/schedule`
Verb	until done	while I watch	while I am gone
Runs on	your machine	your machine	Anthropic cloud
Needs open session	yes	yes	no
Stops itself	yes, on verified condition	optional (self-paced)	on schedule end
Min interval	n/a	1 minute	1 hour
Best for	fix until tests pass	watch a deploy	nightly PR sweep

The patterns people are actually running

The commands are easy. The interesting part is what people point them at. Here are the loop shapes that keep surfacing across X, Reddit, GitHub, and the demo circuit, each labeled with its verb and rewritten as something you can paste tonight.

1. The build-test-fix pair (loop)

The most-demoed loop of all. Two roles: a builder that writes code and a checker that runs the tests, types, and lint, then reports exactly what broke. They pass work back and forth until it is clean. The whole pitch is the pain it kills, because a one-shot agent ships its own bugs without ever noticing.

/loop build the next item on the plan, then run tests, typecheck, and
lint. Feed every failure back as the next instruction and fix it. Stop
when the build is green and the checker has nothing left to report.

2. The verifier loop (loop)

This is the pattern Boris Cherny, who built Claude Code, keeps describing. In a widely shared interview he put it bluntly: "I don't prompt Claude anymore. I have loops running that prompt Claude and figuring out what to do. My job is to write loops." The loop he describes runs the coding agent plus an advanced model plus a verifier, feeds it tasks, and removes bottlenecks as you go. The verifier is the part everyone skips, and without it you are just trusting the agent. Addy Osmani named the broader practice loop engineering, building on Peter Steinberger's line that you should be designing loops that prompt your agents rather than prompting them yourself.

/loop work the task list. After each task, have a separate verifier model
check the result against the spec and the tests. Only move on when it
passes. Surface anything the verifier rejects twice.

We unpack his workflow in Codex Loops: what Boris Cherny gets right about managing agent work.

3. The five-minute repository maintainer (loop)

Peter Steinberger runs a version of this on a tight timer while he works. Every five minutes the agent does one small, verified piece of upkeep. Crucially, what to clean is the agent's call, not a hardcoded script. That decision is the entire point.

/loop 5m make one small verified repository improvement: a flaky test, a
stale comment, a missing type. One change, one commit, tests green. Never
touch anything risky.

4. The plan-generate-verify-fix loop (goal)

The bounded version that fixes the runaway problem cold: plan, generate, verify, fix, repeat, with state saved to files and a hard iteration cap. You only read the final version. The cap is what makes it safe to walk away from.

/goal plan the task, implement it, verify against the tests, and fix what
failed. Save state to files each pass. Max 5 iterations. Stop at the first
clean pass or when the cap is hit, and tell me which.
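
The two safety properties in that prompt, state on disk and a hard cap, are easy to see in code. Below is a minimal sketch assuming a JSON state file; `run_bounded` and the `step` callable (which stands in for one plan-generate-verify-fix pass and returns whether verification passed) are hypothetical.

```python
import json
from pathlib import Path
from typing import Callable


def run_bounded(state_path: Path, step: Callable[[dict], bool],
                max_iterations: int = 5) -> dict:
    """Resume from state_path, run step until it passes or the cap is hit."""
    state = {"iteration": 0, "status": "running"}
    if state_path.exists():
        state = json.loads(state_path.read_text())  # pick up where a crash left off
    while state["status"] == "running":
        if state["iteration"] >= max_iterations:
            state["status"] = "cap_hit"
        else:
            state["iteration"] += 1
            if step(state):
                state["status"] = "clean_pass"
        state_path.write_text(json.dumps(state))     # persist after every pass
    return state
```

The returned `status` answers the "tell me which" part of the prompt: it is either `clean_pass` or `cap_hit`, never ambiguous.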

5. The quality streak loop (goal)

The one that respects how flaky "it works" really is. It does not stop at the first green run. It tests realistic scenarios and only declares victory after a streak passes in a row. One green run is luck. A streak is reliability.

/goal run the full product test suite against realistic scenarios. Fix
whatever fails, then run again. A new failure resets the count. Done only
after 10 consecutive clean passes.
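
The streak rule is a counter that resets on any failure. A minimal sketch, assuming `run_suite` is a hypothetical callable that runs the suite once and returns whether it was clean; the `max_runs` bound is an added assumption so the loop cannot spin forever on a flaky suite.

```python
from typing import Callable


def run_until_streak(run_suite: Callable[[], bool], required: int = 10,
                     max_runs: int = 100) -> bool:
    """Return True once `required` consecutive runs pass, False at the run cap."""
    streak = 0
    for _ in range(max_runs):
        if run_suite():
            streak += 1
            if streak >= required:
                return True
        else:
            streak = 0   # a new failure resets the count
    return False
```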

6. The production error sweep (goal)

High utility, and the value is entirely in the triage. It reads your production logs, separates real actionable errors from noise, fixes the actionable ones with a regression test, and opens a PR. Tell it what "actionable" means or it chases ghosts.

/goal review the last 24h of production errors. For each one that is
actionable and reproducible, write a fix with a regression test and open
a PR. Ignore transient and third-party noise. Done when the actionable
list is clear.

7. The post-commit review loop (shipped tool)

A class of open-source tools now installs a git hook so every commit triggers a background review, then feeds the findings straight into an agentic fix loop while the context is still warm. It is the installable version of the one thing this whole guide argues is the hard part: a verifier living inside the loop. The pattern plugs into Claude Code, Codex, and Gemini CLI.

init    # adds a post-commit hook: every commit triggers a review
fix     # the agentic loop that fixes the surfaced findings

8. The overnight PR routine (routine)

The shape behind "I do not write code anymore, I write loops, and they write the code while I sleep." A scheduled routine that watches your open PRs and lands the fixable ones overnight, leaving anything ambiguous for you.

/schedule every night, watch my open PRs. Auto-fix build failures, answer
review comments in a fresh worktree, and rebase what is stale. Leave
anything ambiguous for me. State in git so a crash loses nothing.

More on this in Overnight agents: the workflow and Claude Code autonomous hours.

9. The production inbox loop (routine)

The rare community post that ships a whole production loop instead of a demo: an email agent that loops an inbox on a schedule, classifies and drafts replies for the routine cases, and escalates only what needs a human. The guardrail is hardcoded.

/schedule every 15 minutes, pull new emails, classify each, and draft a
reply for the routine ones. Queue anything sensitive for me and log every
decision. Never auto-send a refund or a booking change.
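
"Hardcoded" here means the rule lives in code, outside anything the model can talk itself out of. One conservative reading is sketched below: the listed categories are escalated regardless of what the classifier thinks. The category labels and the `decide` helper are hypothetical.

```python
from typing import List, Tuple

# Hypothetical category labels; these never get an automated reply.
NEVER_AUTO_HANDLE = frozenset({"refund", "booking_change"})


def decide(category: str, looks_routine: bool,
           log: List[Tuple[str, str]]) -> str:
    """Return 'draft' or 'escalate' and log the decision."""
    if category in NEVER_AUTO_HANDLE:
        action = "escalate"   # the hardcoded rule wins over the classifier
    else:
        action = "draft" if looks_routine else "escalate"
    log.append((category, action))
    return action
```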

10. The human-in-the-loop approval queue (loop)

The most practical no-code pattern. The workflow runs, then pauses and pings you with approve, revise, or skip. Same loop shape as build-test-fix, but the stop condition is your approval instead of a passing test.

/loop run the task, then pause and send me approve / revise / skip before
anything ships. On approve, continue. On revise, take my note and redo. On
skip, move to the next item.
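
The approve / revise / skip gate can be sketched as a small state machine. This is an illustration under stated assumptions: `run_task` and `ask` are hypothetical callables (the second one stands in for whatever channel pings you), and the revision cap is an addition so one item cannot hold the queue forever.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Decision = Tuple[str, Optional[str]]  # ("approve" | "revise" | "skip", optional note)


def process_queue(items: Sequence[str],
                  run_task: Callable[[str, Optional[str]], str],
                  ask: Callable[[str, str], Decision],
                  max_revisions: int = 3) -> Tuple[List[str], List[str]]:
    """Return (shipped results, skipped items); nothing ships without approval."""
    shipped: List[str] = []
    skipped: List[str] = []
    for item in items:
        note: Optional[str] = None
        for _ in range(max_revisions + 1):
            result = run_task(item, note)
            decision, note = ask(item, result)   # pause here for the human
            if decision == "approve":
                shipped.append(result)
                break
            if decision == "skip":
                skipped.append(item)
                break
            if decision != "revise":
                raise ValueError(f"unknown decision: {decision!r}")
        else:
            skipped.append(item)   # revision cap hit without approval
    return shipped, skipped
```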

11. The adversarial review loop (goal)

The strongest reliability pattern: have one model family review another model family's pull requests before merge, so two independent sets of weights have to agree before code lands. Cap the argument so it cannot spin forever.

/goal implement the task, then have a second, different model review the
diff against the spec. Iterate up to 5 rounds. Only pass work both models
agree is correct. Report any disagreement you could not resolve.
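
In code, "both models agree" is an all-reviewers-pass check inside a capped loop. A minimal sketch, assuming each reviewer is a hypothetical callable wrapping a different model and returning `(approved, reason)`; `adversarial_review` and `implement` are illustrative names.

```python
from typing import Callable, List, Sequence, Tuple

Reviewer = Callable[[str], Tuple[bool, str]]  # diff -> (approved, reason)


def adversarial_review(implement: Callable[[List[str]], str],
                       reviewers: Sequence[Reviewer],
                       max_rounds: int = 5) -> Tuple[bool, int, List[str]]:
    """Return (agreed, rounds used, unresolved objections)."""
    feedback: List[str] = []
    for round_no in range(1, max_rounds + 1):
        diff = implement(feedback)                       # revise using last objections
        verdicts = [review(diff) for review in reviewers]
        feedback = [reason for ok, reason in verdicts if not ok]
        if not feedback:                                 # every reviewer approved
            return True, round_no, []
    return False, max_rounds, feedback                   # report what stayed unresolved
```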

The goal-meta-skill: the highest-leverage move

The single highest-leverage habit is not a fancy loop. It is rewriting your request into a rigorous goal before any work starts. Most agents are not dumb; the instructions are just vague. So make the agent specify the result, how it will verify it, what not to touch, and when to stop, before it does anything.

/goal before doing anything, rewrite my request into a precise goal: the
exact end state, how you will verify it, what you must not touch, and the
stop condition. Confirm that goal, then execute against it.

A good /goal condition has four parts: one measurable end state (a test exit code, an empty queue, a clean git status), the evidence the agent must produce to prove it, the constraints that must hold, and an optional turn or time bound. The condition field caps at 4,000 characters, which is more than enough for a precise contract and doubles as a deliberate forcing function against vagueness.
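
If you assemble conditions programmatically, the four parts map onto a small data structure. The sketch below is one possible shape, not an official format: `GoalCondition` and the wording it renders are hypothetical, and only the 4,000-character cap comes from the text above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_CONDITION_CHARS = 4000  # the cap stated above


@dataclass
class GoalCondition:
    end_state: str                                   # one measurable end state
    proof: str                                       # how the agent demonstrates it
    constraints: List[str] = field(default_factory=list)
    max_turns: Optional[int] = None                  # optional bound

    def render(self) -> str:
        """Join the parts into one condition string, enforcing the length cap."""
        parts = [self.end_state.strip(), f"Prove it by: {self.proof.strip()}."]
        if self.constraints:
            parts.append("Constraints: " + "; ".join(self.constraints) + ".")
        if self.max_turns is not None:
            parts.append(f"Or stop after {self.max_turns} turns.")
        text = " ".join(parts)
        if len(text) > MAX_CONDITION_CHARS:
            raise ValueError(f"condition is {len(text)} chars, over {MAX_CONDITION_CHARS}")
        return text
```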

The part the hype skips: a loop is a money fire with a verifier on top

Across every platform, the same two warnings come back. They are the whole reason most loops fail, and they are funnier in the community's words than in any vendor's docs.

Warning one: cost

The romantic version of loops is "a thousand agents build my company overnight." The production version is a bill. The community is full of these stories. Enterprises have started capping per-engineer, per-tool monthly spend after burning through annual AI budgets in a single quarter. Individual developers have posted about torching thousands of dollars overnight with one command. The figures vary and many are hard to verify independently, but the direction is consistent and the mechanism is obvious: a loop with no ceiling will happily spend until your tokens run out.

The best one-line summary of the whole movement is a developer joke written as code:

while (you have tokens):
    burn them in a loop

So every goal gets a budget and every loop gets a cap. Goal conditions can carry "or stop after N turns." Routines run with a daily ceiling. Set the ceiling before you walk away, not after the email arrives. We wrote the full incident-driven version of this in The $400 overnight bill: why managed agents need FinOps now.
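
When you drive an agent from your own script, the ceiling has to live in the script. Here is a minimal sketch of a two-sided budget, turns and dollars, checked before every turn. `Budget`, `run_with_budget`, and the `turn` callable (which returns the cost of one turn and whether the work is done) are hypothetical, and how you measure cost per turn is left as an assumption.

```python
from typing import Callable, Tuple


class Budget:
    """Track turns and spend against two hard ceilings."""

    def __init__(self, max_turns: int, max_cost_usd: float) -> None:
        self.max_turns = max_turns
        self.max_cost_usd = max_cost_usd
        self.turns = 0
        self.cost_usd = 0.0

    def allows_next_turn(self) -> bool:
        return self.turns < self.max_turns and self.cost_usd < self.max_cost_usd

    def record(self, cost_usd: float) -> None:
        self.turns += 1
        self.cost_usd += cost_usd


def run_with_budget(turn: Callable[[], Tuple[float, bool]], budget: Budget) -> str:
    """Run turns until done or a ceiling is reached; say which one stopped it."""
    while budget.allows_next_turn():       # check before spending, not after
        cost, done = turn()
        budget.record(cost)
        if done:
            return "done"
    return "turn cap" if budget.turns >= budget.max_turns else "cost cap"
```

Because a turn's cost is only known after it runs, this can overshoot the dollar ceiling by at most one turn. Set the ceiling with that margin in mind.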

Warning two: verification is the entire game

A loop that cannot tell good output from bad does not save you work. It produces wrong answers faster. This is the single most important sentence in loop engineering: writing the loop is easy; the verifier inside it is the hard part.

An open loop, a loop with no verifier, fails in predictable, expensive ways: compounding errors, hallucinated progress, silent failures, goal drift, and the doom loop where the agent retries the same broken approach with cosmetic variations. A verifier closes the loop. It is anything that checks a result against an expected outcome: a compiler, a test runner, a linter, a diff check, a second model as judge, or a human approval gate. The useful rule of thumb is that the verifier should be cheaper and more reliable than the action it checks.
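
The cheapest verifier on that list is a command whose exit code you trust. A minimal sketch using Python's standard `subprocess` module follows; `verify` is a hypothetical helper, and the timeout and the 2,000-character output tail are arbitrary choices.

```python
import subprocess
from typing import Sequence, Tuple


def verify(argv: Sequence[str], timeout_s: float = 600) -> Tuple[bool, str]:
    """Run a check command; pass only on exit code 0, and return its output tail."""
    try:
        proc = subprocess.run(list(argv), capture_output=True, text=True,
                              timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout_s}s"   # a hang counts as a failure
    output = (proc.stdout + proc.stderr).strip()
    return proc.returncode == 0, output[-2000:]         # tail is what the agent reads next
```

A call such as `verify(["pytest", "-q"])` would gate a loop on your test runner, assuming that is how your project runs tests. The returned text is what gets fed back as the next instruction.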

This is exactly why /goal runs a separate small model as judge instead of letting the worker grade its own work, and why every strong loop in this guide (the verifier loop, the build-test-fix pair, the adversarial review) puts a second, independent set of eyes inside the loop. An agent grading itself will delete the failing test and call it done. The skeptics who keep pointing this out are right, and listening to them is what keeps a loop honest.

The skeptics also have a fair point about the scheduling layer: yes, the cron part really is just cron. But cron never had a decision-maker in the body that reads the state, acts, checks whether it worked, and decides whether to keep going. That decision is the genuinely new thing. Everything else is plumbing. We traced where this idea came from in The loopy era: Karpathy, Codex, and agentic engineering.

Where to find vetted loops

The fastest way to get good loops without designing them yourself is a curated catalog. Matthew Berman's Forward Future Loop Library is the one worth raiding: dozens of copy-paste loops across engineering, evaluation, operations, content, and design, each with a clear goal, a bounded action, a fixed check, and explicit stop conditions. The catalog is agent-native, with llms.txt and catalog.json surfaces, an open repo, and an installable skill:

npx skills add Forward-Future/loop-library --skill loop-library -g

The signal there is the vetting, not a like count. When you want a loop running tonight without designing the plumbing, start from a vetted one and adapt it.

How to start tonight

You do not need every pattern above. The whole field keeps converging on three moves, two loops and a routine:

Run the build-test-fix pair as a /loop so something measurably improves while you watch.
Run the five-minute maintainer as a /loop while you work on something else.
Run the overnight PR routine as a /schedule so you wake up to finished work.

Give each one a budget and a verifier. That is a working loop stack by tomorrow morning. Then graduate the watched loops to /goal conditions once you trust the verifier, and move the durable ones to /schedule once you trust the budget.

The shift is real and it is not complicated. Write the goal, the loop, or the routine. Give it a budget and a way to check itself. Then go decide what to build next.
