
TL;DR
Boris Cherny's loop-heavy Claude Code workflow points at the next Codex content lane: recurring agents that babysit PRs, CI, deploys, and feedback streams.
Boris Cherny's recent interview is worth watching because it names the thing most AI coding demos still hide: the future of agent work is not one perfect prompt. It is many supervised loops.
In the interview, Boris describes a personal Claude Code setup that has moved far past "agent writes a diff." He talks about running multiple sessions, using sub-agents heavily, and leaning more and more on /loop: recurring agent jobs scheduled with cron. The examples he gives are wonderfully boring.
That is the useful part. The examples are not magical. They are the exact maintenance chores every engineering team already does poorly.
This is also where Codex content should go next. Codex automations, Codex goals, the Codex GitHub Action, and the Codex cloud security playbook all point in the same direction: the winning agent workflow is a loop with boundaries, receipts, and escalation rules.
The first AI coding workflow was a task:
Fix this bug.
The second workflow was a scoped task:
Fix the billing webhook validation.
Only touch app/api/billing and lib/billing.
Run pnpm test billing and pnpm typecheck.
Return changed files, tests run, and risks.
The loop workflow is different:
Every 15 minutes, inspect open PRs labeled codex-watch.
If CI is red for a deterministic reason, attempt one fix.
If main moved, rebase once.
If the same failure appears twice, stop and leave a concise report.
Never push directly to main.
That is not just "task, repeated." It has a trigger, scope, action budget, stop condition, and reporting path. Those are the pieces that turn an agent from a clever assistant into a useful background process.
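Those pieces are easy to make concrete. Here is a minimal sketch of one loop run with an action budget and a stop condition; the class and method names are illustrative, not a real Codex API.

```python
# Hypothetical sketch: two pieces of the loop contract in code.
# The action budget caps how much the loop may do per resource, and the
# stop condition catches the "same failure twice" case from the article.
from dataclasses import dataclass, field

@dataclass
class LoopRun:
    max_attempts: int = 1                          # action budget
    seen_failures: set = field(default_factory=set)

    def step(self, failure_id: str) -> str:
        # stop condition: the same failure appearing twice means
        # the problem needs a human, not another retry
        if failure_id in self.seen_failures:
            return "stop: same failure seen twice"
        self.seen_failures.add(failure_id)
        if self.max_attempts <= 0:
            return "stop: attempt budget exhausted"
        self.max_attempts -= 1
        return f"attempted one fix for {failure_id}"

run = LoopRun()
print(run.step("flaky-test-billing"))   # first attempt is allowed
print(run.step("flaky-test-billing"))   # repeat failure -> stop and report
```

The point of the sketch is that "stop" is a first-class return value, not an exception: the loop's job includes deciding not to act.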
One-shot agents are good at bounded edits. Loops are good at changing state.
A PR changes after review comments land. CI changes after a dependency cache expires. A deployment changes after Coolify finishes building. User feedback changes every hour. A model eval changes after new examples arrive. These are not single-shot problems. They are state-monitoring problems.
That is why Boris's examples land. PR babysitting and CI repair are high-value because they sit in the annoying gap between "the code is basically right" and "the work is actually merged."
Codex is well positioned for this because the surface area is already there: the CLI for local work, the GitHub Action for repo events, automations for recurring checks, goals for longer-running objectives, and browser verification for production checks.
The missing piece is not capability. It is loop design.
Every useful Codex loop should fit on one page.
```yaml
name: pr-babysitter
trigger:
  every: 15m
scope:
  include:
    - pull_requests:
        labels: ["codex-watch"]
  exclude:
    - main
permissions:
  repo: write-branch
  ci: read
  deploys: read
budget:
  max_attempts_per_pr: 1
  max_runtime_minutes: 20
  max_files_changed: 8
stop:
  - same_failure_seen_twice
  - merge_conflict_requires_product_decision
  - tests_fail_after_one_fix
report:
  destination: pr-comment
  fields:
    - summary
    - action_taken
    - tests_run
    - remaining_blocker
```
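A contract like this is only worth writing if something enforces it. Here is a minimal sketch of checking a proposed action against the budget section; the field names mirror the YAML above, and `within_budget` is a hypothetical helper, not part of Codex.

```python
# Hypothetical sketch: gate every action behind the contract's budget.
# Field names match the budget section of the YAML contract.
budget = {"max_attempts_per_pr": 1, "max_runtime_minutes": 20, "max_files_changed": 8}

def within_budget(attempts: int, runtime_min: float, files_changed: int) -> bool:
    """Return True only if every budget line still holds."""
    return (
        attempts < budget["max_attempts_per_pr"]
        and runtime_min < budget["max_runtime_minutes"]
        and files_changed <= budget["max_files_changed"]
    )

print(within_budget(0, 5, 3))    # fresh run: allowed
print(within_budget(1, 5, 3))    # second attempt on the same PR: blocked
```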
The contract matters because loops are powerful in the same way cron jobs are powerful: they keep running after the interesting part is over.
Without a contract, a loop becomes background chaos. With a contract, it becomes a junior operations teammate that handles the boring parts and escalates the judgment calls.
Start with loops that are safe, boring, and obviously reviewable.
Trigger: every 15 minutes on PRs with a label.
Job: inspect CI status, attempt one fix for deterministic failures, and rebase once if main has moved.
Stop if the same failure appears twice. Stop if the branch has merge conflicts that require a human decision. Stop if the fix touches files outside the declared scope.
This is the cleanest Codex loop because it maps to GitHub's natural workflow. The output is a PR comment, a small branch commit, or a status report.
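The reporting half of that loop is worth sketching too. Here is a minimal version of the PR comment the babysitter might leave, using the report fields from the contract; the function and its formatting are illustrative assumptions.

```python
# Hypothetical sketch: the status report a PR babysitter posts as a comment.
# Field names follow the contract's report section; content is invented.
def format_report(summary, action_taken, tests_run, remaining_blocker=None):
    lines = [
        f"**Summary:** {summary}",
        f"**Action taken:** {action_taken}",
        f"**Tests run:** {', '.join(tests_run)}",
    ]
    # only surface a blocker line when the loop actually stopped on one
    if remaining_blocker:
        lines.append(f"**Remaining blocker:** {remaining_blocker}")
    return "\n".join(lines)

print(format_report(
    "CI red due to stale lockfile",
    "regenerated lockfile, one commit pushed to branch",
    ["pnpm test billing", "pnpm typecheck"],
))
```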
Trigger: every 30 minutes on main.
Job: scan recent CI runs on main, cluster the failures, and report the top deterministic ones.
The important thing is not letting the agent quietly mutate production code. The first version should be report-only. Once the reports are useful, let it open a branch for the top deterministic failure.
This pairs well with long-running agent harnesses, because CI health is exactly where retry limits, tool logs, and receipts matter.
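The report-only first version can be very small. Here is a sketch that clusters recent failures by their first error line and surfaces the most frequent one without touching any code; the log lines are made up for illustration.

```python
# Hypothetical sketch: a report-only CI health pass. It groups failures by
# error message and surfaces the top deterministic one, nothing more.
from collections import Counter

failures = [
    "TypeError: cannot read property 'id' of undefined",
    "ETIMEDOUT connecting to registry.npmjs.org",
    "TypeError: cannot read property 'id' of undefined",
]

clusters = Counter(failures)
top_failure, count = clusters.most_common(1)[0]
print(f"top failure ({count}x): {top_failure}")
```

A repeated identical error line is a reasonable (if crude) proxy for "deterministic"; timeouts and one-off network errors tend not to cluster.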
Trigger: after push to main, or every 10 minutes while a deploy is in progress.
Job: confirm the deploy finished, request /api/health, and verify the page renders the expected content.
This is the loop I want for content automation. A blog post is not done when the commit lands. It is done when production returns 200 and the page references the expected hero image.
For Codex, this should be a first-class recurring pattern because it is one of the easiest ways to turn agent work into visible shipped work.
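The verification step is easiest to reason about as a pure check over a fetched response. In this sketch the health semantics and hero-image marker are assumptions, not a fixed Codex convention; a real loop would fetch the status code and HTML first.

```python
# Hypothetical sketch: the "done" check of a deploy-verification loop,
# written as a pure function so it is trivial to test.
def deploy_verified(status_code: int, page_html: str, hero_src: str) -> bool:
    # "done" means production answered 200 AND the page actually
    # references the expected hero image, not just that the commit landed
    return status_code == 200 and hero_src in page_html

print(deploy_verified(200, '<img src="/img/hero-loops.png">', "/img/hero-loops.png"))
print(deploy_verified(200, "<html>placeholder</html>", "/img/hero-loops.png"))
```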
Trigger: every 30 or 60 minutes.
Job: pull new feedback, cluster recurring themes, and post a short digest.
Boris mentioned clustering Twitter feedback. That is the exact pattern content teams should steal. It turns the outside world into a recurring editorial signal.
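The clustering does not need to be fancy to be useful. Here is a sketch that tags each feedback item with a theme by keyword match; the themes, keywords, and feedback text are all invented for illustration.

```python
# Hypothetical sketch: turning raw feedback into a recurring editorial
# signal by tagging each item with a coarse theme.
THEMES = {
    "pricing": ["price", "cost", "bill"],
    "docs": ["docs", "tutorial", "example"],
}

def tag(item: str) -> str:
    lower = item.lower()
    for theme, keywords in THEMES.items():
        if any(k in lower for k in keywords):
            return theme
    return "other"   # untagged items still show up in the digest

feedback = [
    "The docs need a tutorial on loops",
    "Billing page is confusing",
    "love it",
]
print([tag(f) for f in feedback])
```

A real loop would swap the keyword table for embedding-based clustering, but the shape stays the same: ingest, group, digest.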
For Developers Digest, this is how "go hard on Codex" becomes a system.
Loops fail differently from one-shot agents.
A one-shot agent fails and stops. A loop fails and comes back in 15 minutes.
That can be good. It can also create the exact cost pattern from the $400 overnight agent bill: retry, inspect, edit, rerun, repeat.
Every loop needs a hard budget: maximum attempts, maximum runtime, and maximum files changed per run.
A loop can keep acting on yesterday's plan after today's context changes.
Fix: every loop run starts by refreshing the state it depends on. For PRs, fetch latest base and head. For CI, inspect the current run, not the last one cached in context. For deploys, ask production, not local build output.
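In code, the fix is an ordering rule: fetch first, decide second. This sketch makes that explicit; `fetch_pr_state` is a stand-in for a real API call, with hard-coded data for illustration.

```python
# Hypothetical sketch: every run refreshes the state it depends on
# before deciding anything. Nothing from the previous run is trusted.
def fetch_pr_state(pr_number: int) -> dict:
    # stand-in for "fetch latest base and head from the server";
    # values are hard-coded here purely for illustration
    return {"base_sha": "abc123", "head_sha": "def456", "ci": "red"}

def run_once(pr_number: int) -> str:
    state = fetch_pr_state(pr_number)   # refresh FIRST, then decide
    if state["ci"] == "red":
        return "inspect current CI run"
    return "nothing to do"

print(run_once(42))
```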
If five loops can touch the same PR, you do not have automation. You have a race condition.
Assign ownership: one loop gets write access to a given resource, and every other loop only reads it.
Shared read access is fine. Shared write access should be rare.
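One cheap way to encode that rule is a writer registry checked before any write. The registry contents and function below are illustrative assumptions.

```python
# Hypothetical sketch: one writer per resource. The registry maps each PR
# to the single loop allowed to write to it; everyone else gets read-only.
WRITERS = {"pr-101": "pr-babysitter"}   # illustrative assignment

def access(loop_name: str, resource: str) -> str:
    if WRITERS.get(resource) == loop_name:
        return "write"
    return "read"   # shared reads are fine; shared writes are the race

print(access("pr-babysitter", "pr-101"))
print(access("ci-health", "pr-101"))
```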
The best loop is not the one that never asks for help. The best loop is the one that knows when it has hit a judgment boundary.
Escalate when the same failure repeats, when a fix would touch files outside the declared scope, or when a merge conflict requires a product decision.
This is where agents become useful teammates instead of background scripts with model access.
The important insight in the interview is not that Boris runs an absurd number of agents. Most teams should not copy that directly.
The important insight is that he is moving up a level of abstraction. He is not only asking agents to write code. He is asking agents to maintain workflows over time.
That is the same shift Codex needs to own.
Codex should not only answer:
Can you fix this bug?
It should answer:
Can you keep this PR moving until it is either merged or blocked by a human decision?
That second question is much more valuable.
Here is the content and product thesis:
Codex wins when it becomes the loop manager for engineering work.
Not just the model that writes the code. Not just the CLI that edits files. The system that can watch state, act within a budget, verify the result, and escalate the judgment calls.
That is the difference between agent assistance and agent operations.
The next Codex content cluster should cover these loop patterns end to end.
That cluster is more useful than another generic "what is Codex" post because it meets teams where they are: trying to turn agent output into shipped, reviewed, production-safe work.
Boris's loop-heavy workflow is a preview of where agentic coding is going. The headline is not "engineers will manage thousands of agents." The headline is smaller and more practical:
Recurring engineering work is about to become agent-managed.
The winning teams will not be the ones with the most agents. They will be the ones with the clearest loop contracts.
For Codex, that is the content lane to own: how to design, run, verify, and stop the loops that keep software moving.
Agent loops are recurring AI workflows that inspect state, decide whether action is needed, act within a defined scope, and report results. They are useful for PR babysitting, CI repair, deploy verification, feedback clustering, and other changing-state engineering work.
A cron job runs a fixed command on a schedule. An agent loop runs a recurring decision process: inspect the current state, choose an action, apply bounded changes, verify, and escalate if needed.
Codex has the right surfaces for loops: CLI for local work, GitHub Action for repo events, automations for recurring checks, goals for longer-running objectives, and browser verification for production checks. The missing part is a clear loop contract.
Start with a read-only PR review loop. Have Codex inspect pull requests with a label, summarize CI and review status, and post a concise comment. Add write access only after the signal is consistently useful.
Sources: Boris Cherny interview on YouTube, OpenAI Codex CLI docs, OpenAI Codex SDK docs, openai/codex-action README, OpenAI Codex changelog.