TL;DR
Andrej Karpathy's loopy era frame explains why Codex is becoming less like a chatbot and more like an agent loop manager for real software work.
Andrej Karpathy's "loopy era" interview with No Priors is one of the better explanations of the current AI coding shift because it does not frame the change as better autocomplete.
The useful claim is sharper: the agent is now assumed. The new skill is designing loops that keep useful work moving without a human prompting every next step.
That is exactly the lens I would use for Codex. If you still think of OpenAI Codex as "a model that writes code," you will underuse it. The more interesting version is Codex as a control surface for agentic engineering: task specs, repo rules, parallel sessions, objective checks, budgets, escalation, and production verification.
This also connects cleanly to Boris Cherny's loop-heavy workflow. Boris's /loop framing is about recurring engineering chores. Karpathy's loopy era is the larger principle underneath it: remove yourself from the prompt-next-step loop when the task has enough structure to run.
For the existing Codex cluster, read this alongside Codex loops and Boris Cherny, Codex /goal vs Claude managed outcomes, and Codex SDK vs CLI vs GitHub Action. They are all pointing at the same workflow shape.
In the No Priors interview, Karpathy describes a personal workflow that moved from mostly hand-written code to mostly agent delegation. The important part is not the percentage. It is the unit of work.
He is not talking about better autocomplete or a faster way to accept line-level suggestions.

He is talking about moving in macro actions over a repository. One agent researches. Another writes code. Another plans. Another explores a separate implementation path. The human steers, reviews, and designs the system around the agents.
That is the jump from "vibe coding" to agentic engineering. The developer is less like a typist and more like an operator of parallel technical loops.
This is also why AI coding tool comparisons that only score code generation miss the next decision point. The question is not just which model writes the best React component. It is which environment lets you safely run more useful loops.
Karpathy's AutoResearch example is so useful because it has the ingredients that make loops work:
objective + metric + boundary + worker loop + result review
He describes setting up a research loop where agents try experiments, evaluate objective metrics, and continue without waiting for him to inspect every intermediate result. The goal is to maximize useful token throughput while removing the human as the bottleneck.
That sounds abstract until you map it to software:
| AutoResearch primitive | Software engineering version |
|---|---|
| Objective | Improve this benchmark, fix this failing path, reduce this latency |
| Metric | Test pass rate, benchmark score, bundle size, route 200, typecheck |
| Boundary | Files in scope, commands allowed, time budget, permission model |
| Worker loop | Codex task, GitHub Action, CLI session, automation |
| Result review | PR diff, logs, eval report, deploy check, human approval |
This is why Codex is interesting right now. It already lives close to the software loop. It can read repo instructions, edit files, run commands, review diffs, and report what changed. With the Codex GitHub Action, the loop can also be attached to pull request events. With Codex automations, the same pattern can become recurring work instead of one-off delegation.
The point is not that Codex magically solves engineering. The point is that Codex is one of the more natural places to formalize the loop.
The weak version of agentic engineering is:

```text
Make the app better.
```

The stronger version is:

```yaml
goal: "Reduce checkout route cold-start time by 20 percent"
scope:
  include:
    - app/checkout/**
    - lib/payments/**
  exclude:
    - migrations/**
    - auth/**
metric:
  command: "pnpm bench checkout"
  success: "p95 improves by at least 20 percent and tests pass"
budget:
  max_runtime_minutes: 40
  max_files_changed: 8
  max_attempts: 2
stop:
  - metric_cannot_be_reproduced
  - same_failure_twice
  - needs_product_decision
report:
  include:
    - changed_files
    - commands_run
    - before_after_metric
    - remaining_risks
```
That contract is the practical translation of Karpathy's loopy era into Codex work.
It gives the agent enough room to continue. It gives the human enough structure to review. It gives the workflow a stopping point. Most importantly, it makes the loop portable. The same contract can start in the Codex CLI, move into GitHub Actions, and eventually become a productized workflow through an SDK.
This is the real content lane for Codex: not "here is a clever prompt," but "here is the smallest reliable loop contract for a real engineering job."
Codex has three especially useful roles in this loopy model.
The local loop is still human-steered. You run Codex from a repo, give it a narrow target, inspect the diff, and decide what happens next.
This is where Codex competes with Claude Code, Aider, Cursor agents, and other terminal or IDE coding tools. It is also where the loop contract can stay lightweight:
```text
Fix the failing tests in lib/billing.
Only touch lib/billing and tests/billing.
Run pnpm test billing and pnpm typecheck.
Stop after one implementation path if the failure is ambiguous.
```
The local loop is best for high-context work where the developer is actively supervising. It is not the highest-leverage loop, but it is the safest place to learn how Codex behaves in your repo.
The GitHub loop is event-driven. A PR opens. A label is added. CI fails. A nightly schedule fires. Codex comments, reviews, drafts a patch, or produces an artifact.
This is where the Codex GitHub Action becomes more than a convenience wrapper. GitHub already has the state machine: pull requests open, labels get added, checks fail, reviews land, and schedules fire.
Codex can sit inside that state machine if the permissions are narrow and the output is inspectable. Start read-only. Let it summarize failures, review diffs, and propose next actions. Only widen write access after the comments are consistently useful.
That is the difference between agent automation and an overpowered CI job.
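Here is a sketch of that read-only starting point as a workflow. The openai/codex-action input names below are assumptions based on common action conventions, not confirmed interface; check the action's README and pin the ref it recommends before using anything like this.

```yaml
# Sketch: read-only Codex review on labeled PRs. Input names for
# openai/codex-action are assumptions - verify against the README.
name: codex-pr-review
on:
  pull_request:
    types: [opened, labeled]
permissions:
  contents: read        # read-only repo access to start
  pull-requests: write  # only needed to post review comments
jobs:
  review:
    if: contains(github.event.pull_request.labels.*.name, 'codex-review')
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: openai/codex-action@main   # pin the ref the README recommends
        with:
          prompt: "Summarize this PR's risk areas and failing checks. Do not push changes."
          openai-api-key: ${{ secrets.OPENAI_API_KEY }}
```

The permissions block is the point: the agent can read and comment, and nothing else, until its output has earned wider access.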
The recurring loop is the closest to Karpathy's point. It does not wait for a human prompt. It wakes up, refreshes state, checks whether useful work exists, acts inside a boundary, and reports.
Examples: watching for a `codex-watch` label, or reacting to new changes on `main`.

This is also where the long-running agent harness matters. A recurring loop without receipts is just an expensive cron job with model access. A recurring loop with logs, budgets, stop conditions, and escalation is an engineering system.
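A sketch of what such a recurring contract could look like. The schema is hypothetical, mirroring the /goal contract above, and `pnpm smoke:checkout` is an illustrative script name, not a real command in your repo:

```yaml
# Hypothetical recurring-loop contract: trigger is a schedule, not a human.
task: verify-deploy
trigger:
  schedule: "0 6 * * *"         # wakes up daily instead of waiting for a prompt
scope:
  read_only: true               # start with receipts, not write access
goal: "Confirm the latest production deploy is healthy"
verify: "pnpm smoke:checkout"   # illustrative smoke-test script
budget:
  max_runtime_minutes: 15
stop:
  - no_new_deploy_since_last_run   # nothing useful to do, so do nothing
report:
  channel: github_issue         # every run leaves an inspectable artifact
```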
The skeptical view is not "agents are useless." The better skeptical view is that many loops are fake autonomy.
Karpathy says the caveat clearly: this works best when the objective metric is easy to evaluate. If you cannot evaluate the result, you cannot safely automate the loop.
That is a major limitation.
Codex loops are good at work with cheap objective checks: failing tests, type errors, reproducible benchmarks, and routes that should return 200.

Codex loops are weaker at work where success is a judgment call: product decisions, design tradeoffs, and anything without a metric the loop can run on its own.
This is why debugging agent workflows and agent architecture are not side topics. They are the infrastructure around the loop. Once the agent can continue without you, failures become harder to see and more expensive to ignore.
If I were setting up a Codex-heavy repo after watching the Karpathy interview, I would do five things.
Treat AGENTS.md Like a Runtime Contract

Do not treat repo instructions as polite preferences. Treat them as the first layer of the loop contract.
Include:

- the commands the agent may run
- the paths it may and may not touch
- the checks that define done
- when it must stop and escalate
For a deeper version of that, see the Codex macOS certificate runbook. The useful part is not the certificate topic. It is the operational shape: exact commands, exact checks, and exact recovery paths.
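Here is a sketch of that shape applied to AGENTS.md. The section names and contents are illustrative, drawn from the contract above, not a required format:

```markdown
# AGENTS.md (illustrative excerpt)

## Commands you may run
- pnpm test, pnpm typecheck, pnpm bench checkout

## Boundaries
- Do not touch migrations/** or auth/**
- Do not add new dependencies without flagging it in the report

## Verification
- A change is complete only when pnpm test and pnpm typecheck pass

## Escalation
- Stop and ask when a change needs a product decision
```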
Create a codex-tasks/ folder with reusable loop contracts:
```text
codex-tasks/
  fix-ci.yml
  verify-deploy.yml
  review-pr.yml
  update-blog-seo.yml
  refresh-docs.yml
```
Each file should name the trigger, scope, verification command, budget, stop conditions, and report format.
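For example, a fix-ci contract might look like this. The schema is hypothetical; what matters is that every field from the list above has a named home:

```yaml
# codex-tasks/fix-ci.yml - sketch of one reusable contract
# (hypothetical schema; the named fields are the point, not the syntax)
trigger: ci_failed_on_main
scope:
  include: ["src/**", "tests/**"]
  exclude: ["migrations/**"]
verify: "pnpm test && pnpm typecheck"
budget:
  max_attempts: 2
  max_files_changed: 10
stop:
  - same_failure_twice
  - needs_product_decision
report:
  - changed_files
  - before_after_ci_status
```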
This is how you move from improvisation to repeatability. It also makes Codex easier to compare against Claude Code or Cursor because you are comparing the same task contract, not vibes.
Karpathy's macro-action point only works when tasks do not collide.
Good split: one agent owns app/billing/** and tests/billing/**, and nothing else touches those paths.

Bad split: two agents sharing the same directories with no explicit owner.
Parallel agents multiply throughput only when ownership is explicit. Otherwise they multiply merge conflicts and review load.
The best loop metrics are not fancy:

- pnpm typecheck passes
- pnpm test billing passes
- the checkout route returns 200

This is why Codex is a good fit for engineering loops. Software has many cheap objective checks. Use them before asking the model to judge its own work.
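One way to encode those checks so the loop can evaluate itself, again using a hypothetical metric schema consistent with the contract above (the staging URL is a placeholder):

```yaml
# Hypothetical metric block: each check is a command whose exit code
# answers pass/fail, so the loop never grades its own work.
metric:
  checks:
    - "pnpm typecheck"
    - "pnpm test billing"
    - "curl -sf -o /dev/null https://staging.example.com/checkout"  # route returns 200
  success: "all checks exit 0"
```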
The loop should stop sooner than your ego wants.
Stop when:

- the metric cannot be reproduced
- the same failure shows up twice
- the next step needs a product decision
This is the part many agent demos skip. The future is not an agent that never asks for help. The future is an agent that knows exactly when it has crossed from execution into judgment.
Karpathy's loopy era is not a slogan about agents getting smarter. It is a workflow claim:
The leverage comes from arranging work so agents can continue against metrics and boundaries while humans stop being the next-step bottleneck.
Codex makes that concrete for software teams. The best Codex workflows will not be the longest prompts. They will be the cleanest loops: a clear objective, a cheap metric, an explicit boundary, an honest budget, and a reviewable report.
That is how Codex moves from "AI coding tool" to agentic engineering infrastructure.
openai/codex-action repository: https://github.com/openai/codex-action