OpenAI Agent Builder and Evals Are Shutting Down: Move the Agent Stack Into Code

OpenAI just made the agent-builder lesson explicit: production agent workflows need to live closer to code.

The official deprecations page now lists three June 3, 2026 deprecations that agent teams should not ignore:

Agent Builder is scheduled to shut down on November 30, 2026.
The Evals platform becomes read-only on October 31, 2026 and is scheduled to shut down on November 30, 2026.
Reusable prompt objects and the v1/prompts API are scheduled to shut down on November 30, 2026.

ChatKit remains available, and OpenAI points Agent Builder users toward the Agents SDK or ChatGPT Workspace Agents. For builders, the direction is clear: visual builders are useful for exploration, but the durable production surface is code, versioned prompts, and eval runs you can replay.

That does not make visual builders useless. It does mean you should not let your production agent logic live only in a hosted canvas.

Last updated: June 23, 2026

What Is Actually Changing

OpenAI's deprecations page is the source to read first. The relevant timeline:

Surface	Deprecation announced	Read-only date	Shutdown date	Migration direction
Agent Builder	June 3, 2026	Not listed	November 30, 2026	Agents SDK or ChatGPT Workspace Agents
Evals platform	June 3, 2026	October 31, 2026	November 30, 2026	Promptfoo or repo-owned eval workflows
Reusable prompts API	June 3, 2026	Not listed	November 30, 2026	Move prompt content into application code

The Evals docs repeat the same point: the hosted Evals platform is being deprecated, existing eval content stays available during the transition window, and teams should look at alternatives if they are new to evaluations or want a more iterative environment.

This is not only a product cleanup. It changes the advice I would give any team building agents on OpenAI.

For the original builder-side take, read OpenAI AgentKit in Production. This post is the migration follow-up.

The Take: Treat Hosted Builders as Prototyping Surfaces

The lesson is not "never use visual tools."

The lesson is: do not let the only copy of your agent logic live in a hosted visual tool.

Production agents need the same boring properties as production code:

version control;
code review;
typed configuration;
test fixtures;
reproducible eval runs;
deploy history;
rollback paths;
ownership in the repo.

Hosted builders can accelerate discovery. They are great when a PM, designer, support lead, or ops teammate needs to see the shape of a workflow. They are less durable when they become the sole source of truth for branching logic, prompts, tool permissions, or evaluation criteria.

That is why this deprecation matters. It pushes the ecosystem toward a healthier split:

Use visual surfaces to explore and communicate.
Use code for the production loop.
Use repo-owned eval receipts to decide whether changes ship.

Migration Step 1: Inventory the Hidden Agent State

Before moving anything, list the state that currently lives outside the repo.

For Agent Builder, that usually means:

node graph structure;
prompt text inside nodes;
tool configuration;
branch conditions;
approval steps;
connector scopes;
published versions;
run traces.

For Evals, it means:

eval definitions;
graders and rubrics;
datasets;
baseline runs;
score thresholds;
dashboard notes;
failure examples.

For reusable prompt objects, it means:

prompt names and IDs;
prompt content;
template variables;
version history;
call sites that reference prompt IDs.

Treat this like an API migration, not a copy-paste exercise. If a production service calls a prompt object by ID, the migration is not finished until that service reads a versioned prompt from code or config and has a rollback path.

Get the weekly deep dive

Tutorials on Claude Code, AI agents, and dev tools - delivered free every week.

From the archive

OpenAI Daybreak Shows the AppSec Bottleneck Is Patching, Not Finding

Jun 23, 2026 • 8 min read

OpenMontage Shows the Real Future of AI Video: Agents, Not Editors

Jun 23, 2026 • 7 min read

Prompt Injection Is Really Role Confusion

Jun 23, 2026 • 8 min read

TikZ Editor Is a WYSIWYG LaTeX Figure Tool Built Almost Entirely by Codex

Jun 23, 2026 • 7 min read

Migration Step 2: Move Prompts Into the Repo

OpenAI's deprecation guidance for reusable prompt objects is blunt: move reusable prompt content into your application code.

That does not mean scattering giant strings through handlers.

Use a small repo convention:

agents/
  support/
    agent.ts
    prompts/
      system.md
      escalation.md
    evals/
      fixtures.jsonl
      rubric.md

The important part is not the exact folder name. The important part is that prompts get reviewed with the code that depends on them.

Good prompt files should include:

what the agent is allowed to do;
what tools it may call;
what it must never claim;
what evidence it must preserve;
which eval fixtures protect the behavior.

That pairs naturally with OpenAI Agents SDK for TypeScript, where agent definitions, tools, handoffs, guardrails, and structured outputs already live in code.

Migration Step 3: Rebuild Builder Flows as Explicit Agent Loops

The Agents SDK is the obvious destination when the workflow is owned by engineers.

The current SDK docs emphasize a few primitives that map well from visual builders:

Builder concept	Code-first replacement
node	function, tool, agent step, or handoff
branch	normal control flow or guardrail
human approval	human-in-the-loop checkpoint
connector	MCP server, hosted tool, or typed integration
visual run trace	SDK tracing and saved receipts
workflow version	git commit and deployment version

This is not always a one-to-one migration. A visual canvas often has too many tiny nodes because each node is easy to add. Code lets you collapse low-value nodes into one function and expose only the actual decision points.

The migration rule:

Keep branch decisions explicit.
Batch mechanical steps into code.
Preserve approval gates.
Preserve tool permission boundaries.
Preserve trace receipts.

For the SDK-side architecture details, read Agents SDK Evolution and Managed Agents vs LangGraph vs DIY.

Migration Step 4: Move Evals From Dashboard to Receipts

The Evals platform deprecation is the more important deadline for serious teams.

An agent without evals is just a workflow you hope still works.

When you move evals out of the hosted dashboard, do not only move the final score. Move the evidence. A useful eval receipt should include:

Receipt field	Why it matters
fixture ID	ties the run to a stable test case
baseline version	prevents comparing against a moving target
candidate version	maps behavior to a branch or commit
model and tool config	explains why behavior changed
inputs and expected behavior	keeps the task reviewable
run trace	shows tool calls, retries, and decisions
score and rubric notes	separates correctness, safety, cost, and style
cost and latency	prevents expensive "wins" from hiding

That is the point of baseline receipts. You are not trying to recreate a pretty dashboard first. You are trying to preserve enough evidence that a developer can replay the important claim.

OpenAI's docs point to Promptfoo as one migration path. That is reasonable if your evals are prompt and output focused. If your agent uses tools, files, browsers, sandboxes, or multi-step state, you may need a custom harness around the SDK so the eval can capture the whole run.

Migration Step 5: Choose Workspace Agents Only for the Right Jobs

OpenAI's deprecations page says Agent Builder users can continue with the Agents SDK or ChatGPT Workspace Agents.

That split matters.

Use code-first Agents SDK when:

the workflow ships inside your product;
the agent touches customer data;
you need CI, tests, and deployment history;
evals must run in your repo;
tool permissions are part of your security model.

Use Workspace Agents when:

the workflow is mostly internal;
natural-language editing by non-engineers matters;
the output is advisory or draft-like;
the risk is bounded by human review;
you want the agent available inside ChatGPT workspaces.

The mistake is treating those as interchangeable. They are not. One is a developer runtime. The other is a workspace automation surface.

The Opposing Take: Visual Builders Still Matter

There is a fair counterargument: code-first systems exclude the people who understand the process.

Support leaders, product managers, sales engineers, data analysts, and operations teams often know the workflow better than the engineer implementing it. A visual builder gives them a shared artifact. A TypeScript file does not.

That is why the answer is not "delete the canvas."

The better pattern is dual-surface:

diagrams and visual flows for design reviews;
code for production execution;
eval receipts for behavior changes;
ChatKit or workspace surfaces for human interaction;
PR review for prompt and tool changes.

The visual artifact explains the workflow. The repo owns the workflow.

That distinction becomes more important as agent systems grow. A diagram can show intent. Code and evals prove what actually runs.

A Five-Day Migration Plan

Use the deprecation clock to force discipline:

Day 1: Inventory. Export every Agent Builder flow, hosted eval, prompt object, and service call that references those IDs.

Day 2: Freeze baselines. Save representative successful and failed runs before changing anything. Capture inputs, outputs, tool calls, cost, latency, and human notes.

Day 3: Move prompts. Put prompts in the repo with owners, review rules, and version history. Replace prompt-ID lookups with file or config loading.

Day 4: Rebuild the loop. Implement the agent in the Agents SDK, LangGraph, or your own loop. Preserve approval gates and tool boundaries first, then optimize.

Day 5: Replay evals. Run the old baseline against the new implementation. Do not ship until the candidate beats or matches baseline behavior on the cases that matter.

That is the practical standard. Not "we copied the graph." The standard is "we can prove the migrated agent behaves at least as well as the old one."

FAQ

Is OpenAI Agent Builder shutting down?

Yes. OpenAI's deprecations page says Agent Builder deprecation was announced on June 3, 2026, and Agent Builder is scheduled to shut down on November 30, 2026. ChatKit remains available.

Is OpenAI Evals shutting down?

The hosted Evals platform is being deprecated. OpenAI's docs say existing evals become read-only on October 31, 2026, and the platform is scheduled to shut down on November 30, 2026.

What should replace Agent Builder?

For product-owned workflows, move the production loop into the OpenAI Agents SDK, LangGraph, or a repo-owned agent loop. For internal workspace automations where non-engineer editing matters, evaluate ChatGPT Workspace Agents.

What should replace reusable prompt objects?

Move reusable prompt content into application code or repo-managed prompt files. Keep prompts versioned, reviewed, and tied to the eval fixtures that protect their behavior.

Does this mean visual agent builders are dead?

No. Visual builders remain useful for prototyping, design reviews, and non-engineer collaboration. The change is where production truth should live: in code, reviewed prompts, deploy history, and replayable eval receipts.

What Is Actually Changing

The Take: Treat Hosted Builders as Prototyping Surfaces

Migration Step 1: Inventory the Hidden Agent State

OpenAI Daybreak Shows the AppSec Bottleneck Is Patching, Not Finding

OpenMontage Shows the Real Future of AI Video: Agents, Not Editors

Prompt Injection Is Really Role Confusion

TikZ Editor Is a WYSIWYG LaTeX Figure Tool Built Almost Entirely by Codex

Migration Step 2: Move Prompts Into the Repo

Migration Step 3: Rebuild Builder Flows as Explicit Agent Loops

Migration Step 4: Move Evals From Dashboard to Receipts

Migration Step 5: Choose Workspace Agents Only for the Right Jobs

The Opposing Take: Visual Builders Still Matter

A Five-Day Migration Plan