
TL;DR
OpenAI is sunsetting the Assistants API in 2026. Here is a tested migration plan to the Responses API — code, state, threads, tools, every cliff I hit, in order.
OpenAI confirmed the Assistants API sunset in the developer changelog: new endpoints frozen now, full shutdown in 2026. Threads, runs, run-steps, and the assistant resource itself all go away. Files and vector stores survive (they moved into the Responses API surface). Function calling survives but the schema is slightly different. The Code Interpreter and File Search tools survive as built-in tools on Responses.
If you are running production code against client.beta.threads.* today, you have homework. I had a 14-month-old Assistants codebase running newsletter automation, customer support triage, and a chunk of internal ops. Last weekend I migrated all of it. This is the field guide — every cliff I hit, in order, with the code diffs that worked.
For the visual walkthrough including the eval harness I used to gate the cutover, see the DevDigest YouTube channel.
The Assistants API was server-stateful. You created a thread, posted messages, kicked off runs, polled for completion, and OpenAI held the conversation history. Your code did not own the state.
The Responses API is client-stateful by default, server-stateful by opt-in. Each call returns a response.id. You pass previous_response_id on the next call to get continuity. The server stores the chain for 30 days. After that, you reconstruct from your own DB or pass the message array explicitly.
This is the right design — server-only state was a footgun for compliance, debugging, and multi-region — but it changes how you think about every conversation:
| Assistants | Responses |
|---|---|
| `threads.create()` | nothing — just call `responses.create()` |
| `threads.messages.create()` | include in the `input` array |
| `runs.create()` + poll | `responses.create()` returns synchronously or streams |
| `run.required_action` | `response.required_action` (similar but flatter) |
| `assistants.create()` | prompts + system messages + tools per call |
The big mental shift: there is no assistant object anymore. The "assistant" is your prompt template + tool list + model config, which you supply per call. This is why I version mine in Promptlock — the prompt is now a first-class artifact in your repo, not a row in OpenAI's database.
Here is the minimal-diff before/after for a single conversation turn. The "before" is the standard Assistants pattern most of us wrote in 2024:
```js
// BEFORE — Assistants API
const thread = await client.beta.threads.create();
await client.beta.threads.messages.create(thread.id, {
  role: "user",
  content: userMessage,
});
const run = await client.beta.threads.runs.createAndPoll(thread.id, {
  assistant_id: ASSISTANT_ID,
});
const messages = await client.beta.threads.messages.list(thread.id);
const reply = messages.data[0].content[0].text.value;
```
```js
// AFTER — Responses API
const response = await client.responses.create({
  model: "gpt-5.5",
  instructions: SYSTEM_PROMPT,
  input: userMessage,
  tools: TOOLS,
  previous_response_id: priorResponseId, // null on first turn
  store: true, // 30-day server retention
});
const reply = response.output_text;
const newResponseId = response.id; // persist for next turn
```
The "after" version is shorter, synchronous on the happy path, and the conversation chain lives in two places you control: your DB row (the response.id) and your prompt repo (SYSTEM_PROMPT).
State handling is where I lost the most time. Here are the three patterns I now use:
Pattern 1: Short-lived chains (default). Persist previous_response_id against your conversation row. On each turn, pass it. Trust OpenAI's 30-day retention. This is what most apps want.
```js
await db.conversation.update({
  where: { id: convId },
  data: { lastResponseId: response.id },
});
```
Pattern 2: Long-lived or compliance-bound chains. Do not rely on server retention. Store every message in your DB and pass them explicitly:
```js
const response = await client.responses.create({
  model: "gpt-5.5",
  instructions: SYSTEM_PROMPT,
  input: messages.map((m) => ({ role: m.role, content: m.content })),
  store: false, // do not retain server-side
});
```
Pattern 3: Hybrid. Short-lived state via previous_response_id, but you also write every input/output to your DB for replay and eval purposes. This is what I run in production. It is the only pattern that gives you both ergonomic continuity and full-control debugging.
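A minimal sketch of the hybrid write path, assuming a hypothetical `db.message` table for the transcript alongside the Pattern 1 `lastResponseId` column:

```js
// After each turn: keep server-side continuity AND your own transcript.
await db.message.createMany({
  data: [
    { conversationId: convId, role: "user", content: userMessage },
    { conversationId: convId, role: "assistant", content: response.output_text },
  ],
});
await db.conversation.update({
  where: { id: convId },
  data: { lastResponseId: response.id },
});
```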
The cliff I hit: I assumed previous_response_id would still work after 31 days. It does not — the server returns a 404. Wrap every call in a fallback that reconstructs from your DB if the chain is missing.
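A sketch of that fallback, reusing the stored transcript from Pattern 3; `loadHistory` is a hypothetical helper that reads your own message rows, and the 404 check keys off the SDK error's `status`:

```js
async function createTurn(conv, userMessage) {
  try {
    return await client.responses.create({
      model: "gpt-5.5",
      instructions: SYSTEM_PROMPT,
      input: userMessage,
      previous_response_id: conv.lastResponseId, // may point at an expired chain
      store: true,
    });
  } catch (err) {
    if (err.status !== 404) throw err;
    // Chain expired server-side: rebuild the context from our own DB.
    const history = await loadHistory(conv.id); // [{ role, content }, ...]
    return client.responses.create({
      model: "gpt-5.5",
      instructions: SYSTEM_PROMPT,
      input: [...history, { role: "user", content: userMessage }],
      store: true,
    });
  }
}
```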
Function calling works, with a flatter schema. The `tools` array is the same shape. The big differences:

- `code_interpreter` and `file_search` are now first-class tools you enable per call. No more attaching them to an assistant.
- There is no `requires_action` run status to poll; pending tool calls surface directly on `response.required_action`.

Here is the parallel-tool gotcha. In Assistants, this code was safe:
```js
// Assistants — implicit serial
for (const call of run.required_action.submit_tool_outputs.tool_calls) {
  const output = await runTool(call); // safe, one at a time
}
```
In Responses, the model now expects you to handle multiple tool calls concurrently. If runTool is not idempotent or hits a rate-limited downstream, batch your calls or Promise.all them with a concurrency cap:
```js
import pLimit from "p-limit";

const limit = pLimit(3);
const outputs = await Promise.all(
  response.required_action.submit_tool_outputs.tool_calls.map((call) =>
    limit(() => runTool(call))
  )
);
```
I missed this on my first migration. The customer-support agent fired four parallel ticket-update calls to a legacy CRM and got rate-limited into oblivion within an hour.
The migration is mechanical but the behavior is not always identical. Different default temperatures, different tool-call patterns, different message-formatting quirks. I would not cut over without a regression eval.
My harness: a flag-gated rollout where 10% of traffic goes to Responses, 90% to Assistants, both runs are logged with the same input, and a nightly job scores the diffs. I open-sourced the bones of this as Agent Eval Bench — input replay, output diff, automated grading via a stronger model.
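The split itself is a few lines. A sketch, where `bucketFor`, `handleWithAssistants`, `handleWithResponses`, and `logTurn` are hypothetical wrappers around your hash helper, the two code paths, and whatever store the nightly diff job reads:

```js
// Flag-gated rollout: ~10% of conversations take the Responses path.
// Every turn is logged with its input so the nightly job can replay it
// through the other path and score the diff.
const useResponses = bucketFor(convId) < 10; // 0-99 bucket, sticky per conversation

const reply = useResponses
  ? await handleWithResponses(convId, userMessage)
  : await handleWithAssistants(convId, userMessage);

await logTurn({
  convId,
  path: useResponses ? "responses" : "assistants",
  input: userMessage,
  output: reply,
});
```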
The cutover schedule that worked for me: build the Responses path behind a flag, start the 10% rollout, cut over the safe endpoints first, keep the long-lived stateful chains on Assistants until last, and leave the old code paths behind @deprecated comments for one more month before deleting them. The burn-down looked roughly like this in my logs:
```
Day 1:  47 endpoints calling Assistants
Day 7:  47       (built path, no traffic yet)
Day 9:  47 → 47  (10% rollout, both alive)
Day 14: 47 → 12  (cut the safe ones, kept stateful chains on Assistants)
Day 21: 12 → 3   (long-lived chain edge cases)
Day 28: 0
```
The last three were the long-lived stateful chains where I needed pattern 2 above (explicit history). They took longer because I had to backfill DB writes for conversations that had been server-stateful for months.
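The backfill itself is just paging the old threads out through the Assistants API while it still answers. A sketch, assuming your conversation rows still carry their `threadId` and the same hypothetical `db.message` table as above:

```js
// One-off backfill: copy server-side thread history into our own DB.
const convs = await db.conversation.findMany({ where: { threadId: { not: null } } });

for (const conv of convs) {
  const page = await client.beta.threads.messages.list(conv.threadId, {
    order: "asc",
    limit: 100,
  });
  await db.message.createMany({
    data: page.data.map((m) => ({
      conversationId: conv.id,
      role: m.role,
      content: m.content[0]?.text?.value ?? "",
    })),
  });
}
```

For threads longer than one page, keep walking the `after` cursor before the endpoints go dark.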
Three things, in priority order:

- Get the regression eval in place before you move any traffic; the two APIs do not behave identically.
- Audit your `runTool` implementations for shared mutable state. The parallel-by-default behavior will find every race condition you have (there is a sketch just below).
- Start now. OpenAI gave us through 2026, which sounds generous until you remember every other library you depend on is also moving. Do not be the one team migrating in October.
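One way to de-fang the race, sketched with a hypothetical `update_ticket` tool, `crm` client, and `runOtherTool` dispatcher: serialize writes per ticket so parallel tool calls cannot interleave on the same record.

```js
// Per-ticket write queue: tool calls against different tickets still run
// concurrently, but calls touching the same ticket are serialized.
const ticketQueues = new Map();

async function runTool(call) {
  const args = JSON.parse(call.function.arguments);
  if (call.function.name !== "update_ticket") return runOtherTool(call);

  const prev = ticketQueues.get(args.ticketId) ?? Promise.resolve();
  const next = prev.catch(() => {}).then(() => crm.updateTicket(args.ticketId, args.fields));
  ticketQueues.set(args.ticketId, next);
  return next;
}
```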
The Responses API is the better primitive. It is simpler, more honest about state, and the streaming model finally feels native. The migration is a weekend of work for a small codebase and two weeks for a complex one. Worth it.