
TL;DR
Aharness, LangChain's custom harness pattern, and OpenAI's code-first migration all point to the same next step: agent processes need typed gates, validated evidence, and controlled transitions.
Prompts can describe a workflow. They cannot enforce one.
That is the sharp lesson from the latest agent tooling wave. OpenAI is moving production agent work away from hosted visual surfaces and toward the Agents SDK. LangChain is writing about custom harnesses as the scaffolding around the model. Aharness, a new Codex-focused project on GitHub and Hacker News, makes the argument more explicit: encode coding-agent workflows as finite state machines with typed gates, validated evidence, controlled transitions, repair paths, and inspectable logs.
That is the right direction.
The next useful abstraction is not a longer prompt. It is a workflow runtime the agent cannot casually ignore.
Last updated: June 23, 2026
The fresh signal is Aharness, described as a workflow harness for Codex. Its pitch is narrow and practical: agent workflows should be finite state machines written in TypeScript, with states that define what Codex may do next and transitions that require validated exits.
The Show HN thread frames the problem directly: models are capable enough for longer autonomous work, but process drift and context management are now the failure modes. Prompts and skills describe the process; they do not enforce it.
That is the distinction worth writing down.
It also fits the larger context from the last few days:
The direction is consistent: serious agent workflows are becoming software artifacts.
A prompt checklist can say:
Plan first.
Only edit the requested files.
Run tests.
Attach evidence.
Stop if tests fail twice.
Ask before risky changes.
That is better than nothing. It is also easy for an agent to forget, reinterpret, or satisfy with a weak summary.
A workflow runtime can enforce:
That is a different class of control.
This is the same reason long-running agents need harnesses. The model can do more work now. The surrounding system has to decide what counts as a valid move.
A finite state machine sounds academic until you map it to a coding-agent run.
| State | Allowed exits | Evidence required |
|---|---|---|
| intake | accept task, request clarification, reject task | task contract |
| plan | approve plan, revise plan | scoped plan and file boundaries |
| implement | move to verify, request tool approval | diff summary |
| verify | pass, fail, repair | command output and failing logs |
| repair | retry verify, escalate, stop | fix attempt and retry count |
| final | close run | receipt with changes, checks, risks |
That is already how good human-led agent sessions work. The difference is whether the structure lives in the operator's head or in code.
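The table above can be sketched as a transition map. This is a minimal illustration, not Aharness's actual API: the `State` type, the `transitions` map, and the `advance` function are names assumed here, and the exits that leave the happy path (request clarification, reject task, request tool approval, escalate, stop) are omitted to keep it short.

```typescript
// Hypothetical sketch: state names mirror the table above, not a real library.
type State = "intake" | "plan" | "implement" | "verify" | "repair" | "final";

// Each state lists the only states a run may move to next.
const transitions: Record<State, readonly State[]> = {
  intake: ["plan"],            // accept task
  plan: ["plan", "implement"], // revise plan, approve plan
  implement: ["verify"],       // move to verify
  verify: ["final", "repair"], // pass, fail
  repair: ["verify"],          // retry verify
  final: [],                   // close run
};

function canTransition(from: State, to: State): boolean {
  return transitions[from].indexOf(to) !== -1;
}

// The runtime, not the agent, decides whether a move is legal.
function advance(from: State, to: State): State {
  if (!canTransition(from, to)) {
    throw new Error(`Illegal transition: ${from} -> ${to}`);
  }
  return to;
}
```

Because `intake` lists only `plan`, a call like `advance("intake", "final")` throws no matter how confident the closeout text is.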
State machines give agent runs three useful properties:
Controlled transitions. The agent can only move to states the workflow exposes. If there is no direct path from intake to final, the agent cannot skip planning and verification by writing a confident closeout.
Typed submissions. Each state can require a specific shape of evidence: a plan object, a file list, a command transcript, a test result, or a risk note. Natural language becomes input to a verifier, not the verifier itself.
Repair paths. Failure can be part of the workflow instead of an exception. A failed test can move the run to repair with a retry budget, or to escalation if the same failure repeats.
That makes the workflow inspectable after the fact. You can ask where the run stalled, which gate failed, which evidence was missing, and whether the agent followed the process.
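Typed submissions and repair paths can be sketched together. The evidence shape, the `checkVerifyEvidence` and `routeAfterVerify` names, and the retry budget parameter are all assumptions made for illustration; the article does not specify how any real harness models them.

```typescript
// Hypothetical evidence shape for the verify state.
interface VerifyEvidence {
  command: string;
  exitCode: number;
  output: string;
}

type GateResult =
  | { ok: true; evidence: VerifyEvidence }
  | { ok: false; reason: string };

// The agent's submission is untrusted input, so its shape is checked at runtime.
function checkVerifyEvidence(submission: unknown): GateResult {
  if (typeof submission !== "object" || submission === null) {
    return { ok: false, reason: "submission is not an object" };
  }
  const record = submission as Record<string, unknown>;
  const command = record.command;
  const exitCode = record.exitCode;
  const output = record.output;
  if (typeof command !== "string" || command.length === 0) {
    return { ok: false, reason: "missing command" };
  }
  if (typeof exitCode !== "number") {
    return { ok: false, reason: "missing exitCode" };
  }
  if (typeof output !== "string") {
    return { ok: false, reason: "missing output" };
  }
  return { ok: true, evidence: { command, exitCode, output } };
}

type NextState = "verify" | "final" | "repair" | "escalate";

// Failure is a normal route: repair while budget remains, then escalate.
function routeAfterVerify(
  result: GateResult,
  retriesUsed: number,
  retryBudget: number
): NextState {
  if (!result.ok) return "verify"; // rejected submission: stay and resubmit
  if (result.evidence.exitCode === 0) return "final";
  return retriesUsed < retryBudget ? "repair" : "escalate";
}
```

A prose summary such as "tests passed" fails the gate because it is a string, not a command transcript. The run stays in verify until real evidence arrives.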
This is not an argument against skills.
Skills are useful because they package operating knowledge. A good skill can teach an agent how your team debugs flaky tests, writes release notes, reviews migrations, or handles UI QA. That is why skills beat prompts for repeatable work.
But a skill is still mostly instruction. It tells the agent what good looks like.
A workflow runtime tells the agent what moves are allowed.
You want both:
AGENTS.md for repo context;
That is the stack the post-visual-builder world is converging on.
LangChain's custom harness post uses different language, but the problem is similar. The post defines a harness as scaffolding around the model that connects it to the real world. It specifically calls out middleware for retries, fallbacks, policy enforcement, PII handling, approval gates, steering, cost limits, and prompt caching.
That is harness thinking.
The useful part is "task-harness fit." A customer service agent, coding agent, data agent, and legal review agent should not share one generic runtime. They need different gates, tools, logs, and failure paths.
State machines are one way to make that fit explicit. Middleware is another. LangGraph is another. The common point is that process moves out of invisible prompt wording and into something engineers can inspect.
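The middleware route can be illustrated with plain function wrappers. To be clear, this is not LangChain's middleware API: `Step`, `withCostLimit`, and `withRetries` are hypothetical names, and the synchronous step signature is a simplification chosen to keep the sketch small.

```typescript
// Hypothetical step signature: a unit of agent work that reports what it cost.
type StepResult = { output: string; costUsd: number };
type Step = (input: string) => StepResult;

// Wraps a step so the run halts once cumulative spend reaches the budget.
function withCostLimit(step: Step, budgetUsd: number): Step {
  let spentUsd = 0;
  return (input) => {
    if (spentUsd >= budgetUsd) {
      throw new Error(`Cost limit reached: spent ${spentUsd} of ${budgetUsd}`);
    }
    const result = step(input);
    spentUsd += result.costUsd;
    return result;
  };
}

// Wraps a step so a thrown error is retried up to maxRetries extra times.
function withRetries(step: Step, maxRetries: number): Step {
  return (input) => {
    let lastError: unknown;
    for (let attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        return step(input);
      } catch (err) {
        lastError = err;
      }
    }
    throw lastError;
  };
}
```

Composing them as `withCostLimit(withRetries(step, 2), 1.0)` puts the policy in code an engineer can read and diff, which is the point the harness argument is making.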
This is where agent eval receipts matter. Once the workflow is code, you can compare versions:
Those are answerable questions.
The strongest objection is also correct: not every agent task should be a state machine.
Some work is exploratory. Research, debugging, discovery, architecture search, and incident response often start without a known path. If you force those into a rigid workflow too early, you get process theater: the agent fills boxes instead of thinking.
That is the risk.
The answer is not to wrap everything in a finite state machine. The answer is to encode the parts of the workflow that should not be ambiguous.
Good candidates:
Bad candidates:
Use dynamic agent behavior for discovery. Use state machines for commitments.
If I were turning a prompt checklist into an agent workflow, I would start with four files:
workflows/
  bugfix.fsm.ts
  bugfix.schema.ts
  bugfix.evals.jsonl
  README.md
The finite state machine owns the legal transitions. The schema file owns typed submissions. The eval file owns representative tasks. The README explains when to use the workflow and when not to.
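As one illustration of what the eval file might hold, here is a small loader for `bugfix.evals.jsonl`. The article does not define the file's format, so the `EvalCase` fields (`id`, `task`, `expectedFinalState`) and the `parseEvalCases` function are assumptions made for this sketch.

```typescript
// Hypothetical shape for one line of bugfix.evals.jsonl.
interface EvalCase {
  id: string;
  task: string;
  expectedFinalState: "final" | "escalated";
}

function isExpectedState(value: unknown): value is EvalCase["expectedFinalState"] {
  return value === "final" || value === "escalated";
}

// Parses JSON Lines text: one JSON object per non-empty line.
function parseEvalCases(jsonl: string): EvalCase[] {
  const cases: EvalCase[] = [];
  const lines = jsonl.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i].trim();
    if (line.length === 0) continue; // tolerate blank lines
    const raw: unknown = JSON.parse(line); // throws on malformed JSON
    if (typeof raw !== "object" || raw === null) {
      throw new Error(`line ${i + 1}: expected a JSON object`);
    }
    const record = raw as Record<string, unknown>;
    const id = record.id;
    const task = record.task;
    const expected = record.expectedFinalState;
    if (typeof id !== "string" || typeof task !== "string") {
      throw new Error(`line ${i + 1}: id and task must be strings`);
    }
    if (!isExpectedState(expected)) {
      throw new Error(`line ${i + 1}: unknown expectedFinalState`);
    }
    cases.push({ id, task, expectedFinalState: expected });
  }
  return cases;
}
```

Keeping the cases in a versioned file means a workflow change and the tasks it is judged against can be reviewed in the same diff.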
For teams that already version prompts, this should feel like the next step after Prompt Versioning with Promptlock. Prompt diffs show what instructions changed. Workflow diffs show what the agent is allowed to do with those instructions.
The key gates:
| Gate | What it prevents |
|---|---|
| accepted task contract | vague work entering the run |
| scoped plan | broad diffs before agreement |
| declared file list | silent ownership expansion |
| verification output | fake "tests passed" summaries |
| bounded repair loop | endless retry token burn |
| final receipt | unreviewable closeouts |
This is not heavy process. It is the minimum scaffolding that keeps a capable agent from wandering.
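The declared file list gate is the easiest of these to show in code. The sketch below assumes exact path matching with no glob support, and `checkFileGate` is a hypothetical name, not part of any tool mentioned here.

```typescript
interface FileGateResult {
  ok: boolean;
  undeclared: string[]; // changed files the plan never claimed
}

// Compares the files a diff touched against the files the plan declared.
function checkFileGate(
  declared: readonly string[],
  changed: readonly string[]
): FileGateResult {
  const undeclared = changed.filter((file) => declared.indexOf(file) === -1);
  return { ok: undeclared.length === 0, undeclared };
}
```

If the plan declared only `src/parser.ts` and the diff also touches `src/config.ts`, the gate returns `ok: false` and names the extra file, so ownership expansion is visible instead of silent.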
The interesting race is not whether Aharness specifically wins. It is whether the pattern spreads.
Watch for:
That last point matters. The durable artifact should not be "a prompt that works in one chat app." It should be a workflow definition that survives model and UI churn.
The agent ecosystem is slowly relearning a very old software lesson: if a process matters, put it in code.
It means encoding the agent process in versioned software artifacts instead of relying only on natural language prompts. The workflow can define states, allowed transitions, evidence requirements, retry limits, tool policies, and final receipts.
State machines make the run inspectable and enforceable. They prevent agents from skipping required stages, require evidence before transitions, and make failures route through defined repair or escalation paths.
No. Skills package operating knowledge and reusable instructions. Workflows as code enforce the process around the skill: when it runs, what evidence it must produce, what transitions are allowed, and when the run stops.
Avoid rigid workflows for early exploration, open-ended research, and ambiguous debugging. Use them when the process is known and the cost of skipping steps is high: releases, migrations, security triage, code review receipts, eval replay, and deploy checks.
Aharness is currently framed around Codex workflows, but the broader idea is not Codex-specific. Any coding-agent stack can benefit from typed gates, controlled transitions, repair paths, and inspectable evidence.