12-Factor Agents: Production Principles for Reliable AI Agents

Last updated: June 24, 2026

The Trend Was Real. The Checklist Is The Asset.

Older posts on this site covered humanlayer/12-factor-agents as a fast-rising GitHub Trending repo. That star-count hook is now stale, but the production checklist is still useful.

As of this refresh, the GitHub API shows humanlayer/12-factor-agents at roughly 23.5k stars and 1.8k forks, with TypeScript as the primary language and the last push in September 2025. The README describes it as principles for building reliable LLM applications, inspired by the original 12-Factor App methodology.

The repo is not a framework, CLI, or package. It is a design guide by Dex Horthy and HumanLayer for building LLM-powered software that is good enough for production customers.

That framing is why it still matters. Agent teams do not need one more magic loop. They need a language for review: prompts, context, tools, state, lifecycle, humans, control flow, errors, scope, triggers, and reducers.

This guide belongs alongside the posts on what an AI coding agent is, context engineering, why long-running agents need harnesses, debugging AI agent workflows, and permissions, logs, and rollback for AI coding agents. The model is not the product. The system around the model is.

What The 12 Factors Actually Say

The current README gives a short version of the twelve factors:

Natural Language to Tool Calls
Own your prompts
Own your context window
Tools are just structured outputs
Unify execution state and business state
Launch/Pause/Resume with simple APIs
Contact humans with tool calls
Own your control flow
Compact Errors into Context Window
Small, Focused Agents
Trigger from anywhere, meet users where they are
Make your agent a stateless reducer

The useful theme is not the number twelve. It is ownership.

Own the prompt. Own the context. Own the control flow. Own the lifecycle. Own the state boundary. Own how humans enter the loop.

That is the opposite of many agent demos, where a framework hides the prompt, grows the context window, stores hidden state, and calls the result "autonomy."

The Best Way To Use It

Use 12-Factor Agents as a review checklist.

Before shipping an agent feature, ask:

Where is the prompt versioned?
What exactly enters the context window?
Which tool calls can mutate external state?
Where does execution state live?
Can the run pause and resume?
How does a human approve, reject, or correct a step?
Which control flow is deterministic code and which part is model choice?
How are errors compacted back into context?
Is the agent narrow enough to debug?
Which surfaces can trigger it?
Can we replay a run, or reduce it to a starting state plus a sequence of events?

Those questions pair naturally with tool-use production patterns, agent workflows as state machines, agent replays, and agent security before connecting tools. The checklist is not about rejecting tools. It is about refusing hidden behavior where review needs explicit behavior.

The Factors That Matter Most In Practice

All twelve are useful, but four carry most of the production load.

Own your context window. Context is now the system boundary. If you let every log line, memory, tool result, and retrieved document into the prompt, you get a slower and less reliable agent. Context should be curated, source-linked, compacted, and reviewable.
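
As a minimal sketch of what curating context can look like, the function below keeps the highest-priority items that fit a budget and reports which sources were included. The `ContextItem` record, the priority field, and the character-based budget are all assumptions made for illustration; none of them come from the guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextItem:
    """Hypothetical record for one candidate piece of context."""
    source: str    # where the text came from, kept so the context is reviewable
    text: str
    priority: int  # higher means more important to include


def build_context(items: list[ContextItem], max_chars: int) -> tuple[str, list[str]]:
    """Select the highest-priority items that fit within a character budget.

    Returns the assembled context and the list of sources that were included,
    so a reviewer can see exactly what entered the window.
    """
    selected: list[ContextItem] = []
    used = 0
    # Consider the most important items first.
    for item in sorted(items, key=lambda i: i.priority, reverse=True):
        block = f"[{item.source}]\n{item.text}\n"
        if used + len(block) > max_chars:
            continue  # skip anything that would exceed the budget
        selected.append(item)
        used += len(block)
    context = "".join(f"[{i.source}]\n{i.text}\n" for i in selected)
    return context, [i.source for i in selected]
```

A real system would likely budget in tokens rather than characters and summarize oversized items instead of dropping them, but the review property is the same: the selection rule is code you can read.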

Own your control flow. The model can choose the next action, but the application should own the loop, stop conditions, retries, approval gates, and budget ceilings. This is the difference between a product and a runaway process.
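
One way to picture an application-owned loop is the sketch below. It assumes the model is any callable that maps the history so far to a next action, and that actions are plain dicts with a `type` key; both shapes are hypothetical and chosen only to keep the example small.

```python
from typing import Callable

# Assumed shapes for illustration: an action is a dict with a "type" key,
# and a model is any callable from history to the next action.
Action = dict
Model = Callable[[list], Action]


def run_agent(model: Model, tools: dict, max_steps: int = 5) -> dict:
    """The model picks actions; the application owns the loop and its limits."""
    history: list = []
    for step in range(max_steps):
        action = model(history)
        if action["type"] == "done":
            # Explicit stop condition, decided by code the team can read.
            return {"status": "done", "steps": step, "result": action.get("result")}
        tool = tools.get(action["type"])
        if tool is None:
            # Unknown tool: record the problem instead of crashing or guessing.
            history.append({"type": "error", "detail": f"unknown tool {action['type']}"})
            continue
        output = tool(**action.get("args", {}))
        history.append({"type": action["type"], "output": output})
    # Budget ceiling reached: stop deterministically rather than loop forever.
    return {"status": "step_budget_exceeded", "steps": max_steps, "result": None}
```

Retries, approval gates, and cost ceilings would slot into the same loop as further deterministic checks.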

Contact humans with tool calls. Human approval should not be a side channel. If a human decision changes the run, model it as structured input the agent can observe and continue from.
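
A rough sketch of that idea, assuming a hypothetical `Run` record and a `request_human_approval` action type (neither name comes from the guide): the request pauses the run, and the human's decision is appended as a structured event the agent can read on its next step.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """Hypothetical run record; field names are illustrative."""
    events: list = field(default_factory=list)
    status: str = "running"


def handle_action(run: Run, action: dict) -> Run:
    """Record a model action and pause the run when it asks for a human."""
    run.events.append(action)
    if action["type"] == "request_human_approval":
        run.status = "waiting_for_human"
    return run


def record_human_response(run: Run, approved: bool, comment: str = "") -> Run:
    """Append the human decision as a structured event and resume the run."""
    if run.status != "waiting_for_human":
        raise ValueError("run is not waiting for a human")
    run.events.append({"type": "human_response", "approved": approved, "comment": comment})
    run.status = "running"
    return run
```

Because the decision lives in the event list rather than in a UI side channel, a rejection with a comment is visible to the model and to anyone replaying the run.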

Make your agent a stateless reducer. This is the hardest factor to apply literally, but the review value is high. If a run cannot be represented as a current state plus an event producing the next state, debugging becomes archaeology.
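
The reducer framing can be sketched in a few lines. The `AgentState` fields and the event types below are assumptions for illustration, not a schema from the guide; the point is only that each transition is a pure function and a whole run is a fold over its events.

```python
from dataclasses import dataclass, replace
from functools import reduce
from typing import Optional


@dataclass(frozen=True)
class AgentState:
    """Hypothetical run state; immutable so every transition yields a new value."""
    status: str = "idle"
    steps: int = 0
    last_error: Optional[str] = None


def step(state: AgentState, event: dict) -> AgentState:
    """Pure transition: (state, event) -> next state, with no hidden side effects."""
    kind = event["type"]
    if kind == "started":
        return replace(state, status="running")
    if kind == "tool_result":
        return replace(state, steps=state.steps + 1, last_error=None)
    if kind == "tool_error":
        return replace(state, steps=state.steps + 1, last_error=event["message"])
    if kind == "finished":
        return replace(state, status="done")
    return state  # unknown events leave the state unchanged


def replay(events: list) -> AgentState:
    """Rebuild the state of a run by folding its events from the initial state."""
    return reduce(step, events, AgentState())
```

Replaying a prefix of the event log gives the state at that point in the run, which is what makes a failed run debuggable after the fact.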

The Opposing View

The guide can read more anti-framework than some teams need.

That is the main caveat. Frameworks are not automatically the enemy. LangGraph, Mastra, OpenAI Agents SDK, Claude Agent SDK, and many internal harnesses exist because production agent systems need state, tracing, tools, retries, and deployment surfaces. Rebuilding all of that from scratch can be its own failure mode.

The better reading is narrower:

do not outsource prompts you cannot inspect
do not hide control flow you need to debug
do not store state where the product cannot reason about it
do not treat human approval as a UI-only concern
do not let a framework choose what belongs in context without review

That reading makes the guide compatible with frameworks. It turns the guide into an evaluation rubric.

If a framework helps you satisfy the factors, use it. If it prevents you from answering the review questions, avoid that part of the abstraction. If you are choosing the stack, start with the comparisons of managed agents versus LangGraph versus DIY, or Vercel AI SDK 6 versus LangGraph for TypeScript agents.

The Production Pattern

The strongest pattern is boring:

deterministic product code receives an event
deterministic code prepares scoped context
the model returns structured output
deterministic code validates the output
tools execute with permissions and logs
human approval enters as structured tool output when needed
errors are compacted into context
state is persisted in a product-owned store
the final receipt explains what happened
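
Two of the steps above, validating structured output and compacting errors into context, can be sketched together. The tool registry, the JSON shape (`tool` plus `args`), and the 200-character limit are hypothetical choices for this example.

```python
import json

# Hypothetical tool registry: tool name -> required argument names.
ALLOWED_TOOLS = {"create_ticket": {"title", "priority"}, "done": set()}


def validate_output(raw: str) -> dict:
    """Parse and validate a model's structured output before anything executes."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON: {exc.msg}") from exc
    tool = data.get("tool") if isinstance(data, dict) else None
    if not isinstance(tool, str) or tool not in ALLOWED_TOOLS:
        raise ValueError("unknown or missing tool")
    args = data.get("args", {})
    if not isinstance(args, dict):
        raise ValueError("args must be an object")
    missing = ALLOWED_TOOLS[tool] - set(args)
    if missing:
        raise ValueError(f"missing args: {sorted(missing)}")
    return {"tool": tool, "args": args}


def compact_error(exc: Exception, limit: int = 200) -> dict:
    """Turn an exception into a short event that is cheap to put back in context."""
    return {"type": "error", "message": str(exc)[:limit]}


def handle_model_output(raw: str, events: list) -> list:
    """Append either the accepted tool call or a compact error to the event log."""
    try:
        call = validate_output(raw)
    except ValueError as exc:
        return events + [compact_error(exc)]
    return events + [{"type": "tool_call", **call}]
```

The model sees a one-line error it can correct on the next turn, not a full stack trace, and nothing reaches a tool until deterministic code has accepted it.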

That pattern is visible across the best agent content right now: Anthropic's Building Effective Agents, effective context engineering for AI agents, and the operational lessons in agent PR governance.

The lesson is not "never use agents." It is "agents should be software you can inspect."

How This Fits Claude Code And Codex

Claude Code and Codex make the factors concrete because they run inside real repos with real files, tools, permissions, and verification commands.

AGENTS.md, CLAUDE.md, skills, hooks, MCP tools, worktrees, and final receipts are the local version of the same production concerns:

prompts become repo instructions and skills
context becomes selected files, memories, graph results, and traces
tools become shell commands, MCP servers, and permission prompts
state becomes files, branches, tasks, logs, and PRs
human contact becomes approval flows and review comments
control flow becomes the harness around the agent

That is why the guide pairs well with what Claude Code is, OpenAI Codex, agent swarms needing receipts, and the MCP server guide. The principles stop being abstract once an agent can edit your repo.

The Take

12-Factor Agents is worth reading because it pushes the agent conversation back toward software engineering.

The guide is not perfect. It is not a full architecture. It will not tell you which framework to pick. It will not prove your agent is reliable.

But it gives teams a useful shared vocabulary for the review that matters:

Who owns the prompt?
Who owns the context?
Who owns the loop?
Who owns the state?
Who owns the human handoff?
Who owns the final proof?

If you can answer those questions, you are much closer to a production agent than the team with the flashier demo.

FAQ

What is 12-Factor Agents?

12-Factor Agents is an open-source design guide from HumanLayer that adapts the spirit of the original 12-Factor App methodology to LLM-powered software and agent systems.

Is 12-Factor Agents a framework?

No. It is a principles guide, not a runtime, SDK, or package. Use it as a design and review checklist for whichever framework or custom harness you use.

What is the most important factor?

For production teams, the highest-leverage factors are owning your context window, owning your control flow, contacting humans with tool calls, and making the agent easy to reduce into state transitions.

Does the guide mean I should avoid LangGraph or agent frameworks?

No. The better lesson is to avoid abstractions that hide prompts, control flow, state, or human approval paths you need to inspect. A framework that keeps those surfaces reviewable can still be a good choice.

How should I apply it to Claude Code or Codex?

Map the factors onto repo instructions, skills, hooks, MCP tools, worktrees, tests, logs, and final receipts. Use the guide to ask whether the agent run is inspectable and recoverable.

What is the main risk?

The main risk is treating the factors as slogans. They only help if they turn into concrete review questions, source-linked context, deterministic checks, and clear ownership of state and control flow.
