
TL;DR
HumanLayer's 12-Factor Agents guide turns agent reliability into an engineering checklist: own prompts, context, tools, control flow, state, human approval, and observability before a demo becomes production.
Last updated: June 24, 2026
The older posts on this site covered humanlayer/12-factor-agents as a fast-rising GitHub Trending repo. The dated star-count hook is stale. The production checklist is still useful.
As of this refresh, the GitHub API shows humanlayer/12-factor-agents at roughly 23.5k stars, 1.8k forks, TypeScript-first, and last pushed in September 2025. The README describes it as principles for building reliable LLM applications, inspired by the original 12-Factor App methodology.
The repo is not a framework, CLI, or package. It is a design guide by Dex Horthy and HumanLayer for building LLM-powered software that is good enough for production customers.
That framing is why it still matters. Agent teams do not need one more magic loop. They need a language for review: prompts, context, tools, state, lifecycle, humans, control flow, errors, scope, triggers, and reducers.
This belongs next to the posts on what an AI coding agent is, context engineering, long-running agents needing harnesses, debugging AI agent workflows, and permissions, logs, and rollback for AI coding agents. The model is not the product. The system around the model is.
The current README lists the twelve factors as a short version:

The useful theme is not the number twelve. It is ownership.
Own the prompt. Own the context. Own the control flow. Own the lifecycle. Own the state boundary. Own how humans enter the loop.
That is the opposite of many agent demos, where a framework hides the prompt, grows the context window, stores hidden state, and calls the result "autonomy."
Use 12-Factor Agents as a review checklist.
Before shipping an agent feature, ask:
Those questions pair naturally with tool-use production patterns, agent workflows as state machines, agent replays, and agent security before connecting tools. The checklist is not about rejecting tools. It is about refusing hidden behavior where review needs explicit behavior.
All twelve are useful, but four carry most of the production load.
Own your context window. Context is now the system boundary. If you let every log line, memory, tool result, and retrieved document into the prompt, you get a slower and less reliable agent. Context should be curated, source-linked, compacted, and reviewable.
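A minimal sketch of what "curated and source-linked" can look like in code. It is not taken from the guide: `ContextItem`, `render`, `build_context`, the priority field, and the character budget are all hypothetical names and choices made for illustration. A real system would more likely budget in tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextItem:
    """Hypothetical unit of context; not a type from the guide."""
    source: str    # where the text came from, so a reviewer can trace it
    text: str
    priority: int  # higher means more important to keep


def render(item: ContextItem) -> str:
    # Every line in the prompt carries its source label.
    return f"[{item.source}] {item.text}"


def build_context(items: list[ContextItem], max_chars: int) -> str:
    """Keep the highest-priority items that fit inside max_chars."""
    kept: list[str] = []
    used = 0
    for item in sorted(items, key=lambda i: i.priority, reverse=True):
        line = render(item)
        cost = len(line) + 1  # +1 accounts for the joining newline
        if used + cost > max_chars:
            continue  # skip items that would blow the budget
        kept.append(line)
        used += cost
    return "\n".join(kept)
```

The point of the sketch is that the application, not the framework, decides what enters the prompt, and every line that does enter can be traced back to a source.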
Own your control flow. The model can choose the next action, but the application should own the loop, stop conditions, retries, approval gates, and budget ceilings. This is the difference between a product and a runaway process.
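One way to read that split, as a hedged sketch: the model picks the action, the application owns the loop and the ceiling. `run_agent`, `choose_next_action`, `execute`, and the `"done"` sentinel are assumptions for illustration, not an API from the guide or any framework.

```python
from typing import Callable


def run_agent(
    choose_next_action: Callable[[list[str]], str],  # stands in for the model call
    execute: Callable[[str], str],                   # stands in for tool execution
    max_steps: int = 10,                             # budget ceiling owned by the app
) -> tuple[str, list[str]]:
    """Run until the model says "done" or the step budget runs out."""
    history: list[str] = []
    for _ in range(max_steps):
        action = choose_next_action(history)
        if action == "done":
            return "completed", history
        history.append(f"{action} -> {execute(action)}")
    # The loop, not the model, decides when to stop a run that never finishes.
    return "step_budget_exhausted", history
```

Retries, approval gates, and cost limits would slot into the same loop. The design choice is that none of them depend on the model agreeing to stop.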
Contact humans with tool calls. Human approval should not be a side channel. If a human decision changes the run, model it as structured input the agent can observe and continue from.
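A small sketch of approval as structured input rather than a side channel. The tool name `request_human_approval`, the `Run` record, and the status strings are hypothetical; the guide does not prescribe this shape.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """Hypothetical run record: an event log plus a status."""
    events: list[dict] = field(default_factory=list)
    status: str = "running"


def handle_tool_call(run: Run, call: dict) -> Run:
    # Every tool call is logged, including the one that asks for a human.
    run.events.append({"type": "tool_call", **call})
    if call["name"] == "request_human_approval":
        run.status = "awaiting_human"  # pause instead of guessing
    return run


def record_human_response(run: Run, approved: bool, comment: str) -> Run:
    if run.status != "awaiting_human":
        raise ValueError("no approval is pending for this run")
    # The decision lands in the same log the agent reads on its next step.
    run.events.append(
        {"type": "human_response", "approved": approved, "comment": comment}
    )
    run.status = "running" if approved else "rejected"
    return run
```

Because the human response is just another event, a reviewer can see exactly what was asked, what was answered, and what the agent did next.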
Make your agent a stateless reducer. This is the hardest factor to apply literally, but the review value is high. If a run cannot be represented as state plus event plus next state, debugging becomes archaeology.
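"State plus event plus next state" can be sketched as a pure function over an event log. The event types, the `AgentState` fields, and the function names below are illustrative assumptions, not the guide's schema.

```python
from dataclasses import dataclass, replace
from functools import reduce


@dataclass(frozen=True)
class AgentState:
    """Hypothetical immutable run state."""
    status: str = "running"
    steps: tuple[str, ...] = ()


def step(state: AgentState, event: dict) -> AgentState:
    """Pure transition: same state and event always give the same next state."""
    kind = event["type"]
    if kind == "tool_result":
        return replace(state, steps=state.steps + (event["summary"],))
    if kind == "approval_requested":
        return replace(state, status="awaiting_human")
    if kind == "approval_granted":
        return replace(state, status="running")
    if kind == "finished":
        return replace(state, status="done")
    return state  # unknown events leave the state unchanged


def replay(events: list[dict]) -> AgentState:
    # Rebuilding state from the log is what makes a run debuggable.
    return reduce(step, events, AgentState())
```

If replaying the same log twice gives two different states, something outside the log is influencing the run, and that is the hidden state the factor is warning about.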
The guide can read more anti-framework than some teams need.
That is the main caveat. Frameworks are not automatically the enemy. LangGraph, Mastra, OpenAI Agents SDK, Claude Agent SDK, and many internal harnesses exist because production agent systems need state, tracing, tools, retries, and deployment surfaces. Rebuilding all of that from scratch can be its own failure mode.
The better reading is narrower:
That reading makes the guide compatible with frameworks. It turns the guide into a rubric for evaluating them.
If a framework helps you satisfy the factors, use it. If it prevents you from answering the review questions, avoid that part of the abstraction. If you are choosing the stack, start with the comparisons of managed agents versus LangGraph versus DIY, or Vercel AI SDK 6 versus LangGraph for TypeScript agents.
The strongest pattern is boring:
That pattern is visible across the best agent content right now: Anthropic's Building Effective Agents, effective context engineering for AI agents, and the operational lessons in agent PR governance.
The lesson is not "never use agents." It is "agents should be software you can inspect."
Claude Code and Codex make the factors concrete because they run inside real repos with real files, tools, permissions, and verification commands.

AGENTS.md, CLAUDE.md, skills, hooks, MCP tools, worktrees, and final receipts are the local version of the same production concerns:
That is why the guide pairs well with what Claude Code is, OpenAI Codex, agent swarms needing receipts, and the MCP server guide. The principles stop being abstract once an agent can edit your repo.
12-Factor Agents is worth reading because it pushes the agent conversation back toward software engineering.
The guide is not perfect. It is not a full architecture. It will not tell you which framework to pick. It will not prove your agent is reliable.
But it gives teams a useful shared vocabulary for the review that matters:
If you can answer those questions, you are much closer to a production agent than the team with the flashier demo.
12-Factor Agents is an open-source design guide from HumanLayer that adapts the spirit of the original 12-Factor App methodology to LLM-powered software and agent systems.
No. It is a principles guide, not a runtime, SDK, or package. Use it as a design and review checklist for whichever framework or custom harness you use.
For production teams, the highest-leverage factors are owning your context window, owning your control flow, contacting humans with tool calls, and making the agent easy to reduce into state transitions.
No. The better lesson is to avoid abstractions that hide prompts, control flow, state, or human approval paths you need to inspect. A framework that keeps those surfaces reviewable can still be a good choice.
Map the factors onto repo instructions, skills, hooks, MCP tools, worktrees, tests, logs, and final receipts. Use the guide to ask whether the agent run is inspectable and recoverable.
The main risk is treating the factors as slogans. They only help if they turn into concrete review questions, source-linked context, deterministic checks, and clear ownership of state and control flow.