TL;DR
humanlayer/12-factor-agents crossed 20k stars with a simple argument: most AI agents fail in production because they ignore decades of software engineering wisdom. Here are the twelve principles that address that.
humanlayer/12-factor-agents is one of the fastest-rising repositories on GitHub right now, accumulating hundreds of stars per day after months of sustained momentum. It is not a framework, a CLI, or a library. It is a guide: twelve principles adapted from the original 12-Factor App methodology and applied specifically to LLM-powered software.
That a principles document is trending among builders tells you something about where agent development sits in 2026. After two years of frameworks, SDKs, and platform promises, a meaningful number of engineers have hit the same wall: agents work in demos and break in production, and the failure mode is almost never the model. It is the software around the model.
The project was created by Dex Horthy at HumanLayer, a startup building human-in-the-loop tooling for AI workflows. The guiding question: "What are the principles we can use to build LLM-powered software that is actually good enough to put in the hands of production customers?"
The guide opens with an uncomfortable observation: most products marketed as "AI Agents" are not very agentic. What they are, in practice, is software with LLM calls inserted at decision points. The author's argument is that this is fine - and that pretending otherwise is why so many agent projects fail to ship.
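The twelve factors give names and structure to patterns that experienced builders have converged on independently. Each factor addresses a common production failure mode:
1. Natural language to tool calls
2. Own your prompts
3. Own your context window
4. Tools are just structured outputs
5. Unify execution state and business state
6. Launch/pause/resume with simple APIs
7. Contact humans with tool calls
8. Own your control flow
9. Compact errors into the context window
10. Small, focused agents
11. Trigger from anywhere, meet users where they are
12. Make your agent a stateless reducer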
Together these twelve factors describe a production-quality agent architecture without requiring any specific framework or library.
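Factors 1 and 4 (natural language to tool calls; tools are just structured outputs) are what make the framework-free claim practical: a "tool call" is just JSON the model emits and ordinary code validates. The sketch below assumes the model was prompted to answer with a single JSON object carrying an `intent` field; the three intents and the `parseToolCall` helper are hypothetical, not part of the guide.

```typescript
// Hypothetical tool-call shapes; the guide does not prescribe these names.
type ParsedToolCall =
  | { intent: "create_issue"; title: string; body: string }
  | { intent: "request_human_input"; question: string }
  | { intent: "done"; final_answer: string };

// Turn raw model output (assumed to be a JSON string) into a typed tool call,
// rejecting anything that does not match a known shape.
function parseToolCall(raw: string): ParsedToolCall {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("model output is not a JSON object");
  }
  const { intent, title, body, question, final_answer } = data as Record<string, unknown>;
  if (intent === "create_issue" && typeof title === "string" && typeof body === "string") {
    return { intent: "create_issue", title, body };
  }
  if (intent === "request_human_input" && typeof question === "string") {
    return { intent: "request_human_input", question };
  }
  if (intent === "done" && typeof final_answer === "string") {
    return { intent: "done", final_answer };
  }
  throw new Error(`unrecognized or malformed tool call: ${raw}`);
}
```

Validation failures surface as ordinary exceptions, which the rest of the loop can handle like any other error.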
This is not a package you install. It is a design checklist you apply when building. The repository includes code examples in TypeScript with Python equivalents available for most factors.
The core agent loop the guide builds toward looks like this:
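```typescript
type ContextItem = Record<string, unknown>;
type Step = ContextItem & { intent: string; final_answer?: string };
// `llm` and `execute_step` are injected so the loop stays framework-neutral.
async function runAgent(
  message: string,
  llm: { determine_next_step(context: ContextItem[]): Promise<Step> },
  execute_step: (step: Step) => Promise<ContextItem>,
): Promise<string | undefined> {
  const context: ContextItem[] = [{ message }];
  while (true) {
    // The model picks the next step...
    const next_step = await llm.determine_next_step(context);
    context.push(next_step);
    if (next_step.intent === "done") {
      return next_step.final_answer;
    }
    // ...and ordinary code executes it, appending the result to the context.
    const result = await execute_step(next_step);
    context.push(result);
  }
}
```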
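This is the stateless reducer pattern from factor 12. All of the agent's state lives in the context list, and each iteration folds one more decision and its result into it, so nothing is hidden inside the loop and the context can be saved and resumed at any point. The LLM decides what happens next; the software executes it deterministically.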
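Because all of the loop's state is the `context` array, the same idea can be written as a pure function from (state, event) to new state. This is a minimal sketch, assuming a hypothetical `AgentEvent` shape; the LLM call and tool execution happen outside the reducer, which only records what they produced. Pausing then amounts to serializing the thread, and resuming to parsing it back.

```typescript
// Hypothetical event shape; any JSON-serializable record works.
type AgentEvent = { type: string; data: unknown };
type Thread = { readonly events: readonly AgentEvent[] };

// Pure reducer: returns a new thread and never mutates the old one.
function reduce(thread: Thread, event: AgentEvent): Thread {
  return { events: [...thread.events, event] };
}

// The whole state is plain data, so pausing is just serialization...
function serialize(thread: Thread): string {
  return JSON.stringify(thread);
}

// ...and resuming is deserialization followed by more reductions.
function deserialize(json: string): Thread {
  return JSON.parse(json) as Thread;
}

// Usage: record two events, save, then resume later from the saved string.
let t: Thread = { events: [] };
t = reduce(t, { type: "user_message", data: "deploy the backend" });
t = reduce(t, { type: "tool_call", data: { intent: "deploy", env: "staging" } });
const saved = serialize(t); // could be stored in a database row or a queue message
const resumed = reduce(deserialize(saved), { type: "tool_result", data: "ok" });
```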
To work through the guide, start at the repository and read the factors in order. Each one links to a deeper explanation. For teams already running agents in production, the most immediately useful factors tend to be 2, 3, and 8 - owning prompts, context, and control flow. These are the places where framework magic most often becomes a liability at scale.
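Owning control flow (factor 8) mostly means that your code, not a framework, decides what each intent does to the loop. This sketch uses hypothetical intents: read-only calls keep looping, a production deploy breaks out so approval can happen elsewhere, and `done` exits.

```typescript
// Hypothetical intents the model might emit.
type NextStep =
  | { intent: "fetch_logs"; service: string }
  | { intent: "deploy"; env: "staging" | "production" }
  | { intent: "done"; final_answer: string };

type Decision =
  | { action: "continue"; run: NextStep } // execute now, then loop again
  | { action: "pause"; reason: string } // persist state and wait for a human
  | { action: "finish"; answer: string }; // exit the loop

function decide(step: NextStep): Decision {
  switch (step.intent) {
    case "fetch_logs":
      // Read-only and cheap: run it and keep looping.
      return { action: "continue", run: step };
    case "deploy":
      // High-stakes in production: break out of the loop for approval.
      return step.env === "production"
        ? { action: "pause", reason: "production deploy needs human approval" }
        : { action: "continue", run: step };
    case "done":
      return { action: "finish", answer: step.final_answer };
  }
}
```

Because the decision is a plain function, it can be unit-tested without calling a model.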
The HumanLayer team also maintains a companion repository at got-agents/agents with open-source agent implementations built directly on these principles.
Developers shipping their first agent to production. The guide gives you a vocabulary for decisions you will face and saves you from rediscovering each failure mode from scratch. Factor 5, unifying execution state and business state, can on its own spare a team weeks of debugging.
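One way to read factor 5: rather than keeping a separate `status` field next to the conversation history, derive execution state from the event log itself, so the two cannot drift apart. This is a sketch with hypothetical event types for an approval workflow.

```typescript
// Hypothetical event types; the event log is the single source of truth.
type ThreadEvent =
  | { type: "user_message"; text: string }
  | { type: "tool_call"; intent: string }
  | { type: "human_request"; question: string }
  | { type: "human_response"; answer: string };

type AgentStatus = "not_started" | "awaiting_human" | "in_progress";

// Execution state is computed from the same events that record what happened,
// so there is no second store to keep in sync.
function deriveStatus(events: readonly ThreadEvent[]): AgentStatus {
  const last: ThreadEvent | undefined = events[events.length - 1];
  if (last === undefined) return "not_started";
  if (last.type === "human_request") return "awaiting_human";
  return "in_progress";
}

// The open question, if any, is likewise read straight off the log.
function pendingQuestion(events: readonly ThreadEvent[]): string | undefined {
  const last: ThreadEvent | undefined = events[events.length - 1];
  if (last !== undefined && last.type === "human_request") return last.question;
  return undefined;
}
```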
Teams whose agents work in demos but break under real load. This is the most common entry point for this guide. The principles do not fix model quality problems, but they address the structural issues that cause agents to fail unpredictably: inconsistent state, opaque control flow, and context windows that grow without discipline.
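Factor 9 of the guide, compacting errors into the context window, addresses part of this: an error goes back to the model as a short entry, and repeated failures hand off to a human instead of looping forever. This is a sketch under stated assumptions: the threshold of three consecutive errors, the 200-character cap, and clearing earlier errors after a success are all choices of this example, not rules from the guide.

```typescript
type ContextEntry = { kind: "tool_result" | "error"; text: string };
type ErrorOutcome = { context: ContextEntry[]; consecutiveErrors: number; escalate: boolean };

// Assumed limits for this sketch; tune them for your workload.
const MAX_CONSECUTIVE_ERRORS = 3;
const MAX_ERROR_CHARS = 200;

// Append a one-line error summary (not a full stack trace) and report
// whether the agent has failed often enough to escalate to a human.
function recordError(context: readonly ContextEntry[], err: Error, consecutiveErrors: number): ErrorOutcome {
  const summary = `${err.name}: ${err.message}`.slice(0, MAX_ERROR_CHARS);
  const count = consecutiveErrors + 1;
  return {
    context: [...context, { kind: "error", text: summary }],
    consecutiveErrors: count,
    escalate: count >= MAX_CONSECUTIVE_ERRORS,
  };
}

// After a success, drop earlier error entries so stale failures stop
// occupying the window; the caller resets its error counter to zero.
function recordSuccess(context: readonly ContextEntry[], text: string): ContextEntry[] {
  return [...context.filter((e) => e.kind !== "error"), { kind: "tool_result", text }];
}
```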
Engineers evaluating agent frameworks. If you are choosing between LangGraph, AutoGen, CrewAI, or building your own harness, the twelve factors give you a framework-neutral checklist. Does this framework let you own your prompts? Does it expose clean lifecycle management? These are answerable questions with the guide in hand.
Technical leads reviewing AI features before they ship. The factors work well as a code review checklist. Before merging an agent feature, you can walk through each factor and ask whether the implementation satisfies it or whether it defers the risk somewhere downstream.
The guide is less useful for pure research prototypes, one-off automation scripts, or systems where the model output is the final product (image generation, document translation) rather than a decision step in a larger workflow.
Several of these factors map directly onto patterns the DevDigest tools surface and automate.
Factor 7 - "contact humans with tool calls" - is the principle behind everything covered at hooks.developersdigest.tech. When an agent needs a human decision, the cleanest implementation routes that through the same tool-calling mechanism as API calls, rather than building a separate approval flow. Claude Code hooks follow a closely related pattern: a hook intercepts a tool call, adds a human decision point, and the agent resumes with the result. No special approval path required.
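Outside of hooks, the same factor can be sketched in application code: the human-facing tool sits in the same union as the API-facing one and goes through the same dispatcher. The tool names, the `orders` lookup table, and the `waiting_on_human` outcome are hypothetical.

```typescript
// Hypothetical tools. The human-facing one is declared exactly like the
// API-facing one, so the model chooses between them the same way.
type ToolCall =
  | { intent: "lookup_order"; orderId: string }
  | { intent: "request_human_approval"; question: string; urgency: "low" | "high" };

type ToolOutcome =
  | { status: "completed"; result: string }
  | { status: "waiting_on_human"; question: string };

// One dispatcher for every tool call; contacting a human is just a tool
// whose result arrives later rather than a separate approval flow.
function dispatch(call: ToolCall, orders: Record<string, string>): ToolOutcome {
  switch (call.intent) {
    case "lookup_order": {
      const known = Object.prototype.hasOwnProperty.call(orders, call.orderId);
      return { status: "completed", result: known ? orders[call.orderId] : "not found" };
    }
    case "request_human_approval":
      // A real system would notify someone here (possibly based on `urgency`),
      // persist the thread, and exit the loop until the reply comes back.
      return { status: "waiting_on_human", question: call.question };
  }
}
```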
Factors 2 and 3 - owning prompts and context windows - connect to the CLAUDE.md and skills architecture powering Claude Code sessions. A well-constructed CLAUDE.md is context engineering in practice: you are explicitly managing what goes into the model's context window rather than letting a framework decide. Every session-level instruction, data source, and constraint you document is factor 3 applied.
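In application code, owning the context window usually means writing the serialization yourself instead of accepting a framework's default message format. This minimal sketch assumes a hypothetical flat event shape and renders each event as a tagged block; a compact format like this can use fewer tokens than verbose JSON, though that depends on your data.

```typescript
// Hypothetical event shape: a type tag plus flat string fields.
type ContextEvent = { type: string; data: Record<string, string> };

// Render the thread into one string you control, one tagged block per event.
function renderContext(events: readonly ContextEvent[]): string {
  return events
    .map((e) => {
      const fields = Object.keys(e.data)
        .map((key) => `${key}: ${e.data[key]}`)
        .join("\n");
      return `<${e.type}>\n${fields}\n</${e.type}>`;
    })
    .join("\n\n");
}
```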
Factor 10 reflects the design behind the skills registry at skills.developersdigest.tech. A Claude Code skill is a focused agent by another name - one responsibility, one activation pattern, debuggable in isolation. Composing a multi-step workflow from small focused skills maps directly to what factor 10 recommends.
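The same composition can be sketched outside Claude Code. Here each hypothetical `FocusedAgent` has one job and a short tool allowlist, and a workflow is a pipeline of them; the synchronous `run` functions are stubs standing in for real LLM-backed agents.

```typescript
// Hypothetical shape for a focused agent: one job, a short tool allowlist.
interface FocusedAgent {
  name: string;
  allowedTools: readonly string[];
  run(input: string): string;
}

// Compose a workflow as a pipeline: each agent's output feeds the next,
// and each agent can be tested on its own.
function runPipeline(agents: readonly FocusedAgent[], input: string): string {
  return agents.reduce((acc, agent) => agent.run(acc), input);
}

// Check a proposed tool call against the agent's allowlist before running it.
function canUse(agent: FocusedAgent, tool: string): boolean {
  return agent.allowedTools.indexOf(tool) !== -1;
}
```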
Factor 11 - triggering from anywhere - is also what drives the subagent routing patterns at subagent.developersdigest.tech, where the same agent logic surfaces across CLI, API, and web interfaces without a separate implementation for each surface.
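Structurally, triggering from anywhere comes down to thin adapters that turn each surface's input into the same initial event before calling one shared entry point. The adapters, payload fields, and `startAgent` stub below are hypothetical; a real `startAgent` would seed the agent loop with the event instead of returning a string.

```typescript
// Every surface produces the same initial event, so the agent loop never
// needs to know where a request came from.
type InitialEvent = { source: "cli" | "http"; message: string; replyTo: string };

// Hypothetical CLI adapter: joins argv words into one message.
function fromCli(argv: readonly string[]): InitialEvent {
  return { source: "cli", message: argv.join(" "), replyTo: "stdout" };
}

// Hypothetical webhook adapter: payload field names are illustrative.
function fromWebhook(body: { text: string; callback_url: string }): InitialEvent {
  return { source: "http", message: body.text, replyTo: body.callback_url };
}

// Shared entry point (stubbed): describes what would be started and where
// the reply would go.
function startAgent(initial: InitialEvent): string {
  return `[${initial.source}] ${initial.message} -> reply via ${initial.replyTo}`;
}
```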
The 12-Factor Agents guide is genuinely useful and the principles are sound. The stateless reducer pattern and the guidance on owning control flow reflect hard-won production experience, not framework marketing. The connection to the original 12-Factor App gives builders from a web development background an immediately intuitive mental model.
The real limitations are worth naming. The guide is primarily written for developers building custom agents from scratch in TypeScript. Teams using higher-level frameworks like LangGraph or AutoGen will find that some factors - especially owning prompts and control flow - require significant customization or workarounds that the guide does not address. The path from "I understand factor 8" to "I've refactored my LangGraph workflow to satisfy it" is left as an exercise.
The guide also reflects a specific view: that agents should be mostly software with LLM calls inserted strategically. This is a defensible position for production B2B products. It undersells the cases where more autonomous loops are genuinely appropriate, particularly in code generation at scale or tasks where the search space is too large for explicit control flow.
Treat it as a production checklist rather than a complete architecture specification and it earns its star count.