
TL;DR
A long-form technical read on Flue from Fred K Schott, with deeper comparisons against OpenAI Agents, Vercel AI SDK, Google ADK, LangChain, Deep Agents, and CrewAI, plus practical production patterns.
Fred K Schott posted Flue on May 1, 2026 as a response to a familiar pain point: many teams are building powerful agent prompts, but they are still hand-stitching runtime behavior. If you are running agent workflows in real repos, this is a useful signal. Flue is not trying to be another generic API wrapper. It is trying to be a harness-first framework for running agents.
The idea is simple. You do not want every project to reinvent task orchestration, runtime control, session shape, and deployment glue. You want a framework to define those pieces once and let your team focus on behavior. That is exactly the pattern that made web frameworks like Next.js useful in the first place. You do not build your own server runtime every time; you build routes and logic.
This is a practical builder-level comparison focused on runtime architecture, deployment tradeoffs, and migration implications.
If you know him from Astro, this should feel familiar. Fred is a long-time open source builder with deep TypeScript and developer tooling experience, with a history tied to fast project bootstrap, compile-time developer experience, and community-first frameworks. He co-founded and helped scale the Astro ecosystem, and his move into Flue makes sense when you see the through line: reduce repetitive developer setup, standardize reusable patterns, and keep runtime behavior close to code.
If you follow him on X, the launch post itself is short, direct, and very "build-tools-first." The same voice shows in the early framing for Flue: minimal abstraction where needed, opinionated structure where scale requires it, and clear affordances for CI and local execution.
A lot of tooling in the agent stack still separates these concerns poorly, so you end up with a lot of duplicated infrastructure in every stack. Flue puts harness concerns in one place and tries to make them portable.
The official README frames it as the first agent harness framework and emphasizes that it is runtime agnostic and can be deployed on Node.js, Cloudflare, GitHub Actions, and GitLab CI/CD (per the Flue README). The README language is blunt about being different from "yet another SDK," and that claim is testable if you look at how the examples are structured.
If you are already building agent systems, you likely treat this as obvious. Still, the boundary matters. Roughly three layers are in play: the model layer (provider calls, tool schemas, streaming), the orchestration layer (graphs, routing, state), and the harness layer (where the agent runs, what it may execute, and how sessions persist). Most AI SDKs and graph frameworks are strong at the first two. Flue pushes the third to the center.
In concrete terms, a harness should answer: where does the agent run, what is it allowed to execute, how are sessions created and persisted, and what shape do outputs take so downstream automation can consume them.
Flue is opinionated exactly around these questions.
From the docs and examples, a few patterns repeat.
Flue examples show explicit handlers and typed outputs. The result is not just "chat completion text." You are expected to return structured outcomes that downstream automation can trust.
That sounds boring until you compare it with typical agent scripts where final output is still a natural language block.
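To make the contrast concrete, here is a framework-agnostic sketch of a contract-first outcome. None of these names come from Flue's actual API; the point is the shape: a typed result plus a validator, instead of a natural language block.

```typescript
// A structured outcome type that downstream automation can branch on,
// instead of parsing free-form model prose. All names are illustrative.
type TriageOutcome = {
  category: "bug" | "billing" | "feature-request";
  priority: 1 | 2 | 3;
  summary: string;
};

// Narrow an untrusted object (e.g. parsed model JSON) into the contract,
// rejecting anything that does not match it exactly.
function parseTriageOutcome(raw: unknown): TriageOutcome {
  const o = raw as { category?: unknown; priority?: unknown; summary?: unknown };
  if (o.category !== "bug" && o.category !== "billing" && o.category !== "feature-request") {
    throw new Error(`invalid category: ${String(o.category)}`);
  }
  if (o.priority !== 1 && o.priority !== 2 && o.priority !== 3) {
    throw new Error(`invalid priority: ${String(o.priority)}`);
  }
  if (typeof o.summary !== "string" || o.summary.length === 0) {
    throw new Error("summary must be a non-empty string");
  }
  return { category: o.category, priority: o.priority, summary: o.summary };
}

const outcome = parseTriageOutcome(
  JSON.parse('{"category":"bug","priority":1,"summary":"Login loop on mobile"}')
);
// outcome.category === "bug"
```

The value is that a bad outcome fails loudly at the boundary, which is exactly what CI and downstream automation need.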
Flue advertises deployability across environments and runtime forms. If your team expects an agent to run from CLI and from CI with matching behavior, this is the value proposition.
The practical impact is not just portability. It is consistency: the same agent flow produces the same behavior whether it runs from a laptop, a CI job, or a deployed runtime.
In Flue, sandbox strategy is first class. The docs include local and container-style options, and the model is that sandboxing is a tradeoff you define at runtime and project boundaries, not an implicit hidden behavior.
If your workflows include low-risk metadata jobs and high-risk shell operations, this distinction is important.
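One way to picture that distinction, without assuming anything about Flue's real configuration surface, is a per-task sandbox policy declared in code:

```typescript
// Illustrative only: declare a sandbox level per task kind so that
// low-risk metadata jobs and high-risk shell operations never share
// the same execution policy by accident.
type SandboxLevel = "none" | "readonly-fs" | "container";

const sandboxPolicy: Record<string, SandboxLevel> = {
  "label-issue": "none",           // metadata-only, no file or shell access
  "summarize-logs": "readonly-fs", // may read the repo, never write
  "run-migration": "container",    // shell access, so isolate fully
};

function requiredSandbox(task: string): SandboxLevel {
  const level = sandboxPolicy[task];
  if (level === undefined) {
    // Unknown tasks default to the strictest level, not the loosest.
    return "container";
  }
  return level;
}
```

The design choice worth copying is the default: anything not explicitly classified gets the strictest sandbox, so forgetting to register a task fails safe.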
Flue includes markdown context conventions around AGENTS-style files and project-local skill definitions. This does two things: it keeps prompts and policy versioned and reviewable through ordinary pull requests, and it lets agent behavior travel with the repo instead of living in an external store.
You are effectively treating your repo as the control plane.
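A minimal sketch of what "repo as control plane" can mean in practice, assuming nothing about Flue's file format: parse skill sections out of an AGENTS-style markdown file so the runtime reads behavior from the same artifact reviewers see in PRs.

```typescript
// Illustrative: treat repo-local markdown as the source of truth for
// agent skills. Parse "## skill-name" sections from an AGENTS-style
// file's contents into a lookup the runtime can use.
function parseSkills(markdown: string): Map<string, string> {
  const skills = new Map<string, string>();
  const sections = markdown.split(/^## /m).slice(1); // drop preamble before first skill
  for (const section of sections) {
    const newlineAt = section.indexOf("\n");
    const name = section.slice(0, newlineAt).trim();
    const body = section.slice(newlineAt + 1).trim();
    skills.set(name, body);
  }
  return skills;
}

const skills = parseSkills(
  "# Agents\n\n## triage\nClassify incoming issues.\n\n## release\nCut a release candidate.\n"
);
// skills.get("triage") === "Classify incoming issues."
```

Because the file is plain markdown in the repo, changing an agent's behavior is a reviewed diff, not a dashboard edit.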
The examples in launch posts are useful, but these are the examples teams usually care about.
You can model a deploy failure as an input event, then define a set of bounded recovery tasks: collect logs, classify the failure, retry the deploy, or roll back to the last known-good release.
Then only escalate to a human when a threshold is crossed. This style is hard to maintain if each step uses a different orchestration style in each environment. A harness approach keeps this simpler.
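The event-plus-threshold pattern above can be sketched in a few lines. This is not Flue code; it is the decision logic any harness would host, with illustrative names throughout:

```typescript
// Illustrative: model a deploy failure as a typed event, run bounded
// recovery actions, and escalate to a human past a threshold.
type DeployFailure = { service: string; attempt: number; reason: string };

const MAX_AUTO_RECOVERIES = 2; // past this, a human takes over

type Decision =
  | { kind: "recover"; action: "retry" | "rollback" }
  | { kind: "escalate"; to: "on-call" };

function decide(event: DeployFailure): Decision {
  if (event.attempt > MAX_AUTO_RECOVERIES) {
    return { kind: "escalate", to: "on-call" };
  }
  // First failure: retry, since many failures are transient.
  // Repeated failure: roll back rather than retrying blindly.
  return { kind: "recover", action: event.attempt === 1 ? "retry" : "rollback" };
}
```

Because the decision is a typed value rather than prose, the same logic runs unchanged from a CLI, a CI job, or a webhook handler.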
Many teams split support across product, billing, and sales. A Flue style model can map incoming events to different agents with shared governance, and shared state contracts.
This is where repo-local behavior and session output schemas get valuable. You can avoid rewriting the same classification rules across environments.
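A sketch of a shared routing contract, with hypothetical rules standing in for whatever classification your teams actually use:

```typescript
// Illustrative: one shared classification contract routes events to
// product, billing, or sales agents, so the rules live in one place
// instead of being re-implemented per environment.
type Team = "product" | "billing" | "sales";

type InboundEvent = { subject: string; body: string };

function routeEvent(event: InboundEvent): Team {
  const text = `${event.subject} ${event.body}`.toLowerCase();
  if (/invoice|refund|charge/.test(text)) return "billing";
  if (/pricing|demo|quote/.test(text)) return "sales";
  return "product"; // default owner for everything else
}
```

The routing function is deliberately boring; the point is that it is one function, versioned in the repo, instead of three copies drifting apart across environments.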
In a monorepo, the same repository can have inconsistent release expectations. A harness framework helps you run the same recovery logic per package while still adapting to local tooling constraints.
Because Flue pushes runtime control into structured flows, you can create strict boundaries between model text and high risk execution. This supports a policy architecture where only approved paths are allowed at execution time.
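A minimal version of that boundary, again with illustrative names rather than anything from Flue's API: an allowlist gate that sits between model text and real execution.

```typescript
// Illustrative: only commands on an approved allowlist may cross the
// boundary from model text into real execution. Everything else is
// rejected at execution time, regardless of what the model proposed.
const APPROVED_COMMANDS = new Set(["git status", "npm test", "npm run lint"]);

function authorize(proposedCommand: string): { allowed: boolean; reason: string } {
  const normalized = proposedCommand.trim();
  if (APPROVED_COMMANDS.has(normalized)) {
    return { allowed: true, reason: "on allowlist" };
  }
  return { allowed: false, reason: `not an approved path: ${normalized}` };
}
```

Note that the gate runs at execution time, not prompt time: the model can propose anything, but only approved paths execute.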
I am not going to claim "best." I am going to compare what layer each stack solves first.
OpenAI gives excellent SDK and API tooling around model calls, tools, and session-like workflows. The OpenAI Agents docs and agent JS docs are strong for provider integration.
Where the stack differs: the OpenAI Agents SDK is provider-first, so model calls, tool schemas, and handoffs are the core abstractions, while runtime concerns like CI portability and sandbox policy stay in your hands. Flue inverts that ordering.
If your stack is already provider-first and you want tighter OpenAI integrations, OpenAI stack makes sense.
The Vercel AI SDK is heavily used in production web apps. As of recent npm stats, the ai package sees millions of weekly downloads, and @ai-sdk/openai is also very large. It is excellent for model provider abstraction, streaming UI integration, and app-level usage patterns.
The harness difference: the AI SDK solves the model-call and streaming layer inside your application, while Flue solves where and how whole agent flows execute across environments.
If your use case is mostly app-level model calls, AI SDK is still hard to beat. If your use case is multi-role execution with reproducible agent runtimes, Flue is stronger.
LangChain is a broad ecosystem and now a common choice for teams that want composability and long-lived memory tooling. A lot of teams use LangGraph for graph control and stateful flows.
Deep Agents is the LangChain implementation that leans into more explicit agent runtime workflows and has been used in full stack web-agent systems, including strong middleware and handoff patterns. If that is your current mode, it can be very compelling.
The key difference with Flue: LangChain and LangGraph make graph composition and state the primary abstraction and leave deployment shape to you, while Flue makes the execution harness primary and treats graphs as one tool inside it.
The right choice is less about who is technically richer on paper and more about where you want complexity to live.
CrewAI is practical for many Python-first teams and multi-agent role workflows. The template and crew model are simple to read. The tradeoff is that TypeScript-native runtime portability is lower for teams that operate in JS/TS infrastructure first.
Flue is TypeScript-first by design, so it naturally fits teams already shipping TS tooling. That is not a quality comparison. It is a fit comparison.
To avoid hype, here is how I test whether a framework like this represents a real shift.
If your team still writes custom runbook code for each environment, it is not a shift. If your team can move an agent flow from local to CI with mostly stable behavior, it is a shift.
If most outcomes are still free-form prose, your orchestration stays fragile. If outcomes are structured and contract oriented, you can automate safely.
If you still duplicate prompt and policy docs in dashboards or external stores, it is not yet a repo-owned harness. If you can keep policy inside codebase artifacts and review it with PRs, that is a real shift.
Flue does not settle this story for good. The project is young enough that API churn and ecosystem immaturity are real risks.
None of these are blockers if your team treats this as platform work and funds it as engineering debt reduction.
You do not need a full rewrite.
Pick one task set with reliable input and output contracts. For example: triage a support queue.
Create strict output objects and test them. This improves automation immediately.
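One way to make that step concrete: treat the output contract like any other code and pin it with a unit-style assertion that runs in CI. The shape below is illustrative, not a prescribed Flue contract.

```typescript
// Illustrative: a strict output object plus a tiny validator you can
// run in CI, so contract drift fails the build rather than production.
type QueueItemResult = { id: string; resolved: boolean; nextOwner: string | null };

function validateResult(r: QueueItemResult): void {
  if (r.id.length === 0) throw new Error("id must be non-empty");
  if (r.resolved && r.nextOwner !== null) {
    throw new Error("resolved items must not have a next owner");
  }
}

validateResult({ id: "T-101", resolved: true, nextOwner: null }); // passes
```

Once a rule like "resolved items have no next owner" is encoded, automation downstream can rely on it instead of re-checking it everywhere.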
Keep your existing runner and a Flue runner in parallel. Compare output contracts, failure handling, and behavior drift between local and CI runs.
Start with sandbox and approval policy. Then move routing. Then move persistence.
If local and CI are stable in one area, then expand.
Flue matters not because it is flashy, but because it puts a real design decision in one place: harness first, not tool glue first.
For teams already living in TypeScript and CI-heavy stacks, this is a practical path to reducing duplicated agent orchestration code.
For teams that are provider-first with strong existing ecosystem dependencies, the gain can be marginal and the migration cost high.
The bigger lesson for this whole industry is similar to every framework shift so far: value moves from "can it answer" to "can it run safely across environments with minimal extra glue." Flue is one of the clearest examples of that shift so far.