
TL;DR
A long-form technical read on Flue from Fred K Schott, with deeper comparisons against OpenAI Agents, Vercel AI SDK, Google ADK, LangChain, Deep Agents, and CrewAI, plus practical production patterns.
Fred K Schott posted Flue on May 1, 2026 as a response to a familiar pain point: many teams are building powerful agent prompts, but they are still hand-stitching runtime behavior. If you are running agent workflows in real repos, this is a useful signal. Flue is not trying to be another generic API wrapper. It is trying to be a harness-first framework for running agents.
The idea is simple. You do not want every project to reinvent task orchestration, runtime control, session shape, and deployment glue. You want a framework to define those pieces once and let your team focus on behavior. That is exactly the pattern that made web frameworks like Next.js useful in the first place. You do not build your own server runtime every time; you build routes and logic.
This is a practical builder-level comparison focused on runtime architecture, deployment tradeoffs, and migration implications.
If you know him from Astro, this should feel familiar. Fred is a long-time open source builder with deep TypeScript and developer tooling experience, with a history tied to fast project bootstrap, compile-time developer experience, and community-first frameworks. He co-founded and helped scale the Astro ecosystem, and his move into Flue makes sense when you see the through line: reduce repetitive developer setup, standardize reusable patterns, and keep runtime behavior close to code.
If you follow him on X, the launch post itself is short, direct, and very "build-tools-first." The same voice shows in the early framing for Flue: minimal abstraction where needed, opinionated structure where scale requires it, and clear affordances for CI and local execution.
A lot of tooling in the agent stack still separates these concerns poorly, so you end up with a lot of duplicated infrastructure in every stack. Flue puts harness concerns in one place and tries to make them portable.
The official README frames it as the first agent harness framework and emphasizes that it is runtime agnostic and can be deployed on Node.js, Cloudflare, GitHub Actions, and GitLab CI/CD (per the Flue README). The README language is blunt about being different from "yet another SDK," and that claim is testable if you look at how the examples are structured.
If you are already building agent systems, you likely treat this as obvious. Still, the boundary matters. Roughly three layers are in play: the model layer (provider calls, tool schemas, streaming), the orchestration layer (graphs, routing, state), and the harness layer (where the agent runs, what it may execute, and how sessions persist). Most AI SDKs and graph frameworks are strong at the first two. Flue pushes the third to the center.
In concrete terms, a harness should answer: where does the agent run, what is it allowed to execute, how are sessions created and persisted, and what shape do outputs take so downstream automation can consume them.
Flue is opinionated exactly around these questions.
From the docs and examples, a few patterns repeat.
Flue examples show explicit handlers and typed outputs. The result is not just "chat completion text." You are expected to return structured outcomes that downstream automation can trust.
That sounds boring until you compare it with typical agent scripts where final output is still a natural language block.
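To make the contrast concrete, here is a framework-agnostic sketch of a contract-first outcome. None of these names come from Flue's actual API; the point is the shape: a typed result plus a validator, instead of a natural language block.

```typescript
// A structured outcome type that downstream automation can branch on,
// instead of parsing free-form model prose. All names are illustrative.
type TriageOutcome = {
  category: "bug" | "billing" | "feature-request";
  priority: 1 | 2 | 3;
  summary: string;
};

// Narrow an untrusted object (e.g. parsed model JSON) into the contract,
// rejecting anything that does not match it exactly.
function parseTriageOutcome(raw: unknown): TriageOutcome {
  const o = raw as { category?: unknown; priority?: unknown; summary?: unknown };
  if (o.category !== "bug" && o.category !== "billing" && o.category !== "feature-request") {
    throw new Error(`invalid category: ${String(o.category)}`);
  }
  if (o.priority !== 1 && o.priority !== 2 && o.priority !== 3) {
    throw new Error(`invalid priority: ${String(o.priority)}`);
  }
  if (typeof o.summary !== "string" || o.summary.length === 0) {
    throw new Error("summary must be a non-empty string");
  }
  return { category: o.category, priority: o.priority, summary: o.summary };
}

const outcome = parseTriageOutcome(
  JSON.parse('{"category":"bug","priority":1,"summary":"Login loop on mobile"}')
);
// outcome.category === "bug"
```

The value is that a bad outcome fails loudly at the boundary, which is exactly what CI and downstream automation need.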
Flue advertises deployability across environments and runtime forms. If your team expects an agent to run from CLI and from CI with matching behavior, this is the value proposition.
The practical impact is not just portability. It is consistency: the same agent flow produces the same behavior whether it runs from a laptop, a CI job, or a deployed runtime.
In Flue, sandbox strategy is first class. The docs include local and container-style options, and the model is that sandboxing is a tradeoff you define at runtime and project boundaries, not an implicit hidden behavior.
If your workflows include low-risk metadata jobs and high-risk shell operations, this distinction is important.
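One way to picture that distinction, without assuming anything about Flue's real configuration surface, is a per-task sandbox policy declared in code:

```typescript
// Illustrative only: declare a sandbox level per task kind so that
// low-risk metadata jobs and high-risk shell operations never share
// the same execution policy by accident.
type SandboxLevel = "none" | "readonly-fs" | "container";

const sandboxPolicy: Record<string, SandboxLevel> = {
  "label-issue": "none",           // metadata-only, no file or shell access
  "summarize-logs": "readonly-fs", // may read the repo, never write
  "run-migration": "container",    // shell access, so isolate fully
};

function requiredSandbox(task: string): SandboxLevel {
  const level = sandboxPolicy[task];
  if (level === undefined) {
    // Unknown tasks default to the strictest level, not the loosest.
    return "container";
  }
  return level;
}
```

The design choice worth copying is the default: anything not explicitly classified gets the strictest sandbox, so forgetting to register a task fails safe.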
Flue includes markdown context conventions around AGENTS-style files and project-local skill definitions. This does two things: it keeps prompts and policy versioned and reviewable through ordinary pull requests, and it lets agent behavior travel with the repo instead of living in an external store.
You are effectively treating your repo as the control plane.
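A minimal sketch of what "repo as control plane" can mean in practice, assuming nothing about Flue's file format: parse skill sections out of an AGENTS-style markdown file so the runtime reads behavior from the same artifact reviewers see in PRs.

```typescript
// Illustrative: treat repo-local markdown as the source of truth for
// agent skills. Parse "## skill-name" sections from an AGENTS-style
// file's contents into a lookup the runtime can use.
function parseSkills(markdown: string): Map<string, string> {
  const skills = new Map<string, string>();
  const sections = markdown.split(/^## /m).slice(1); // drop preamble before first skill
  for (const section of sections) {
    const newlineAt = section.indexOf("\n");
    const name = section.slice(0, newlineAt).trim();
    const body = section.slice(newlineAt + 1).trim();
    skills.set(name, body);
  }
  return skills;
}

const skills = parseSkills(
  "# Agents\n\n## triage\nClassify incoming issues.\n\n## release\nCut a release candidate.\n"
);
// skills.get("triage") === "Classify incoming issues."
```

Because the file is plain markdown in the repo, changing an agent's behavior is a reviewed diff, not a dashboard edit.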
The examples in launch posts are useful, but these are the examples teams usually care about.
You can model a deploy failure as an input event, then define a set of bounded recovery tasks: collect logs, classify the failure, retry the deploy, or roll back to the last known-good release.
Then only escalate to a human when a threshold is crossed. This style is hard to maintain if each step uses a different orchestration style in each environment. A harness approach keeps this simpler.
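The event-plus-threshold pattern above can be sketched in a few lines. This is not Flue code; it is the decision logic any harness would host, with illustrative names throughout:

```typescript
// Illustrative: model a deploy failure as a typed event, run bounded
// recovery actions, and escalate to a human past a threshold.
type DeployFailure = { service: string; attempt: number; reason: string };

const MAX_AUTO_RECOVERIES = 2; // past this, a human takes over

type Decision =
  | { kind: "recover"; action: "retry" | "rollback" }
  | { kind: "escalate"; to: "on-call" };

function decide(event: DeployFailure): Decision {
  if (event.attempt > MAX_AUTO_RECOVERIES) {
    return { kind: "escalate", to: "on-call" };
  }
  // First failure: retry, since many failures are transient.
  // Repeated failure: roll back rather than retrying blindly.
  return { kind: "recover", action: event.attempt === 1 ? "retry" : "rollback" };
}
```

Because the decision is a typed value rather than prose, the same logic runs unchanged from a CLI, a CI job, or a webhook handler.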
Many teams split support across product, billing, and sales. A Flue style model can map incoming events to different agents with shared governance, and shared state contracts.
This is where repo-local behavior and session output schemas get valuable. You can avoid rewriting the same classification rules across environments.
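A sketch of a shared routing contract, with hypothetical rules standing in for whatever classification your teams actually use:

```typescript
// Illustrative: one shared classification contract routes events to
// product, billing, or sales agents, so the rules live in one place
// instead of being re-implemented per environment.
type Team = "product" | "billing" | "sales";

type InboundEvent = { subject: string; body: string };

function routeEvent(event: InboundEvent): Team {
  const text = `${event.subject} ${event.body}`.toLowerCase();
  if (/invoice|refund|charge/.test(text)) return "billing";
  if (/pricing|demo|quote/.test(text)) return "sales";
  return "product"; // default owner for everything else
}
```

The routing function is deliberately boring; the point is that it is one function, versioned in the repo, instead of three copies drifting apart across environments.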
In a monorepo, the same repository can have inconsistent release expectations. A harness framework helps you run the same recovery logic per package while still adapting to local tooling constraints.
Because Flue pushes runtime control into structured flows, you can create strict boundaries between model text and high risk execution. This supports a policy architecture where only approved paths are allowed at execution time.
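A minimal version of that boundary, again with illustrative names rather than anything from Flue's API: an allowlist gate that sits between model text and real execution.

```typescript
// Illustrative: only commands on an approved allowlist may cross the
// boundary from model text into real execution. Everything else is
// rejected at execution time, regardless of what the model proposed.
const APPROVED_COMMANDS = new Set(["git status", "npm test", "npm run lint"]);

function authorize(proposedCommand: string): { allowed: boolean; reason: string } {
  const normalized = proposedCommand.trim();
  if (APPROVED_COMMANDS.has(normalized)) {
    return { allowed: true, reason: "on allowlist" };
  }
  return { allowed: false, reason: `not an approved path: ${normalized}` };
}
```

Note that the gate runs at execution time, not prompt time: the model can propose anything, but only approved paths execute.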
I am not going to claim "best." I am going to compare what layer each stack solves first.
OpenAI gives excellent SDK and API tooling around model calls, tools, and session-like workflows. The OpenAI Agents docs and agent JS docs are strong for provider integration.
Where the stack differs: the OpenAI Agents SDK is provider-first, so model calls, tool schemas, and handoffs are the core abstractions, while runtime concerns like CI portability and sandbox policy stay in your hands. Flue inverts that ordering.
If your stack is already provider-first and you want tighter OpenAI integrations, OpenAI stack makes sense.
The Vercel AI SDK is heavily used in production web apps. As of recent npm stats, the ai package sees millions of weekly downloads, and @ai-sdk/openai is also very large. It is excellent for model provider abstraction, streaming UI integration, and app-level usage patterns.
The harness difference: the AI SDK solves the model-call and streaming layer inside your application, while Flue solves where and how whole agent flows execute across environments.
If your use case is mostly app-level model calls, AI SDK is still hard to beat. If your use case is multi-role execution with reproducible agent runtimes, Flue is stronger.
LangChain is a broad ecosystem and now a common choice for teams that want composability and long-lived memory tooling. A lot of teams use LangGraph for graph control and stateful flows.
Deep Agents is the LangChain implementation that leans into more explicit agent runtime workflows and has been used in full stack web-agent systems, including strong middleware and handoff patterns. If that is your current mode, it can be very compelling.
The key difference with Flue: LangChain and LangGraph make graph composition and state the primary abstraction and leave deployment shape to you, while Flue makes the execution harness primary and treats graphs as one tool inside it.
The right choice is less about who is technically richer on paper and more about where you want complexity to live.
CrewAI is practical for many Python-first teams and multi-agent role workflows. The template and crew model are simple to read. The tradeoff is that TypeScript-native runtime portability is lower for teams that operate in JS/TS infrastructure first.
Flue is TypeScript-first by design, so it naturally fits teams already shipping TS tooling. That is not a quality comparison. It is a fit comparison.
To avoid hype, here is how I test whether a framework like this represents a real shift.
If your team still writes custom runbook code for each environment, it is not a shift. If your team can move an agent flow from local to CI with mostly stable behavior, it is a shift.
If most outcomes are still free-form prose, your orchestration stays fragile. If outcomes are structured and contract oriented, you can automate safely.
If you still duplicate prompt and policy docs in dashboards or external stores, it is not yet a repo-owned harness. If you can keep policy inside codebase artifacts and review it with PRs, that is a real shift.
Flue does not settle this story for good. The project is young enough that API churn and ecosystem immaturity are real risks.
None of these are blockers if your team treats this as platform work and funds it as engineering debt reduction.
You do not need a full rewrite.
Pick one task set with reliable input and output contracts. For example: triage a support queue.
Create strict output objects and test them. This improves automation immediately.
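One way to make that step concrete: treat the output contract like any other code and pin it with a unit-style assertion that runs in CI. The shape below is illustrative, not a prescribed Flue contract.

```typescript
// Illustrative: a strict output object plus a tiny validator you can
// run in CI, so contract drift fails the build rather than production.
type QueueItemResult = { id: string; resolved: boolean; nextOwner: string | null };

function validateResult(r: QueueItemResult): void {
  if (r.id.length === 0) throw new Error("id must be non-empty");
  if (r.resolved && r.nextOwner !== null) {
    throw new Error("resolved items must not have a next owner");
  }
}

validateResult({ id: "T-101", resolved: true, nextOwner: null }); // passes
```

Once a rule like "resolved items have no next owner" is encoded, automation downstream can rely on it instead of re-checking it everywhere.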
Keep your existing runner and a Flue runner in parallel. Compare output contracts, failure handling, and behavior drift between local and CI runs.
Start with sandbox and approval policy. Then move routing. Then move persistence.
If local and CI are stable in one area, then expand.
Flue matters not because it is flashy, but because it puts a real design decision in one place: harness first, not tool glue first.
For teams already living in TypeScript and CI-heavy stacks, this is a practical path to reducing duplicated agent orchestration code.
For teams that are provider-first with strong existing ecosystem dependencies, the gain can be marginal and the migration cost high.
The bigger lesson for this whole industry is similar to every framework shift so far: value moves from "can it answer" to "can it run safely across environments with minimal extra glue." Flue is one of the clearest examples of that shift so far.