
TL;DR
The Multi-Stream LLMs paper argues that agents are bottlenecked by single chat streams. The practical takeaway is not to rebuild everything today, but to design agent runtimes around separated channels.
The Multi-Stream LLMs paper is worth reading because it pokes at a design assumption most agent products still inherit from chat.
Even advanced coding agents usually run through one sequence of messages. The user speaks, the model thinks, the model calls a tool, the tool responds, the model writes, and the loop continues. Everything is serialized into one stream.
The paper argues that this is a bottleneck. A model cannot read while writing. It cannot act while thinking. It cannot update an answer while new tool output arrives. It cannot keep system instructions, private reasoning, tool observations, and user-facing output as truly separate channels.
That is a research claim, not a product you can drop into your stack this afternoon. But the design pressure is real.
If long-running agents need harnesses, then the next harness probably needs multiple channels.
Today's agent runtimes make a single transcript carry too many jobs: system instructions, private reasoning, tool observations, and user-facing output all share it.
That works surprisingly well for short tasks. It gets awkward for real development work.
A coding agent may need to keep reading logs while drafting a fix. A browser agent may need to watch the page while it writes a test. A reviewer agent may need to separate policy checks from implementation notes. A security agent may need to keep exploit analysis away from user-visible summary text.
We approximate this today with conventions: tool messages, scratchpads, status logs, subagents, task queues, and structured outputs. The paper's useful provocation is that these should be model-level streams, not just formatting tricks.
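Here is a minimal sketch of one such approximation, assuming Python's asyncio: a log reader fills an observation queue while a drafting loop keeps working and picks up whatever has arrived between steps. The names `tail_logs`, `drain`, and `run_agent` and the channel labels are hypothetical, and the draft step is a stand-in for a model call. This is runtime-level interleaving, not the model-level streams the paper describes.

```python
import asyncio
from typing import List, Tuple

Event = Tuple[str, str]  # (channel, text); channel names are illustrative


async def tail_logs(lines: List[str], observations: "asyncio.Queue[str]") -> None:
    """Hypothetical log reader: pushes each line onto the observation queue."""
    for line in lines:
        await observations.put(line)
        await asyncio.sleep(0)  # yield control so the drafting loop can run


def drain(observations: "asyncio.Queue[str]", timeline: List[Event]) -> None:
    """Move every observation that has already arrived into the timeline."""
    while not observations.empty():
        timeline.append(("tools", observations.get_nowait()))


async def run_agent(log_lines: List[str], draft_steps: List[str]) -> List[Event]:
    observations: "asyncio.Queue[str]" = asyncio.Queue()
    reader = asyncio.create_task(tail_logs(log_lines, observations))
    timeline: List[Event] = []
    for step in draft_steps:
        timeline.append(("draft", step))  # stand-in for a model call
        await asyncio.sleep(0)            # let the reader make progress
        drain(observations, timeline)
    await reader                          # wait for any remaining log lines
    drain(observations, timeline)
    return timeline


if __name__ == "__main__":
    events = asyncio.run(run_agent(["build failed"], ["read trace", "write patch"]))
    for channel, text in events:
        print(f"{channel}: {text}")
```

The drafting loop never blocks on the logs, but it also only notices them between steps. That gap is the limitation the paper is pointing at.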
The authors propose instruction-tuning language models for multiple parallel streams of computation. In their framing, every forward pass can read from multiple input streams and generate tokens into multiple output streams, with causal dependency across earlier timesteps.
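To make that framing concrete, here is a toy simulation in Python. It is not the paper's architecture or training recipe. It assumes each timestep emits one token per output stream, and that a token at timestep t may depend only on tokens from timesteps before t, on any stream. `multi_stream_decode` and `toy_step` are hypothetical names, and `toy_step` is a stand-in for a model.

```python
from typing import Callable, Dict, List

Streams = Dict[str, List[str]]


def multi_stream_decode(
    inputs: Streams,
    output_names: List[str],
    step_fn: Callable[[str, Streams], str],
    steps: int,
) -> Streams:
    """Emit one token per output stream per timestep.

    Assumption for this sketch: at timestep t, every stream is visible
    only up to (not including) position t.
    """
    outputs: Streams = {name: [] for name in output_names}
    for t in range(steps):
        # Snapshot taken before any token is written at this timestep, so
        # output streams cannot see each other's token for the same t.
        visible = {name: toks[:t] for name, toks in {**inputs, **outputs}.items()}
        new_tokens = {name: step_fn(name, visible) for name in output_names}
        for name, token in new_tokens.items():
            outputs[name].append(token)
    return outputs


def toy_step(stream: str, visible: Streams) -> str:
    """Stand-in for a model: reports how many earlier tokens it could see."""
    seen = sum(len(toks) for toks in visible.values())
    return f"{stream}@{seen}"
```

Calling `multi_stream_decode({"user": [...], "tools": [...]}, ["plan", "final"], toy_step, 3)` grows the `plan` and `final` streams in lockstep, and each new token sees more of every other stream than the one before it.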
The practical promise is threefold:
That maps cleanly onto agent infrastructure.
For example, a coding agent runtime could separate:
- user: the task and clarifications
- policy: allowed files, network rules, approval gates
- tools: command output and browser observations
- plan: private task decomposition
- diff: code edits
- status: short human-readable progress
- final: the answer
You can simulate some of this with today's APIs, but the model still consumes and emits through a mostly sequential loop. Multi-stream training would make the separation native.
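The channel list above can be sketched at the runtime level today. This is a minimal Python sketch: `Channel`, `ChannelLog`, and `USER_VISIBLE` are hypothetical names, and the choice of which channels a human sees by default is an assumption for illustration, not an API from the paper or any provider.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class Channel(Enum):
    USER = "user"
    POLICY = "policy"
    TOOLS = "tools"
    PLAN = "plan"
    DIFF = "diff"
    STATUS = "status"
    FINAL = "final"


# Assumption: channels shown to a human by default; plan, policy, and raw
# tool output stay out of the default view.
USER_VISIBLE = frozenset({Channel.USER, Channel.DIFF, Channel.STATUS, Channel.FINAL})


@dataclass
class ChannelLog:
    """Append-only event log where every entry is tagged with its channel."""

    _events: List[Tuple[Channel, str]] = field(default_factory=list)

    def emit(self, channel: Channel, text: str) -> None:
        self._events.append((channel, text))

    def read(self, *channels: Channel) -> List[str]:
        """Return entries from the requested channels, in emission order."""
        return [text for ch, text in self._events if ch in channels]

    def user_view(self) -> List[str]:
        """Render only the channels a human should see by default."""
        return [f"[{ch.value}] {text}" for ch, text in self._events if ch in USER_VISIBLE]
```

The model still reads a serialized prompt, but the runtime decides which channels go into it and which go to the human, instead of treating the transcript as one undifferentiated blob.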
The skeptical view is that this is overkill.
Most production agent failures are not caused by single-stream architecture. They are caused by weak prompts, bad context selection, missing tests, permissive credentials, unclear task scope, and humans trusting summaries without receipts.
That is true.
You do not need a new model architecture to improve your agent workflow today. You need smaller tasks, better tool logs, stricter review, and clearer context boundaries. The agent context reduction pattern still matters more than speculative model plumbing for most teams.
But research like this matters because it points to where current workarounds are trying to go.
Subagents are a workaround for parallel streams. Tool messages are a workaround for observation streams. Hidden reasoning fields are a workaround for thinking streams. Structured output is a workaround for final-channel discipline. The market is already asking for separation; the model interface just has not caught up.
You do not have to wait for multi-stream models to build better agents.
Design your runtime as if channels are real:
This makes today's agents safer and tomorrow's agents easier to adopt.
It also clarifies product design. A good agent UI should not be one giant transcript. It should show the human the right stream at the right moment: plan when approving, diff when reviewing, logs when debugging, final output when closing the task.
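As a rough sketch of that idea, assuming Python and a runtime that already keeps its streams as named lists, the UI decision can be a small lookup from workflow phase to stream. The phase names, `PHASE_TO_STREAM`, and `stream_for_phase` are hypothetical and only mirror the pairing described above.

```python
from typing import Dict, List

# Assumption: phase names and stream names are illustrative labels.
PHASE_TO_STREAM: Dict[str, str] = {
    "approving": "plan",
    "reviewing": "diff",
    "debugging": "logs",
    "closing": "final",
}


def stream_for_phase(phase: str, streams: Dict[str, List[str]]) -> List[str]:
    """Return the entries of the one stream a human needs in this phase."""
    try:
        name = PHASE_TO_STREAM[phase]
    except KeyError:
        raise ValueError(f"unknown phase: {phase!r}") from None
    # A stream with no entries yet renders as an empty view, not an error.
    return streams.get(name, [])
```

An unknown phase fails loudly rather than falling back to the full transcript, which keeps the giant-transcript view from quietly becoming the default again.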
That is why terminal agents need a portable runtime surface. The agent is not just text. It is a bundle of streams, files, tools, and decisions.
The near-term signal is not whether every provider ships a multi-stream model. Watch for interfaces that make streams more explicit:
OpenAI's recent Codex release notes mention richer context, goal mode, browser improvements, and remote locked use. Those are product-level moves toward longer-running, context-aware work. Multi-stream research is the model-level version of the same pressure.
The future of agents probably looks less like one chat window and more like an operating surface with channels.
The mistake would be waiting for a new architecture before improving your workflow. The better move is to make channel boundaries explicit now: policy, tools, plan, diff, status, and final answer.
If multi-stream models arrive, your runtime will be ready. If they do not, your agents will still be easier to debug.