
TL;DR
Perplexity launched a $200-a-month agent that coordinates 19 models and calls orchestration, not the model, the product. Here is the strategic case for why the durable, defensible layer in AI sits next to the labs, not inside them - and what 'token value per watt per user' actually means for builders.
| Topic | Source |
|---|---|
| Perplexity Computer: 19 models, $200/mo, launch details | VentureBeat |
| "The orchestration is the product" + team analogy | Fortune |
| "Token value per watt per user" and the winning-objective framing | CNBC, Tekedia |
| Srinivas on model commoditisation and reasoning | 20VC / The Twenty Minute VC |
There is a quiet assumption underneath most AI strategy: that value flows to whoever trains the best model. Spend the most on compute, win the benchmarks, capture the market. It is a clean story, and the labs have every incentive to keep telling it.
Aravind Srinivas, Perplexity's CEO, is making the opposite bet in public. In February 2026 Perplexity shipped a product called Computer - an agent that does not try to be the smartest model in the room. It tries to be the smartest manager of other people's models. And the way Srinivas describes it amounts to a thesis the model labs structurally have a hard time saying out loud: the orchestration is the product, and the model is a tool.
This post takes that thesis seriously and argues it is more right than the consensus credits. It is the standalone companion to our broader argument in why the orchestration layer is the next big play next to the labs. That piece maps the whole landscape. This one zooms in on the most committed bettor in it.
On February 25, 2026, Perplexity launched Computer, which Srinivas called the most ambitious product in the company's three-year history. The headline number is the tell: Computer coordinates 19 different models on the backend rather than running on a single house model. Per VentureBeat's reporting, the lineup spans Claude Opus 4.6 for orchestration and coding, Google's Gemini for deep research, Google's Nano Banana for images and Veo 3.1 for video, xAI's Grok for fast lightweight tasks, and ChatGPT 5.2 for long-context recall. Computer launched to Perplexity Max subscribers at $200 a month.
Notice what that is not. It is not "we fine-tuned a model and wrapped a UI around it." It is a system whose entire reason for existing is deciding which external model handles which sub-task, and then stitching the results into one coherent piece of work. Srinivas does not even call it a router. He frames it as an orchestrator - or, in his broader public language, an "omni agent" that picks the model, coordinates multiple agents, and decides what runs locally versus in the cloud.
The clearest articulation came in his Fortune interview, where he reached for a hiring analogy:
"When you build a team, you don't build a homogenous group where everyone has the same skills. You build a team with diverse strengths. We're applying that same logic to AI workflows. The orchestration is the product."
Read that last sentence as a strategy statement, not a product description. He is telling you where he thinks the defensible value sits.
The reason this thesis is interesting is not that it is clever. It is that it is structurally available to Perplexity and structurally awkward for OpenAI, Anthropic, or Google to adopt.
A frontier lab's entire capital story is that its model is the irreplaceable asset. Tens of billions in training compute only pencils out if the model is the moat. A lab that stood up and said "honestly, the model is a commodity tool and the orchestration on top is the real product" would be undercutting its own valuation narrative. So labs route to their own models by default, even when a competitor's model is better for a given sub-task, because every query that leaves their stack is a query that admits the commodity thesis.
Perplexity has no such conflict. It owns no frontier model it must defend, which means it can do the thing users actually benefit from: send each sub-task to whichever lab is genuinely best at it. The 19-model lineup is only possible because Perplexity is indifferent to which lab wins any individual call. That indifference is the product. A lab cannot fake it, because its cap table will not let it.
Srinivas made the commoditisation point directly on Harry Stebbings' 20VC podcast, where the conversation centered on whether foundation models will commoditise and where the next gains in model performance actually come from. If you believe raw model quality is converging - and the open-weights cost curve we cover in our routing recipes guide suggests it is - then the marginal advantage stops living inside any one model and starts living in the layer that decides how to use all of them.
The most revealing thing Srinivas has said is not the orchestration line. It is the metric he wants to be judged on.
In a CNBC interview, he argued that the company best able to maximize "token value per watt per user" will command the highest valuation over time. As reported by CNBC and Tekedia, he framed the winning objective this way:
"Whoever is able to maximize this particular objective really will, by balancing accuracy, latency, cost, privacy and intelligence all together, they're going to win, that's what's going to win long term."
Sit with the shape of that metric. It is not tokens per second. It is not benchmark score. It is not parameter count. It is useful output (token value) normalized by energy (per watt) and by person (per user). Every term in it is an orchestration variable, not a model variable.
A lab optimizes for the numerator of one model's capability. An orchestrator optimizes the whole ratio across many models. If Srinivas is right that the ratio is what the market eventually prices, then the orchestration layer is not a thin wrapper. It is where the optimization problem that matters actually lives.
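To make the ratio concrete, here is a minimal sketch of one possible reading of the metric. Srinivas has not published a formula, so everything here is an assumption: the `UsageWindow` fields, the idea of a per-token usefulness score, and the choice to convert energy into average watts over a window are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageWindow:
    """Aggregate usage for one measurement window (all fields are illustrative)."""
    tokens_out: int         # tokens delivered to users in the window
    value_per_token: float  # hypothetical usefulness score per token, 0.0 to 1.0
    energy_wh: float        # energy spent serving the window, in watt-hours
    window_hours: float     # window length, used to derive average watts
    active_users: int       # distinct users served in the window


def token_value_per_watt_per_user(w: UsageWindow) -> float:
    """One possible reading: (tokens * value) / average watts / users."""
    if w.energy_wh <= 0 or w.window_hours <= 0 or w.active_users <= 0:
        raise ValueError("energy, window length, and users must be positive")
    avg_watts = w.energy_wh / w.window_hours
    return (w.tokens_out * w.value_per_token) / avg_watts / w.active_users


# Made-up numbers: 1M tokens at 0.5 usefulness, 2 kWh over one hour, 100 users.
window = UsageWindow(1_000_000, 0.5, 2000.0, 1.0, 100)
score = token_value_per_watt_per_user(window)
```

Under this reading, a system can raise the score without a better model: send easy sub-tasks to cheaper, lower-energy models so the denominator shrinks while the value delivered stays roughly the same.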
It is worth being exact, because the two words get used interchangeably and they are not the same thing.
Routing optimizes the choice within a fixed shape of work. A request comes in, a policy picks the best model for that single request, the response goes out. Factory's Factory Router is a strong example: it scores models on cost and capability and sends each coding request to the right one. The shape of the work - one request, one answer - is held constant. We break down that cost-per-task dynamic in our piece on Factory AI and the model routing era.
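A routing policy of this kind can be sketched in a few lines. This is not Factory Router's actual algorithm. The model names, capability scores, and prices below are invented for illustration, and the policy (cheapest model that clears a capability bar) is just one simple option.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ModelOption:
    name: str                  # hypothetical model label
    capability: float          # illustrative 0.0 to 1.0 capability score
    cost_per_1k_tokens: float  # illustrative price in dollars


# Hypothetical catalog; a real one would be fed by live pricing and evals.
CATALOG = (
    ModelOption("small-open-weights", 0.55, 0.0002),
    ModelOption("mid-tier", 0.75, 0.003),
    ModelOption("frontier", 0.95, 0.015),
)


def route(required_capability: float,
          catalog: Sequence[ModelOption] = CATALOG) -> ModelOption:
    """Pick the cheapest model that clears the capability bar for one request."""
    eligible = [m for m in catalog if m.capability >= required_capability]
    if not eligible:
        # Nothing clears the bar: fall back to the most capable model available.
        return max(catalog, key=lambda m: m.capability)
    return min(eligible, key=lambda m: m.cost_per_1k_tokens)


choice = route(0.7)  # selects "mid-tier" with the catalog above
```

The shape of the work never changes here: one request in, one model chosen, one answer out. Only the choice varies.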
Orchestration optimizes the shape of the work itself. It decides how a task is decomposed into sub-tasks, how many agents run, how they hand off to each other, what executes locally versus in the cloud, and when to call a tool instead of a model at all. Routing is a subroutine inside orchestration - the part that picks a model once the shape is set.
Computer is aiming at the second category. When it takes "research this company and draft a memo," it does not make one model call. It plans, dispatches deep research to one model, image or chart generation to another, drafting to a third, and reconciles the outputs. That is orchestration doing the expensive cognitive work, with routing nested inside each step.
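The plan, dispatch, and reconcile loop can be sketched as follows. This is a toy under stated assumptions, not how Computer is implemented: the handlers are stand-in functions rather than real model calls, the plan is hard-coded where a real planner would likely be a model call itself, and reconciliation is reduced to joining strings.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class SubTask:
    kind: str    # e.g. "research", "chart", "draft"
    prompt: str


def _fake_model(label: str) -> Callable[[str], str]:
    """Hypothetical stand-in for a real model call; it just tags the prompt."""
    return lambda prompt: f"[{label}] {prompt}"


# Routing table nested inside the orchestrator: one handler per sub-task kind.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "research": _fake_model("research-model"),
    "chart": _fake_model("image-model"),
    "draft": _fake_model("drafting-model"),
}


def plan(task: str) -> List[SubTask]:
    """Decompose a task into sub-tasks (hard-coded here for illustration)."""
    return [
        SubTask("research", f"Gather facts for: {task}"),
        SubTask("chart", f"Produce a chart for: {task}"),
        SubTask("draft", f"Write the memo for: {task}"),
    ]


def orchestrate(task: str) -> str:
    """Plan, dispatch each sub-task to its handler, then reconcile the outputs."""
    outputs = [HANDLERS[sub.kind](sub.prompt) for sub in plan(task)]
    return "\n".join(outputs)  # trivial reconciliation step


memo = orchestrate("Acme Corp")
```

The routing decision is the single dictionary lookup inside `orchestrate`. Everything around it (how many sub-tasks exist, in what order they run, how results merge) is the part that routing alone does not cover.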
Why is this layer defensible rather than a feature a lab bolts on next quarter?
First, neutrality is the moat. The value of orchestrating 19 models comes precisely from being willing to pick a competitor's model when it is better. A lab can build an orchestrator, but it cannot credibly build a neutral one, because its incentives push every borderline call toward its own stack. Users notice. Perplexity's neutrality is a position labs cannot occupy without contradicting their own economics.
Second, the optimization surface is broad and operational, not just algorithmic. Getting token value per watt per user right means continuously tuning model selection against shifting prices, new releases, latency profiles, and privacy constraints. That is a moving operational problem - exactly the kind of work that compounds into a durable product over time rather than a copyable feature. It looks a lot like the seven orchestration patterns maturing into a managed surface.
Third, the layer captures the user relationship. Whoever owns orchestration owns the interface where work actually gets done, which means they own the data about what works, the trust, and the switching cost. The model underneath becomes interchangeable plumbing. That is the precise inversion Srinivas is betting on: the model is the tool, and the thing you actually pay for and depend on is the orchestration.
None of this means the labs lose. Frontier models remain the scarce, expensive ingredient that orchestration depends on - Computer is worthless without good models to coordinate. The argument is narrower and more interesting: that a second, durable layer of value is forming adjacent to the labs, and that the labs are structurally the least able to claim it.
You do not need to ship Perplexity Computer to act on the thesis. The strategic move for a builder is the same one Srinivas made at company scale, shrunk to fit your stack.
The label on the bet is "orchestration is the product." The practical version is humbler and immediately actionable: the work of deciding which model does what, for which task, under which constraints, is real work that creates real value - and it is increasingly where the margin and the defensibility live. Srinivas built a $200-a-month product on that idea. You can start with a routing config.
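As a starting point, a routing config can be as small as a dictionary that maps task types to an ordered list of models plus a constraint or two. The sketch below is hypothetical throughout: the task types, model names, and the `allow_cloud` flag are invented to show the idea of "which model, for which task, under which constraints", and do not come from any specific product.

```python
from typing import AbstractSet, Dict, List

# Hypothetical config: ordered preference list per task type, plus a privacy flag.
ROUTING_CONFIG: Dict[str, dict] = {
    "summarize": {"models": ["local-small", "hosted-mid"], "allow_cloud": True},
    "legal-review": {"models": ["local-small", "hosted-frontier"], "allow_cloud": False},
    "hard-reasoning": {"models": ["hosted-frontier", "hosted-mid"], "allow_cloud": True},
}

# Assumed set of models that run on your own hardware.
LOCAL_MODELS = frozenset({"local-small"})


def candidates(task_type: str,
               unavailable: AbstractSet[str] = frozenset()) -> List[str]:
    """Return the failover chain for a task, honoring the privacy constraint."""
    rule = ROUTING_CONFIG[task_type]
    chain = []
    for name in rule["models"]:
        if name in unavailable:
            continue  # skip models that are currently down or rate-limited
        if not rule["allow_cloud"] and name not in LOCAL_MODELS:
            continue  # privacy constraint: this task must stay local
        chain.append(name)
    return chain


fallback_chain = candidates("hard-reasoning", unavailable={"hosted-frontier"})
```

Keeping the policy in data rather than code means prices, new releases, and outages become config changes instead of rewrites.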
Computer is an AI agent Perplexity launched in February 2026 that coordinates 19 different models on the backend - including Claude, Gemini, Grok, and ChatGPT - to complete multi-step tasks, rather than relying on a single house model. It launched to Perplexity Max subscribers at $200 a month, per VentureBeat.
"The orchestration is the product" is Aravind Srinivas's framing, given to Fortune, that the defensible value in AI sits in the layer that decides which model handles which sub-task and how agents coordinate - not in any single model, which he treats as an interchangeable tool.
"Token value per watt per user" is the metric Srinivas told CNBC he believes will determine AI winners: useful output (token value) normalized by energy (per watt) and per person (per user), balancing accuracy, latency, cost, privacy, and intelligence together.
Routing optimizes the model choice within a fixed shape of work - one request, one answer. Orchestration optimizes the shape of the work itself: how a task is decomposed, how many agents run, what runs locally versus in the cloud, and when to call a tool instead of a model. Routing is a subroutine inside orchestration. See our orchestration layer breakdown for the fuller distinction.