'The Orchestration Is the Product': What Perplexity's Aravind Srinivas Sees That the Model Labs Don't

Q: What does "the orchestration is the product" mean?

It is Aravind Srinivas's framing, given to [Fortune](https://fortune.com/2026/02/26/perplexity-ceo-aravind-srinivas-computer-openclaw-ai-agent/), that the defensible value in AI sits in the layer that decides which model handles which sub-task and how agents coordinate - not in any single model, which he treats as an interchangeable tool.

Q: What is "token value per watt per user"?

It is the metric Srinivas told [CNBC](https://www.cnbc.com/2026/06/03/perplexity-ceo-ai-valuations-computer-agentic.html) he believes will determine AI winners: useful output (token value) normalized by energy (per watt) and per person (per user), balancing accuracy, latency, cost, privacy, and intelligence together.

Official Sources#

Topic	Source
Perplexity Computer: 19 models, $200/mo, launch details	VentureBeat
"The orchestration is the product" + team analogy	Fortune
"Token value per watt per user" and the winning-objective framing	CNBC, Tekedia
Srinivas on model commoditisation and reasoning	20VC / The Twenty Minute VC

There is a quiet assumption underneath most AI strategy: that value flows to whoever trains the best model. Spend the most on compute, win the benchmarks, capture the market. It is a clean story, and the labs have every incentive to keep telling it.

Aravind Srinivas, Perplexity's CEO, is making the opposite bet in public. In February 2026 Perplexity shipped a product called Computer - an agent that does not try to be the smartest model in the room. It tries to be the smartest manager of other people's models. And the way Srinivas describes it amounts to a thesis the model labs are structurally ill-placed to say out loud: the orchestration is the product, and the model is a tool.

This post takes that thesis seriously and argues it is more right than the consensus credits. It is the standalone companion to our broader argument in why the orchestration layer is the next big play next to the labs. That piece maps the whole landscape. This one zooms in on the most committed bettor in it.

What Perplexity actually shipped#

On February 25, 2026, Perplexity launched Computer, which Srinivas called the most ambitious product in the company's three-year history. The headline number is the tell: Computer coordinates 19 different models on the backend rather than running on a single house model. Per VentureBeat's reporting, the lineup spans Claude Opus 4.6 for orchestration and coding, Google's Gemini for deep research, Google's Nano Banana for images and Veo 3.1 for video, xAI's Grok for fast lightweight tasks, and ChatGPT 5.2 for long-context recall. Computer launched to Perplexity Max subscribers at $200 a month.

Notice what that is not. It is not "we fine-tuned a model and wrapped a UI around it." It is a system whose entire reason for existing is deciding which external model handles which sub-task, and then stitching the results into one coherent piece of work. Srinivas does not even call it a router. He frames it as an orchestrator - or, in his broader public language, an "omni agent" that picks the model, coordinates multiple agents, and decides what runs locally versus in the cloud.

The clearest articulation came in his Fortune interview, where he reached for a hiring analogy:

"When you build a team, you don't build a homogenous group where everyone has the same skills. You build a team with diverse strengths. We're applying that same logic to AI workflows. The orchestration is the product."

Read that last sentence as a strategy statement, not a product description. He is telling you where he thinks the defensible value sits.

Why a lab cannot comfortably say this#

The reason this thesis is interesting is not that it is clever. It is that it is structurally available to Perplexity and structurally awkward for OpenAI, Anthropic, or Google to adopt.

A frontier lab's entire capital story is that its model is the irreplaceable asset. Tens of billions in training compute only pencils out if the model is the moat. A lab that stood up and said "honestly, the model is a commodity tool and the orchestration on top is the real product" would be undercutting its own valuation narrative. So labs route to their own models by default, even when a competitor's model is better for a given sub-task, because every query that leaves their stack is a query that admits the commodity thesis.

Perplexity has no such conflict. It owns no frontier model it must defend, which means it can do the thing users actually benefit from: send each sub-task to whichever lab is genuinely best at it. The 19-model lineup is only possible because Perplexity is indifferent to which lab wins any individual call. That indifference is the product. A lab cannot fake it, because its cap table will not let it.

Srinivas made the commoditisation point directly on Harry Stebbings' 20VC podcast, where the conversation centered on whether foundation models will commoditise and where the next gains in model performance actually come from. If you believe raw model quality is converging - and the open-weights cost curve we cover in our routing recipes guide suggests it is - then the marginal advantage stops living inside any one model and starts living in the layer that decides how to use all of them.

"Token value per watt per user"#

The most revealing thing Srinivas has said is not the orchestration line. It is the metric he wants to be judged on.

In a CNBC interview, he argued that the company best able to maximize "token value per watt per user" will command the highest valuation over time. As reported by CNBC and Tekedia, he framed the winning objective this way:

"Whoever is able to maximize this particular objective really will, by balancing accuracy, latency, cost, privacy and intelligence all together, they're going to win, that's what's going to win long term."

Sit with the shape of that metric. It is not tokens per second. It is not benchmark score. It is not parameter count. It is useful output (token value) normalized by energy (per watt) and by person (per user). The factors he says must be balanced to maximize it, along with the energy term itself, are orchestration variables, not model variables:

Accuracy is improved by sending hard sub-tasks to the model that is actually good at them, not by forcing one model to do everything.
Latency and cost are won by not reaching for a frontier model when a cheaper one clears the bar - the core move in any routing recipe.
Privacy is a decision about what runs on-device versus in the cloud, which a single model cannot make for you.
Energy is the constraint that makes "just use the biggest model for everything" a losing strategy at scale.

A lab optimizes for the numerator of one model's capability. An orchestrator optimizes the whole ratio across many models. If Srinivas is right that the ratio is what the market eventually prices, then the orchestration layer is not a thin wrapper. It is where the optimization problem that matters actually lives.

Routing versus orchestration, precisely#

It is worth being exact, because the two words get used interchangeably and they are not the same thing.

Routing optimizes the choice within a fixed shape of work. A request comes in, a policy picks the best model for that single request, the response goes out. Factory Router, from Factory, is a strong example: it scores models on cost and capability and sends each coding request to the right one. The shape of the work - one request, one answer - is held constant. We break down that cost-per-task dynamic in our piece on Factory AI and the model routing era.
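
A minimal sketch of routing in this sense, to make the "fixed shape" concrete. Everything in it is an assumption for illustration: the model labels, the cost figures, and the capability scores are hypothetical, and the policy is not Factory Router's actual scoring logic. It simply picks the cheapest model that clears a request's capability bar:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str                  # hypothetical label, not a real model ID
    cost_per_1k_tokens: float  # illustrative USD figure, not a real price
    capability: float          # illustrative 0-1 score


# Hypothetical catalog; every value here is an assumption for illustration.
CATALOG = [
    Model("small-open-weights", 0.0002, 0.55),
    Model("mid-tier", 0.003, 0.75),
    Model("frontier", 0.03, 0.95),
]


def route(required_capability: float, catalog: list[Model] = CATALOG) -> Model:
    """Return the cheapest model whose capability meets the bar.

    Falls back to the most capable model when nothing clears the bar.
    """
    eligible = [m for m in catalog if m.capability >= required_capability]
    if not eligible:
        return max(catalog, key=lambda m: m.capability)
    return min(eligible, key=lambda m: m.cost_per_1k_tokens)


if __name__ == "__main__":
    for bar in (0.5, 0.7, 0.9):
        print(bar, "->", route(bar).name)
```

One request goes in and one model comes out; nothing here decides how the work is split up, which is the part orchestration adds.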

Orchestration optimizes the shape of the work itself. It decides how a task is decomposed into sub-tasks, how many agents run, how they hand off to each other, what executes locally versus in the cloud, and when to call a tool instead of a model at all. Routing is a subroutine inside orchestration - the part that picks a model once the shape is set.

Computer is aiming at the second category. When it takes "research this company and draft a memo," it does not make one model call. It plans, dispatches deep research to one model, image or chart generation to another, drafting to a third, and reconciles the outputs. That is orchestration doing the expensive cognitive work, with routing nested inside each step.
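
The plan, dispatch, and reconcile flow described above can be sketched roughly as follows. This is not how Computer is implemented; the sub-task kinds, the model labels in `ROUTES`, and the `plan`, `call_model`, and `orchestrate` helpers are all hypothetical, and `call_model` is a stub that echoes its inputs rather than calling any real API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubTask:
    kind: str    # e.g. "research", "chart", "draft"
    prompt: str


# Hypothetical routing table: sub-task kind -> placeholder model label.
ROUTES = {
    "research": "deep-research-model",
    "chart": "image-model",
    "draft": "writing-model",
}


def plan(task: str) -> list[SubTask]:
    """Orchestration step: decide the shape of the work.

    A real planner would likely be a model call; this fixed decomposition
    is a stand-in for illustration.
    """
    return [
        SubTask("research", f"Collect key facts for: {task}"),
        SubTask("chart", f"Produce a summary chart for: {task}"),
        SubTask("draft", f"Write a memo for: {task}"),
    ]


def call_model(model: str, prompt: str) -> str:
    """Stub standing in for a real API call; it only echoes its inputs."""
    return f"[{model}] {prompt}"


def orchestrate(task: str) -> str:
    outputs = []
    for sub in plan(task):
        model = ROUTES[sub.kind]  # routing, nested inside each step
        outputs.append(call_model(model, sub.prompt))
    # Reconcile: a plain join here; a real system would merge and check results.
    return "\n".join(outputs)


if __name__ == "__main__":
    print(orchestrate("research this company and draft a memo"))
```

The routing lookup is a single line inside the loop. The decomposition in `plan` and the reconciliation at the end are where an orchestrator does its distinctive work.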

The defensibility case#

Why is this layer defensible rather than a feature a lab bolts on next quarter?

First, neutrality is the moat. The value of orchestrating 19 models comes precisely from being willing to pick a competitor's model when it is better. A lab can build an orchestrator, but it cannot credibly build a neutral one, because its incentives push every borderline call toward its own stack. Users notice. Perplexity's neutrality is a position labs cannot occupy without contradicting their own economics.

Second, the optimization surface is broad and operational, not just algorithmic. Getting token value per watt per user right means continuously tuning model selection against shifting prices, new releases, latency profiles, and privacy constraints. That is a moving operational problem - exactly the kind of work that compounds into a durable product over time rather than a copyable feature. It looks a lot like the seven orchestration patterns maturing into a managed surface.

Third, the layer captures the user relationship. Whoever owns orchestration owns the interface where work actually gets done, which means they own the data about what works, the trust, and the switching cost. The model underneath becomes interchangeable plumbing. That is the precise inversion Srinivas is betting on: the model is the tool, and the thing you actually pay for and depend on is the orchestration.

None of this means the labs lose. Frontier models remain the scarce, expensive ingredient that orchestration depends on - Computer is worthless without good models to coordinate. The argument is narrower and more interesting: that a second, durable layer of value is forming adjacent to the labs, and that the labs are structurally the least able to claim it.

What this means if you are building#

You do not need to ship Perplexity Computer to act on the thesis. The strategic move for a builder is the same one Srinivas made at company scale, shrunk to fit your stack:

Stop defaulting to one frontier model for everything. That default is the expensive habit. Most sub-tasks do not need your most capable model.
Tier your work by difficulty, then route the easy majority to cheaper or open-weights models and reserve the frontier model for the hard minority. Our model routing recipes guide has runnable-style configs for exactly this with OpenRouter, LiteLLM, and Factory Router.
Treat model choice as neutral. Pick the best model per sub-task regardless of vendor. Vendor loyalty is a cost you pay in quality and dollars.
Measure your own version of the ratio: useful output per dollar and per second, per task type. Once you can see it, the routing decisions make themselves.
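
One way to start on the last item is a small aggregation over your own call logs. The sketch below is an assumption-laden illustration: the `CallRecord` fields and the `summarize` helper are hypothetical, and "useful output" is reduced to a pass/fail flag that you would set with your own acceptance check:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    task_type: str
    model: str        # whatever label you log; hypothetical here
    success: bool     # did the output pass your own acceptance check?
    cost_usd: float
    latency_s: float


def summarize(records):
    """Aggregate successes per dollar and per second for each (task_type, model)."""
    totals = defaultdict(lambda: {"ok": 0, "cost": 0.0, "secs": 0.0})
    for r in records:
        t = totals[(r.task_type, r.model)]
        t["ok"] += int(r.success)
        t["cost"] += r.cost_usd
        t["secs"] += r.latency_s
    return {
        key: {
            "successes_per_dollar": t["ok"] / t["cost"] if t["cost"] else 0.0,
            "successes_per_second": t["ok"] / t["secs"] if t["secs"] else 0.0,
        }
        for key, t in totals.items()
    }


if __name__ == "__main__":
    # Made-up log entries, for illustration only.
    logs = [
        CallRecord("summarize", "small", True, 0.01, 1.0),
        CallRecord("summarize", "small", False, 0.01, 1.0),
        CallRecord("summarize", "frontier", True, 0.10, 4.0),
    ]
    for key, stats in summarize(logs).items():
        print(key, stats)
```

Comparing rows for the same task type across models shows where a cheaper model already clears the bar and where the frontier model earns its cost.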

The label on the bet is "orchestration is the product." The practical version is humbler and immediately actionable: the work of deciding which model does what, for which task, under which constraints, is real work that creates real value - and it is increasingly where the margin and the defensibility live. Srinivas built a $200-a-month product on that idea. You can start with a routing config.

FAQ#

What is Perplexity Computer?#

Computer is an AI agent Perplexity launched in February 2026 that coordinates 19 different models on the backend - including Claude, Gemini, Grok, and ChatGPT - to complete multi-step tasks, rather than relying on a single house model. It launched to Perplexity Max subscribers at $200 a month, per VentureBeat.

What does "the orchestration is the product" mean?#

It is Aravind Srinivas's framing, given to Fortune, that the defensible value in AI sits in the layer that decides which model handles which sub-task and how agents coordinate - not in any single model, which he treats as an interchangeable tool.

What is "token value per watt per user"?#

It is the metric Srinivas told CNBC he believes will determine AI winners: useful output (token value) normalized by energy (per watt) and per person (per user), balancing accuracy, latency, cost, privacy, and intelligence together.

How is orchestration different from model routing?#

Routing optimizes the model choice within a fixed shape of work - one request, one answer. Orchestration optimizes the shape of the work itself: how a task is decomposed, how many agents run, what runs locally versus in the cloud, and when to call a tool instead of a model. Routing is a subroutine inside orchestration. See our orchestration layer breakdown for the fuller distinction.
