One Endpoint, Every Capability: A Reference Architecture for Progressive Disclosure

TL;DR
Skills, files, memory, and generation do not need four integrations. They need one MCP endpoint with tiered disclosure, one API key that scopes everything to its owner, and one credit balance. The same tools answer to an MCP client, an in-product chat, and a CLI. Here is the whole architecture, and why it is the shape that makes a fleet of agents coherent.
Two earlier posts built up one idea in stages. The first argued that SKILL.md and the Model Context Protocol solve two halves of the same problem, and that the useful move is to serve skills over MCP so an agent pays context cost only in proportion to what a task needs. The second removed two constraints from that design: skills no longer had to be ours, and a skill file no longer had to be copied in ahead of time. A file could be a link, fetched only at the moment an agent reached for it.
This post is the capstone. It is not a new feature so much as the shape the whole platform settled into once those pieces were in place. The claim is narrow and, I think, useful: skills, files, memory, and generation do not need four separate integrations. They need one endpoint, one auth surface, one billing surface, and one organizing principle applied consistently across all of them. That principle is tiered disclosure. What follows is the architecture and the reasoning, because the architecture is the interesting part, not any single tool.
The one endpoint
Everything a member's agents can do lives at a single streamable HTTP MCP endpoint: /api/mcp. Point any MCP-capable client at that URL with a dd_live_ API key and the tools appear. There is no second endpoint for skills, no separate service for files, no different auth for generation. The full catalog is documented in the repo as a canonical reference, but the shape is easy to hold in your head, because it is four families of capability on one surface.
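As a rough illustration of what "point a client at that URL with a key" means on the wire, here is a minimal sketch that builds (but does not send) the JSON-RPC requests an MCP client would POST. The host example.com and the key value are placeholders; only the /api/mcp path, the dd_live_ prefix, and the Bearer scheme come from this post. A real client would also perform the MCP initialize handshake before listing tools, which is omitted here.

```python
import json
import urllib.request
from typing import Optional

# Assumptions: placeholder host and key. Only the path and key prefix are from the post.
BASE_URL = "https://example.com"
API_KEY = "dd_live_your_key_here"


def build_mcp_request(method: str, params: Optional[dict] = None,
                      request_id: int = 1) -> urllib.request.Request:
    """Build (without sending) a JSON-RPC 2.0 POST aimed at the MCP endpoint."""
    payload = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        payload["params"] = params
    return urllib.request.Request(
        url=f"{BASE_URL}/api/mcp",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
            # A streamable HTTP server may reply with plain JSON or an SSE stream.
            "Accept": "application/json, text/event-stream",
        },
    )


# Discover the tools, then call one of them by name.
list_tools = build_mcp_request("tools/list")
call_tool = build_mcp_request(
    "tools/call", {"name": "list_skills", "arguments": {}}, request_id=2
)
```

In practice an MCP client library handles this framing for you; the point is only that every family of tools rides the same URL and the same header.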
The first family is generation: generate_image and generate_voice. These are the metered tools, and they are the only ones that cost credits. Each one does the work, persists the result to the caller's gallery, and hands back a durable URL, so a generation is not a throwaway artifact but a file that now exists in the account.
The second family is files and assets: list_folders, list_files, get_file, and list_assets. This is where everything a member uploads or generates becomes reachable as context. An agent can list what is there and pull one file's contents on demand.
The third family is memory: save_memory, list_memories, and search_memories. These are durable notes and links that survive across sessions and machines, so an agent can persist a decision in one run and recall it in the next, on a different computer, weeks later.
The fourth family is the library: list_skills, get_skill, get_skill_file, plus the sibling tools for copyable subagent definitions and design contracts. This is the skills-over-MCP surface the earlier posts built, now including a member's own authored skills scoped to their key.
Four families, one endpoint. The reason that consolidation matters is not tidiness. It is that a single endpoint with a single key is the difference between an agent that can reach your whole working context and an agent that can reach whichever one integration you wired up this week.
Tiered disclosure is the organizing principle
The thing that keeps four capability families from collapsing into an unusable wall of tool schemas is that they all follow the same loading discipline. Anthropic's Agent Skills named it for knowledge packaging: progressive disclosure, where the agent sees short descriptions first, pulls a full body only for the item it chose, and reads deeper reference material only as the work demands. We apply that same staging to every family on the endpoint.
For skills it is three tiers. list_skills returns a lean index: a slug and a one-line description per skill, cheap enough to hold a hundred of them in context. get_skill returns one skill's body plus a manifest of its files (paths and one-line purposes), still with no file contents. get_skill_file returns the raw contents of exactly one file, and for a linked file it fetches the remote source at that moment. Three calls, each one paying only for the depth it reached.
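The three tiers can be sketched with a toy in-memory catalog. This is an illustration under stated assumptions, not the platform's implementation: the SKILLS structure, the field names, and the injected fetch callable are all hypothetical. Only the three tool names and what each tier returns are taken from the description above.

```python
from typing import Callable

# Hypothetical catalog. The real storage layout is not described in this post.
SKILLS = {
    "deploy-runbook": {
        "description": "Steps for shipping a release safely.",
        "body": "# Deploy runbook\nFollow the checklist in checklist.md.",
        "files": {
            "checklist.md": {"purpose": "Pre-release checklist",
                             "content": "- run tests\n- tag release"},
            "upstream.md": {"purpose": "Vendor guide (linked)",
                            "url": "https://example.com/guide.md"},
        },
    },
}


def list_skills() -> list:
    """Tier 1: slug and one-line description only."""
    return [{"slug": slug, "description": s["description"]}
            for slug, s in SKILLS.items()]


def get_skill(slug: str) -> dict:
    """Tier 2: the body plus a manifest of paths and purposes, no contents."""
    skill = SKILLS[slug]
    manifest = [{"path": path, "purpose": f["purpose"]}
                for path, f in skill["files"].items()]
    return {"slug": slug, "body": skill["body"], "files": manifest}


def get_skill_file(slug: str, path: str, fetch: Callable[[str], str]) -> str:
    """Tier 3: one file's contents. A linked file is fetched only at this point."""
    entry = SKILLS[slug]["files"][path]
    if "url" in entry:
        return fetch(entry["url"])
    return entry["content"]
```

Passing fetch in as a parameter keeps the sketch runnable offline and makes the laziness visible: nothing touches the remote URL until tier three is actually called for that file.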
For files it is two tiers, because a file is its own unit and needs no manifest in between. list_files is the lean index: id, name, kind, content type, and size, no URLs and no contents. get_file pulls one file on demand, returning the text inline for a textual file, capped so a large file cannot blow the context budget, or a durable URL for a binary. The pattern is identical to skills, just collapsed by one tier because the shape of the data allows it.
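A minimal sketch of the two-tier file pattern, again with hypothetical data. The cap value, the truncated flag, and the record layout are assumptions made for illustration. The post only says that textual content is capped and binaries come back as a durable URL.

```python
# Assumed cap, kept tiny so the truncation is easy to see. The real limit is not stated.
MAX_INLINE_CHARS = 50

# Hypothetical file records for illustration.
FILES = {
    "f1": {"name": "notes.md", "kind": "upload", "content_type": "text/markdown",
           "size": 120, "text": "x" * 120},
    "f2": {"name": "logo.png", "kind": "generation", "content_type": "image/png",
           "size": 48213, "url": "https://example.com/files/f2.png"},
}

INDEX_FIELDS = ("name", "kind", "content_type", "size")


def list_files() -> list:
    """Tier 1: metadata only, no URLs and no contents."""
    return [{"id": file_id, **{k: f[k] for k in INDEX_FIELDS}}
            for file_id, f in FILES.items()]


def get_file(file_id: str) -> dict:
    """Tier 2: capped inline text for textual files, a URL for binaries."""
    f = FILES[file_id]
    if "text" in f:
        text = f["text"]
        return {"id": file_id,
                "text": text[:MAX_INLINE_CHARS],
                "truncated": len(text) > MAX_INLINE_CHARS}
    return {"id": file_id, "url": f["url"]}
```

Returning an explicit truncated flag is one way to let the agent know there is more to the file than it was handed; whether the real tool signals this is not something the post specifies.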
Memory bends the rule deliberately, and the exception is worth stating because it clarifies the rule. There is no get_memory item tier; list_memories and search_memories return the full note body inline. That is intentional. Notes are small recall items, and the entire point of memory is one-call recall. Forcing a second fetch to read a note you already found would be disclosure theater, cost without benefit. The discipline is not "always add tiers." It is "pay context in proportion to what the task needs," and for a short note the proportional cost is the whole note.
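For contrast with the tiered tools, a sketch of one-call recall might look like the following. The storage list, the substring match, and the field names are assumptions. The only behavior taken from the text is that search returns the full note body inline, with no second fetch.

```python
# Hypothetical in-memory store. Real persistence is out of scope for this sketch.
MEMORIES = []


def save_memory(owner_id: str, text: str) -> dict:
    """Persist a short note for one owner."""
    note = {"id": len(MEMORIES) + 1, "owner": owner_id, "text": text}
    MEMORIES.append(note)
    return note


def search_memories(owner_id: str, query: str) -> list:
    """Return matching notes with their full body inline (no item tier)."""
    needle = query.lower()
    return [{"id": m["id"], "text": m["text"]}
            for m in MEMORIES
            if m["owner"] == owner_id and needle in m["text"].lower()]
```

A naive substring match stands in for whatever search the platform actually runs; the shape of the response is the part that matters here.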
The anti-pattern this avoids is the flat server: fifty tools whose full schemas load before the agent has decided anything, or a single tool that dumps every file and every skill body in one response. Either one hands the model tens of thousands of tokens describing things the current task will never touch. A small index in front of on-demand fetches gives the same reach at a fraction of the standing cost.
The same tools, three front doors
Here is the part that turns a tidy API into a coordination substrate. The tools on /api/mcp are not a special MCP-only surface. They are the same capabilities the platform exposes everywhere, reached three ways.
An external agent reaches them over MCP. Point Claude, Cursor, or any MCP client at the endpoint with a key, and list_skills, get_file, and the rest are callable tools the model can choose.
The in-product chat reaches the same capabilities from the inside. When a member talks to the assistant in the dashboard, the model is calling the same underlying functions, routed through the AI SDK's tool-calling machinery. The chat is not a separate implementation of image generation or memory; it is another caller of the one that already exists.
And a script reaches them over plain HTTP. The REST API and the MCP endpoint are two projections of the same credit-metered capabilities, so a CLI or a cron job hits the same functions with the same key that an agent uses interactively.
One capability, three front doors. That is what makes the architecture worth calling a reference architecture rather than a collection of endpoints. A file your agent generates over MCP at 2am is in the gallery your chat can reference at 9am and the CLI can download at noon, because there was only ever one file and one place it lived. The surfaces differ; the substrate does not.
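The "one capability, three callers" idea can be sketched as a single function with three thin adapters. Everything named here is hypothetical (the GALLERY store, the adapter names, the URL pattern); the sketch only shows the structure the paragraph describes, where each surface translates its own input format and then calls the same underlying function.

```python
import json

# Hypothetical shared store: the one place a generated file lives.
GALLERY = {}


def generate_image(owner_id: str, prompt: str) -> dict:
    """The single underlying capability. Stubbed: no model is called here."""
    file_id = f"img_{len(GALLERY) + 1}"
    record = {"id": file_id, "owner": owner_id, "prompt": prompt,
              "url": f"https://example.com/gallery/{file_id}.png"}
    GALLERY[file_id] = record
    return record


def mcp_tools_call(owner_id: str, params: dict) -> dict:
    """MCP door: unwrap a tools/call payload, wrap the result as tool content."""
    if params["name"] != "generate_image":
        raise KeyError(params["name"])
    record = generate_image(owner_id, **params["arguments"])
    return {"content": [{"type": "text", "text": json.dumps(record)}]}


def chat_tool(owner_id: str, prompt: str) -> dict:
    """In-product chat door: the assistant's tool call lands on the same function."""
    return generate_image(owner_id, prompt)


def rest_post_image(owner_id: str, body: str) -> dict:
    """REST door: parse a JSON request body, then call the same function."""
    return generate_image(owner_id, json.loads(body)["prompt"])
```

Because the adapters hold no logic of their own, a file created through any door lands in the same store and is visible through the other two.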
Auth and credits are what make it shared and safe
None of this works as a coordination layer without the two scoping decisions underneath it, and they are almost boring, which is the point.
Every tool call on the MCP transport resolves its owner from the API key. There is no session to manage, because the key is the identity. That resolved owner id scopes everything: list_files returns your files, get_skill includes your authored skills, search_memories searches your notes. One member's agents cannot reach another member's private context, and they do not have to be told not to; the scoping is structural, applied once at the transport boundary rather than re-checked in every tool.
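A minimal sketch of scoping applied once at the boundary, assuming a hypothetical key table and tool registry. The tools never see a key or make an access decision; they receive an already-resolved owner id and filter on it.

```python
# Hypothetical key table and data, for illustration only.
KEYS = {"dd_live_alice": "owner_a", "dd_live_bob": "owner_b"}
FILES = [
    {"owner": "owner_a", "name": "design.md"},
    {"owner": "owner_b", "name": "notes.md"},
]


def resolve_owner(authorization_header: str) -> str:
    """Map 'Bearer <key>' to an owner id, or refuse the call."""
    scheme, _, key = authorization_header.partition(" ")
    if scheme != "Bearer" or key not in KEYS:
        raise PermissionError("unknown or malformed API key")
    return KEYS[key]


def list_files(owner_id: str) -> list:
    """A per-user tool: it only ever filters by the owner it is handed."""
    return [f["name"] for f in FILES if f["owner"] == owner_id]


TOOLS = {"list_files": list_files}


def handle_call(authorization_header: str, tool_name: str, **arguments):
    """Transport boundary: resolve the owner once, then dispatch."""
    owner_id = resolve_owner(authorization_header)
    return TOOLS[tool_name](owner_id, **arguments)
```

The design choice worth noticing is that owner_id is never a caller-supplied argument. There is no parameter an agent could set to ask for someone else's files.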
Credits are the other half. A single universal balance meters the paid actions, and because the key maps to a stable owner id, that balance is the same whether the spend comes from the MCP endpoint, the in-product chat, or a script. Buy credits once, spend them from any front door. The free tools, everything in files, memory, and the library, cost nothing, because their cost is storage and lookups, not inference. The metered tools charge from one source of truth so the price shown and the price charged cannot drift.
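One way to picture "one source of truth" for pricing is a single table that both the quote and the charge read from. The prices, the balance, and the function names below are invented for illustration. The post does not state actual credit costs.

```python
# Hypothetical prices in credits. Tools not listed here are free.
PRICES = {"generate_image": 5, "generate_voice": 3}

# Hypothetical balance, keyed by the owner id that the API key resolves to.
BALANCES = {"owner_a": 10}


def quote(tool_name: str) -> int:
    """The price shown: read from the same table the charge uses."""
    return PRICES.get(tool_name, 0)


def charge(owner_id: str, tool_name: str, surface: str) -> int:
    """Debit the shared balance. `surface` is informational and never changes the price."""
    cost = quote(tool_name)
    if BALANCES[owner_id] < cost:
        raise ValueError(f"insufficient credits for {tool_name} via {surface}")
    BALANCES[owner_id] -= cost
    return BALANCES[owner_id]
```

Since charge computes its cost by calling quote, the displayed price and the debited price cannot diverge, and the surface a call arrives from has no way to influence either.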
Put those two together and you have the quiet precondition for a fleet: a shared context substrate that is scoped per owner and billed once, reachable identically from every surface an agent might live on.
Why this is the shape for coordinating agents
The reason I keep returning to this design is that coordinating a fleet of agents is, in practice, a context problem before it is an orchestration problem. Agents do not fail to cooperate because they lack a message bus. They fail because each one holds a slightly different, slightly stale picture of the world, copied onto its disk at a different moment.
A single endpoint with tiered disclosure fixes that at the root. The runbook is a skill, one row in an index until an agent needs it, updated in one place so the whole fleet has the fix on its next get_skill call. The design doc your teammate uploaded is a file any agent can list and pull. The decision one agent recorded is a memory another agent can search. Nobody re-pastes, nobody re-syncs, and nothing drifts, because there is one library and every agent discovers it the same way. When we ran a fleet of agents for a day to rebuild this site, the thing that held the day together was exactly this: shared, verifiable context every agent could reach on the same terms.
That is the whole architecture. Two open standards each solved one half of the problem, and the combination, applied consistently across skills, files, memory, and generation on one endpoint, is the interesting part. You can browse the catalog by hand at /library, read the endpoint reference in the developer docs, and point your own agents at it today. The next post carries the same architecture to member-authored roles in Agent Studio.
FAQ
What is the difference between the MCP endpoint and the REST API?
They are two projections of the same credit-metered capabilities. The REST API is for scripts and servers calling over plain HTTP; the MCP endpoint exposes the same underlying functions as model-callable tools for an agent. Both authenticate with the same dd_live_ key and draw down the same credit balance, so the choice is about which client is calling, not which features are available.
Why put files and memory behind progressive disclosure instead of just returning everything?
Because returning everything spends context on data the current task will never read. A lean index (list_files, list_skills) plus an on-demand fetch (get_file, get_skill_file) lets an agent hold a large working set cheaply and pay full cost only for the one item it opens. The exception is memory notes, which are small enough that returning the body inline is the intended behavior rather than a leak.
How is one member's context kept separate from another's?
Every tool call resolves its owner from the API key at the transport boundary, and that owner id scopes every per-user tool. A caller only ever sees their own files, skills, and memories. Public content, like another member's explicitly public skill, is the documented exception, and it is opt-in.
Can I use this from a harness other than Claude Code?
Yes. MCP is a client-neutral protocol, so any compliant client discovers and calls the tools the same way. The skills themselves are plain SKILL.md markdown, an open format, so nothing about the pattern is tied to one harness.
How do I try it?
Create a dd_live_ API key, point an MCP client at the /api/mcp endpoint with it as a Bearer token, and call the tools. You can also browse the same skill and file catalog by hand at /library, and the full tool reference lives in the docs.







