
TL;DR
Five worked examples showing how the new Developers Digest products plug into each other. Real agent filesystems, auto-snapshots, gated skill libraries, eval suites, and a recursive MCP host.
The DD product line stopped being a pile of standalone tools a few weeks ago. Once agentfs landed, the rest of the stack started snapping into it like puzzle pieces. This post walks through five recipes that show how the products compose. Each one is something you can actually wire up today, not a pitch deck diagram.
The pattern across all five: small, sharp tools that speak the same protocols (MCP, hooks, plain JSON on disk), so chaining them together does not require glue code.
Recipe 1. Stack: agentfs + agentfs-mcp + Claude Code
For the broader MCP map, pair this with "What Is MCP (Model Context Protocol)? A TypeScript Developer's Guide" and "The Complete Guide to MCP Servers"; those pieces cover the concepts and server-selection layer behind this article.
The default model of agent state is a pile of context window plus whatever the harness happens to remember between sessions. That falls apart the moment your agent runs longer than a single conversation, or you want two agents to share work, or you need to come back tomorrow and pick up where you left off.
agentfs is a content-addressed filesystem with branch and snapshot semantics. agentfs-mcp exposes it over MCP so any compatible agent can read and write. Claude Code is the harness.
Wire it up:
agentfs init my-agent-workspace
agentfs-mcp serve --workspace my-agent-workspace --port 7331
Add to .mcp.json at the project root (where Claude Code reads project-scoped MCP servers):
{
  "mcpServers": {
    "agentfs": {
      "command": "agentfs-mcp",
      "args": ["client", "--port", "7331"]
    }
  }
}
Now when Claude Code writes a file, it writes through agentfs. The agent gets read, write, list, branch, and snapshot tools. The state survives restarts, can be diffed, and can be branched off for parallel exploration. The agent does not have to know any of that. It just sees a filesystem.
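For a sense of what travels over the wire, here is a rough sketch of the JSON-RPC message an MCP client sends when the agent calls a tool. The tools/call method and the name/arguments shape come from the MCP spec. The write tool name is from the list above, but the path and content argument names are assumptions, not the documented agentfs-mcp schema.

```python
import json


def tool_call_request(request_id: int, tool: str, arguments: dict) -> str:
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# Hypothetical arguments: the real agentfs-mcp tool schema may differ.
request = tool_call_request(1, "write", {"path": "notes/plan.md", "content": "step 1"})
print(request)
```

The harness builds and sends messages like this for you; the sketch is only meant to show that nothing agentfs-specific leaks into the protocol.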
The payoff: long-running agent runs that span days. Crash recovery without losing work. The ability to point a fresh agent at a workspace and have it pick up the thread.
A note on performance. agentfs is content-addressed, so writing the same file twice costs almost nothing. Branching is metadata-only. We have run workspaces with 50k files and tens of thousands of snapshots without measurable slowdown on read or write. The cost model is roughly that of a local git repo, with the snapshot operation being closer to free.
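If content addressing is new to you, the toy model below shows why duplicate writes and snapshots are cheap. It is a minimal in-memory sketch, not how agentfs is implemented: ContentStore and its methods are hypothetical names, and the real storage format is not described here.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: blobs keyed by SHA-256, snapshots as path -> digest maps."""

    def __init__(self):
        self.blobs = {}      # digest -> bytes, each distinct content stored once
        self.tree = {}       # path -> digest, the current working state
        self.snapshots = []  # list of (message, frozen copy of the tree)

    def write(self, path: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # identical content is not stored twice
        self.tree[path] = digest
        return digest

    def read(self, path: str) -> bytes:
        return self.blobs[self.tree[path]]

    def snapshot(self, message: str) -> int:
        # Only the path -> digest map is copied; no file contents are duplicated.
        self.snapshots.append((message, dict(self.tree)))
        return len(self.snapshots) - 1

    def reset(self, snapshot_id: int) -> None:
        self.tree = dict(self.snapshots[snapshot_id][1])


store = ContentStore()
store.write("a.txt", b"hello")
store.write("b.txt", b"hello")   # same bytes, so no new blob is stored
first = store.snapshot("initial state")
store.write("a.txt", b"changed")
store.reset(first)               # a.txt points at the original blob again
```

Two files with the same bytes share one blob, and a snapshot only copies the map of paths to digests, which is the sense in which snapshots and branches are metadata-only.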
Recipe 2. Stack: Hookyard agentfs-checkpoint hook + agentfs
Snapshots are only useful if you actually take them. Asking the agent to remember to snapshot is the same mistake as asking humans to remember to git commit. The fix is automation at the harness layer.
Hookyard ships an agentfs-checkpoint hook. It runs on every PostToolUse event for Write, Edit, and MultiEdit, and writes a snapshot to the active agentfs branch with the tool call as the message.
Drop it into your Claude Code settings file (for example, .claude/settings.json):
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Write|Edit|MultiEdit",
        "hooks": [
          { "type": "command", "command": "hookyard run agentfs-checkpoint" }
        ]
      }
    ]
  }
}
Every file edit becomes a checkpoint. If the agent goes off the rails three hours into a run, you can run agentfs log, find the checkpoint right before the bad turn, and run agentfs reset to return to it. No more blowing away an entire session because of one wrong edit.
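To make "the tool call as the message" concrete, here is a sketch of how a hook could turn the PostToolUse payload into a checkpoint message. Claude Code passes hook input as JSON on stdin, including tool_name and tool_input. The message format and both function names are assumptions for illustration; the real agentfs-checkpoint hook may format things differently, and the actual snapshot call is left out because its CLI syntax is not shown in this post.

```python
import json


def checkpoint_message(payload: dict) -> str:
    """Derive a short snapshot message from a PostToolUse hook payload."""
    tool = payload.get("tool_name", "unknown-tool")
    file_path = payload.get("tool_input", {}).get("file_path", "<no path>")
    return f"{tool}: {file_path}"


def handle_hook(stdin_text: str) -> str:
    # A real hook would read sys.stdin and then take the snapshot;
    # here we only compute the message that would label it.
    return checkpoint_message(json.loads(stdin_text))


example = '{"tool_name": "Edit", "tool_input": {"file_path": "src/app.py"}}'
print(handle_hook(example))
```

Messages in this shape are what make agentfs log readable when you are hunting for the checkpoint before a bad edit.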
There is one knob worth tuning. A busy run can produce hundreds of checkpoints, so set HOOKYARD_AGENTFS_DEBOUNCE=30s if you want coarser granularity.
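The effect of a 30s window can be sketched as a simple leading-edge throttle: take the first snapshot, then skip any request that arrives inside the window. Whether Hookyard implements its debounce exactly this way is an assumption; SnapshotThrottle and its injectable clock are hypothetical names used only for illustration.

```python
import time


class SnapshotThrottle:
    """Allow at most one snapshot per interval (leading-edge throttle)."""

    def __init__(self, interval_seconds: float, clock=time.monotonic):
        self.interval = interval_seconds
        self.clock = clock   # injectable so the behaviour can be checked without sleeping
        self.last = None     # time of the last snapshot that was allowed

    def should_snapshot(self) -> bool:
        now = self.clock()
        if self.last is None or now - self.last >= self.interval:
            self.last = now
            return True
        return False


# Simulated clock: edits at t=0, 10, 30 and 59 seconds.
fake_now = [0.0]
throttle = SnapshotThrottle(30, clock=lambda: fake_now[0])
decisions = []
for t in (0.0, 10.0, 30.0, 59.0):
    fake_now[0] = t
    decisions.append(throttle.should_snapshot())
print(decisions)
```

Because each hook invocation runs as its own process, a real implementation would have to persist the last-snapshot time somewhere (a file in the workspace, for instance) rather than in memory.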
Recipe 3. Stack: Skills marketplace + dd-pr skill
The skills marketplace launched this week. The dd-pr skill landed alongside it. Together they solve a problem most teams hit by month two of running coding agents: skills proliferate, half are wrong, and there is no review gate before a skill ships to everyone's harness.
Here is the workflow:
1. Write the skill in your local ~/.claude/skills/ directory.
2. Run claude /dd-pr "publish skill my-skill". It branches, pushes, and opens a private PR against the team's skills repo.
3. After review and merge, teammates run claude-skills sync and pull the new skill.
The marketplace handles discovery and versioning. dd-pr handles the gate. Neither tool is interesting on its own. Together they turn an ungoverned mess into a curated library where every skill in production has been read by at least one other person.
The marketplace also supports private orgs, so you can ship internal skills (database migrations, deploy runbooks, ticket triage) without making them public.
One more thing the dd-pr skill does that matters here: it tags the review automatically. Our convention is to tag @devin-ai-integration on every skill PR for a first-pass read. Devin catches the obvious problems (missing frontmatter, broken script paths, accidentally checked-in secrets) before a human reviewer ever sees the PR. By the time a teammate opens the diff, it is usually mergeable.
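A first-pass check like "missing frontmatter" is simple enough to sketch. The function below assumes a skill is described by a Markdown file that opens with a ----delimited frontmatter block containing name and description keys, which is the usual layout for Claude skills; the function name and the exact rules are hypothetical and are not what Devin or dd-pr actually run.

```python
def frontmatter_problems(skill_text: str, required=("name", "description")) -> list[str]:
    """Return a list of frontmatter problems found in a skill file's text."""
    lines = [line.strip() for line in skill_text.splitlines()]
    if not lines or lines[0] != "---":
        return ["missing frontmatter block"]
    try:
        end = lines[1:].index("---") + 1  # index of the closing delimiter
    except ValueError:
        return ["frontmatter block is not closed"]
    # Collect the keys of simple "key: value" lines inside the block.
    keys = {line.split(":", 1)[0].strip() for line in lines[1:end] if ":" in line}
    return [f"missing frontmatter key: {key}" for key in required if key not in keys]


good = "---\nname: my-skill\ndescription: Publishes a skill\n---\nSteps go here."
print(frontmatter_problems(good))
print(frontmatter_problems("Steps go here."))
```

Checks of this kind are cheap to run on every PR, which is why it makes sense to put them in front of the human reviewer.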
Recipe 4. Stack: agent-eval-bench + agentfs
Evals are the part of agent development that everyone knows they should do and most people skip. The friction is not the eval logic. It is the storage. You end up with a folder of JSON files on someone's laptop and no way for the agent itself to read its own scoreboard.
agent-eval-bench writes eval suites and results as JSON. agentfs is a filesystem that agents can natively read. Point one at the other.
agent-eval-bench run \
  --suite suites/coding.yaml \
  --output agentfs://eval-results/$(date +%Y-%m-%d)/coding.json
Every run lands in agentfs at a predictable path. Now the agent can read its own eval history:
> read eval-results/2026-04-27/coding.json
{
  "suite": "coding",
  "score": 0.84,
  "regressions": ["test_async_iter", "test_unicode_path"],
  ...
}
This unlocks a whole class of self-improvement loops. The agent can compare last week's run to this week's, find regressions, and propose fixes. Or you can run a meta-agent that watches for score drops and opens a private PR with a hypothesis.
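The comparison step in such a loop is small. The sketch below diffs two result documents using only the score and regressions fields shown in the example above; compare_runs is a hypothetical helper, and the "previous" run is made-up data for illustration, not a real result.

```python
def compare_runs(previous: dict, current: dict) -> dict:
    """Summarise what changed between two eval result documents."""
    prev_failing = set(previous.get("regressions", []))
    curr_failing = set(current.get("regressions", []))
    return {
        "score_delta": round(current["score"] - previous["score"], 4),
        "new_regressions": sorted(curr_failing - prev_failing),
        "fixed": sorted(prev_failing - curr_failing),
    }


# Hypothetical earlier run, followed by the run shown above.
last_week = {"suite": "coding", "score": 0.88, "regressions": ["test_unicode_path"]}
this_week = {
    "suite": "coding",
    "score": 0.84,
    "regressions": ["test_async_iter", "test_unicode_path"],
}
print(compare_runs(last_week, this_week))
```

A meta-agent watching for score drops would run something like this on each new file and open a PR only when score_delta is negative or new_regressions is non-empty.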
The shared substrate is the trick. Both tools speak JSON to the same filesystem, so no integration work was needed.
Recipe 5. Stack: mcpaas + agentfs-mcp
This one is a little recursive but it is genuinely useful in production.
mcpaas is a hosted runtime for MCP servers. You give it a server binary and a config, it gives you a URL. agentfs-mcp is the MCP server that exposes agentfs.
You can run agentfs-mcp inside an agentfs workspace, hosted by mcpaas. The server's own code, logs, and runtime state live in the same filesystem it is exposing. The setup looks like this:
agentfs init mcpaas-runtime
agentfs cp $(which agentfs-mcp) mcpaas-runtime:/bin/agentfs-mcp
mcpaas deploy \
  --workspace mcpaas-runtime \
  --binary /bin/agentfs-mcp \
  --args "serve --workspace ."
Three things this gets you. First, the MCP server's own state is snapshotted by the same hook chain you use for everything else. If a bad deploy corrupts the server, you roll back with agentfs reset. Second, the server can read its own source code and config, which makes self-updating servers tractable. Third, you can branch the entire runtime to test a config change, point an agent at the branch, validate, then merge.
It sounds cute until you have run a production MCP server for a month. Then it sounds like the only sane way to do it.
A short list of the design choices that made these recipes possible.
One protocol per surface. MCP for tool calls, hooks for lifecycle events, plain JSON files for shared state. No bespoke RPC.
Files as the universal interchange. agentfs is the substrate. Every tool that produces structured output writes JSON to a path. Every tool that consumes structured input reads JSON from a path. The agent does not need adapters.
Private by default. Skills, repos, deploys all default to private. You opt in to public, never the other way around.
Hooks are first class. Hookyard treats hooks like packages. You install them, version them, and chain them. This is how Recipe 2 stays a one-liner.
The cookbook is going to keep growing. A few more combinations are on the short list but have not shipped yet.
If you want to build any of these, the repos are all up. Small, sharp tools. Compose them.
The full DD stack is at developersdigest.com. Each product has its own docs and a private repo for issues. Email if you want access.