
TL;DR
The second half of our agent tooling release: distribution, validation, and ergonomics layered on top of the first six. Six small CLIs, one through-line.
A couple of weeks ago we announced twelve internal tools we were spinning out of the DD portfolio. The first six got their own write-ups: SkillForge CI and Cost Tape in the small devtools post, MCP Lens in the MCP servers roundup, TraceTrail in the local OTel post, Hookyard in the hooks explainer, and PromptLock in the prompt injection post.
Those six were about authoring, observing, and securing agent work. Useful, but only half the picture. The other half is what happens around the agent: how you measure whether it actually works, how the work it produces gets seen, and how the daily ergonomics feel when you live inside Claude Code for eight hours.
That is what these next six fill in. Validation, distribution, and ergonomics. None of them are glamorous. All of them solve a problem we kept hitting often enough that writing the tool was cheaper than tolerating the friction. They are private to the DD portfolio for now, but the patterns travel.
If you missed the first announcement, the short version is: we run a lot of small Next.js apps, a lot of Claude Code skills, and a lot of MCP servers. Tooling has to scale to that. Here is the second half.
Twenty-four small apps, all supposedly wired to the same GA4 property. In practice, half of them have hardcoded measurement IDs in three different files, two are missing the snippet entirely, and one is sending page views but no custom events. You only notice when the dashboard goes quiet.
dd-ga audits a Next.js repo (or a glob of them) for GA wiring drift. It looks for hardcoded IDs that should be env vars, missing snippets, inconsistent helper paths, and instrumentation that captures page views but no conversion events. Output is human-readable by default, JSON when you pipe it into something else.
git clone git@github.com:developersdigest/dd-ga.git
cd dd-ga && node bin/dd-ga.js --help
$ node bin/dd-ga.js audit-all '/Users/j/Developer/dd-*' --quiet
24 repos audited, 7 with findings, 2 errors (missing-snippet)
Exit code is non-zero on error-severity findings, so it slots into a pre-commit hook or a nightly cron without ceremony.
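If you want it as a gate, a plain git hook is enough. A minimal sketch, assuming audit-all also accepts a single repo path rather than a glob (the paths here are illustrative):

#!/bin/sh
# .git/hooks/pre-commit -- block the commit on error-severity GA findings
node /Users/j/Developer/dd-ga/bin/dd-ga.js audit-all "$(pwd)" --quiet

The hook exits with the status of the last command, so error-severity findings fail the commit and warnings pass through.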
Next.js only. The rules are tuned for our App Router conventions. If you put your GA helper somewhere weird it will probably miss it. Treat it as a sanity check, not a compliance tool.
Every time a video ships, we paste the same URL into six places: YouTube description, X, LinkedIn, Threads, Bluesky, the newsletter. Each one wants a slightly different UTM tag. Done by hand, half the campaigns end up untagged or, worse, tagged inconsistently enough that the analytics view is useless.
dd-utm is a CLI that builds canonical UTM links from templates. Each template encodes the source, medium, and default content slug for one distribution channel. You pass a URL and a campaign slug; it spits out the tagged URL and copies it to your clipboard.
cd dd-utm && npm link
$ dd-utm https://devdigest.tv/blog/prompt-versioning-with-promptlock --template youtube --campaign launch-promptlock
https://devdigest.tv/blog/prompt-versioning-with-promptlock?utm_source=youtube&utm_medium=video&utm_campaign=launch-promptlock&utm_content=description
Templates ship for youtube, x, linkedin, threads, bluesky, and newsletter. Free-form mode is there too if you need a one-off.
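To make the template idea concrete: each channel entry reduces to a source, a medium, and a default content slug, which is exactly what shows up in the youtube output above. The shape below is a hypothetical sketch, not the repo's actual schema:

{
  "youtube":    { "source": "youtube",    "medium": "video", "content": "description" },
  "newsletter": { "source": "newsletter", "medium": "email", "content": "body" }
}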
There is no link shortener, no click tracker, no dashboard. It is a string-builder. The whole point is that it does one thing and does not require a browser tab open to a SaaS.
Subagent files (.claude/agents/*.md) are simple markdown with frontmatter, but the rules for that frontmatter are picky. Kebab-case names, single-line descriptions, comma-separated tool lists, optional model and isolation fields. Hand-writing them works fine until you are designing your tenth one and you keep typoing the description into a multi-line block.
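For reference, a subagent file is nothing more than YAML frontmatter over a system prompt. A minimal code-reviewer, with illustrative values but the field rules described above:

---
name: code-reviewer
description: Reviews diffs for correctness, style, and missing tests before merge.
tools: Read, Grep, Glob
model: sonnet
---
You are a careful code reviewer. Flag correctness issues first, style second, and call out missing tests.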
Subagent Studio is a small Next.js app with a form-based editor on the left and a live markdown preview on the right. Three starter templates: research, code-reviewer, test-writer. The render contract is a single pure function (renderSubagent) so the preview is exactly what gets written. Studio never touches your real ~/.claude/agents/ directory; you copy or download.
git clone git@github.com:developersdigest/subagent-studio.git
cd subagent-studio && pnpm install && pnpm dev
Open localhost:3000, pick the research template, edit name and description, toggle the isolation field, copy the rendered markdown into your repo. Frontmatter validation runs on every keystroke so invalid agents never make it out.
Read-only relative to your filesystem. By design. If you want it to write directly to your skills repo, that is a fork away, but the safer default has saved us from a few accidental overwrites.
Most agent eval setups assume you want a Hugging Face leaderboard. We just want to know whether a prompt change broke our extraction skill before we merge it. YAML in, JSON and a markdown report out, run it in CI.
aeb runs YAML-defined eval suites concurrently against Claude (or Codex, or OpenAI). Cases have a prompt, an optional system message, and a list of scorers: contains, regex, or a small judge prompt. Reports come out as JSON for diffing and markdown for skimming.
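A suite file is close to self-describing. This is a sketch of the shape, not a copy of examples/basic-suite.yaml, so the exact key names may differ:

cases:
  - name: capital-of-france
    system: Answer in one short sentence.
    prompt: What is the capital of France?
    scorers:
      - type: contains
        value: Paris
      - type: regex
        pattern: "^[A-Z]"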
pnpm install && pnpm build
export ANTHROPIC_API_KEY=...
$ aeb run examples/basic-suite.yaml --model claude-sonnet-4-6 --concurrency 4 \
--report report.json --markdown report.md
ran 12 cases, 11 passed, 1 failed (regex-capital), 4.2s wall
No cost tracking yet (Cost Tape covers that next door). The judge scorer is intentionally small; if you want full LLM-as-judge with rubrics, this is not it. We use it to catch regressions, not to publish papers.
MCP servers are mostly stdio. Cloud agent clients (claude.ai, Cursor cloud) need an HTTPS endpoint. The gap between "I wrote an MCP server in an afternoon" and "agents on the internet can call it" is annoyingly large.
MCPaaS is a single-tenant scaffold that spawns any stdio MCP server as a child process and exposes JSON-RPC over POST /api/rpc. Bearer token auth, a small dashboard, deploys to a $5 box. Point it at npx -y @modelcontextprotocol/server-filesystem /tmp and you have an HTTP MCP server.
cp .env.example .env # set MCPAAS_SERVER_CMD, ARGS, TOKEN
npm install && npm run dev
curl -s http://localhost:3000/api/rpc \
-H "authorization: Bearer $MCPAAS_TOKEN" \
-d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}'
Single-tenant on purpose. One server per deploy. No multiplexing, no per-tool rate limits, no usage metering. If you are running an MCP marketplace this is not the substrate; if you have one MCP server you wrote and want it on the internet by lunch, it is.
Every blog post, every video thumbnail, every social share for a tool repo wants the same set of stats arranged the same way: stars, top languages, top contributors, last commit, top issues. Doing it by hand once is fine. Doing it across two dozen DD repos every time something ships is not.
repo-postcard takes an org/repo, hits the GitHub API, and renders a 1200x750 PNG postcard via Satori. Cream background, ink text, pink accent: DD palette, no gradients. Markdown summary mode is there too if you want to embed the same data in a README.
pnpm install && pnpm build
export GH_TOKEN=ghp_...
$ node dist/cli.js developersdigest/mcp-lens --out postcard.png
wrote postcard.png (1200x750, 84KB)
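None of the data is exotic. Everything on the card comes from a few standard GitHub REST endpoints, roughly these (illustrative, not the tool's actual request code):

curl -s -H "Authorization: Bearer $GH_TOKEN" https://api.github.com/repos/developersdigest/mcp-lens
curl -s -H "Authorization: Bearer $GH_TOKEN" https://api.github.com/repos/developersdigest/mcp-lens/languages
curl -s -H "Authorization: Bearer $GH_TOKEN" "https://api.github.com/repos/developersdigest/mcp-lens/contributors?per_page=5"

Stars and the last-push timestamp come off the repo object, languages and contributors from their own endpoints; Satori just lays the numbers out.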
Public repos only in practice. Private repo data works if your token has scope, but the visual layout assumes things like contributor avatars are public. Web UI, theming, and batch mode are deferred; the CLI core is solid.
These six tools, plus the first six, are not a framework and not a platform. They are a stack of small frictions removed.
Look at the shape, though. The first six were inward-facing: authoring skills (SkillForge), inspecting MCP traffic (MCP Lens), tracing agent calls (TraceTrail), running hooks (Hookyard), defending against prompt injection (PromptLock), and watching spend (Cost Tape). Make the agent do the right thing.
The second six face outward and around. Validation: agent-eval-bench tells you whether a change made things better or worse. Distribution: dd-utm and repo-postcard make sure work that ships actually gets seen and tracked. Infrastructure ergonomics: dd-ga keeps shared analytics honest across a portfolio, mcpaas turns an afternoon's MCP server into something a cloud agent can hit, and subagent-studio makes the per-day act of designing agents pleasant instead of fiddly.
The pattern we noticed writing all twelve: the painful part of running agents at any scale is rarely the model call. It is the connective tissue around the model call. Did the change regress? Did the link get tagged? Did the snippet ship? Did the MCP server get a real URL? That is where days disappear, and that is what these tools claw back.
If you want to dig in, every tool has an entry on the comparison page with install commands and a short demo. The repos are private for now while we shake out the rough edges. If one of them solves a problem you also have, ping us and we will prioritize the public release.
The next post in this series is the one we keep putting off: an honest retro of which of these twelve we still use every week, six months in. Some will not survive the cut. That is the point of building twelve small tools instead of one big one.