65 items
61 posts, 4 guides
Forge hit the Hacker News front page with a strong claim: small local models can become much more useful at tool-calling when the harness catches structural failures, retries intelligently, and controls context.
Anthropic's Stainless acquisition is not just an SDK deal. It is a bet that agents need generated SDKs, CLIs, docs, and MCP servers from the same source of truth.
Persistent memory for coding agents is trending because every session still starts too cold. The hard part is not saving facts. It is proving recall, freshness, deletion, and rollback under real development pressure.
Claude Platform on AWS matters because it moves agent adoption into identity, billing, commitments, and platform controls. That is where enterprise AI work gets real.
Thinking Machines' interaction-models post points at a useful shift for developer tools: stop designing around single chat turns and start designing around shared work.
The TanStack npm incident was not just a package-security story. It was a reminder that AI agent workflows inherit every weak trust boundary in CI.
Claude Managed Agents now have multiagent sessions, outcomes, webhooks, and vault events. The practical takeaway is not just better agents. It is that agent runs need backend job discipline.
31 deployed apps. 7 down. Favicons missing on 20 of 24 reachable hosts. Sentry on zero. Here is how a single audit turned into 58 PRs in one afternoon - and what shipped, what didn't, and what the pattern was.
Notes from a single session running 200+ Claude Code subagents in parallel across 35 repos. What worked, what broke, and the patterns I codified into a skill so the recipe replays.
Codex automations are useful when recurring engineering work has clear inputs, reviewable outputs, and safe boundaries. Here is the practical playbook.
OpenAI is turning Codex from a coding assistant into a broader agent workspace for files, apps, browser QA, images, automations, and repeatable knowledge work.
Boris Cherny's loop-heavy Claude Code workflow points at the next Codex content lane: recurring agents that babysit PRs, CI, deploys, and feedback streams.
Andrej Karpathy's loopy era frame explains why Codex is becoming less like a chatbot and more like an agent loop manager for real software work.
Efficient agents do not stuff every tool result into the model context. They keep intermediate state in code, files, and execution environments, then return compact summaries and receipts.
Manual approval prompts stop protecting users when coding agents ask too often. The better pattern is risk-aware autonomy: safe defaults, narrow deny rules, and approvals only for meaningful changes.
Claude Code is turning into an orchestration layer for agent teams. Here is how subagents, MCP, hooks, and long context fit together in 2026.
A Show HN PDF form demo points at a bigger architecture shift: keep sensitive documents local, expose narrow browser tools to the model, and make AI assistance inspectable.
A deep comparison of Codex's new /goal loop and Claude managed agents outcomes, with practical workflow examples, control tradeoffs, and migration guidance for long-running tasks.
A long-form technical read on Flue from Fred K Schott, with deeper comparisons against OpenAI Agents, Vercel AI SDK, Google ADK, LangChain, Deep Agents, and CrewAI, plus practical production patterns.
A long-running coding agent is only useful if the environment around it can queue tasks, capture logs, checkpoint state, verify behavior, limit cost, and recover from failure.

New tutorials, open-source projects, and deep dives on coding agents - delivered weekly.