DEVELOPER WORKFLOW

66 posts

Blog · Jul 16, 2026

Harness Handbook Shows the Missing Map for Coding Agents

A July 2026 paper from Tencent Hunyuan turns agent harnesses into behavior-level maps. The useful lesson for builders is simple: code search is not enough when one behavior spans prompts, tools, state, permissions, and runtime policy.

AI Agents Agent Infrastructure Codex Developer Workflow Evals

Blog · Jul 15, 2026

SkillHone Shows Why Agent Skills Need Decision History

SkillHone is a July 2026 paper about evolving agent skills across sessions. The useful takeaway for developers is simple: do not save only the latest SKILL.md. Save the decisions that explain why it changed.

AI Agents Agent Skills AI Coding Developer Workflow Evals

Blog · Jul 14, 2026

Long-Horizon Terminal Bench Shows Why Coding Agents Still Stall

Long-Horizon-Terminal-Bench tests coding agents on 46 terminal tasks that can run for 90 minutes. The takeaway is not that agents are useless. It is that evals need to measure endurance, recovery, and partial progress.

AI Agents AI Coding Evals Benchmarks Developer Workflow

Blog · Jul 13, 2026

Microsoft's CLI Coding Agent Study: The Rollout Pattern Teams Should Copy

A Microsoft field study found that CLI coding-agent adoption spreads through peers and managers, while adopters merged roughly 24% more pull requests. The lesson is not to buy more seats. It is to instrument rollout, retention, cost, and review quality from day one.

AI Coding Coding Agents Claude Code GitHub Copilot Developer Workflow

Blog · Jul 12, 2026

Dockerless Verification Is The Next Coding Agent Bottleneck

ByteDance's Dockerless paper asks whether coding-agent patches can be verified without spinning up per-repo environments. The practical answer is not to replace CI. It is to use cheaper evidence before CI.

AI Agents AI Coding Developer Workflow CI/CD Research

Blog · Jul 10, 2026

Vera Shows Agent Safety Needs Test Oracles, Not Vibes

A new Vera paper tests Codex, Claude Code, OpenClaw, and Hermes with executable safety cases. The useful lesson is not panic. It is evidence-grounded agent QA.

AI Security AI Agents Codex Claude Code Developer Workflow

Blog · Jul 5, 2026

Program-as-Weights Turns Prompts Into Local Fuzzy Functions

The Program-as-Weights paper is a useful signal for developers: some LLM calls may move from per-request API prompts into compact local artifacts that behave like reusable fuzzy functions.

AI Coding Local AI LLM Research Developer Workflow

Blog · Jul 2, 2026

Non-Developers Using AI Agents Need Platform Engineering

OpenAI's workplace agent data points to a practical shift: non-developers are starting to use agents for real work, so engineering teams need paved paths, policy, and receipts.

AI Agents OpenAI Platform Engineering Developer Workflow Enterprise AI

Blog · Jun 23, 2026

Agent PR Governance: The New Rules for Copilot Reviews

GitHub's June Copilot review updates point to a practical policy stack for agent-authored pull requests: validation, review depth, repo instructions, attribution, and release-note accountability.

GitHub Copilot AI Code Review AI Agents Developer Workflow Governance

Blog · Jun 23, 2026

Agent Sandbox Architecture: How to Choose the Right Runtime Boundary

AI agents are getting their own computers. Here is how to choose a sandbox architecture: filesystem isolation, network policy, secrets boundaries, snapshots, and when shell access is overkill.

AI Agents Security Agent Infrastructure Sandboxes Developer Workflow

Blog · Jun 23, 2026

Agent Workflows as Code: Why State Machines Beat Prompt Checklists

Aharness, LangChain's custom harness pattern, and OpenAI's code-first migration all point to the same next step: agent processes need typed gates, validated evidence, and controlled transitions.

AI Agents Codex Agent Infrastructure Developer Workflow TypeScript

Blog · Jun 21, 2026

Agentic AI Reliability Is a Systems Problem

The Bayer and Thoughtworks PRINCE case study is a useful reminder that reliable agentic AI comes from context routing, traces, evals, monitoring, and human review, not from a better prompt alone.

AI Agents Agent Infrastructure RAG Evals Developer Workflow

Blog · Jun 20, 2026

The Definitive Guide to Loop Engineering in Claude Code and Codex

Goal, loop, routine. Three verbs, two tools, one hard part. A complete field guide to running agentic loops in Claude Code and Codex: the real commands, the patterns people actually run, and the two failure modes that burn money.

Loop Engineering Claude Code Codex AI Agents Automation Developer Workflow

Blog · Jun 19, 2026

Zero-Touch OAuth Is the MCP Feature Enterprises Were Waiting For

MCP's new enterprise-managed authorization flow is not just less login friction. It moves agent tool access into identity, policy, and audit systems enterprises already understand.

MCP AI Agents AI Security Developer Workflow Enterprise AI

Blog · Jun 17, 2026

Cohere's North Mini Code: A 30B Open-Weight Coding Model That Runs on One H100

Cohere shipped its first developer-facing model on June 9, 2026. North Mini Code is a 30B mixture-of-experts coding model with 3B active parameters, Apache 2.0 weights, and a deployment footprint of a single H100. Here is what it actually offers and where the open questions are.

Local LLM Coding Tools Open Source Self-Hosting AI Tools Developer Workflow

Blog · Jun 12, 2026

AI Infrastructure Agents Need Spend Guardrails

The viral DN42 AWS bill story is funny until you realize the missing primitive: infrastructure agents need hard cloud-spend guardrails before they touch real accounts.

AI Agents Cloud Developer Workflow Security FinOps

Blog · Jun 10, 2026

The Best Local Coding LLMs in 2026: Run Enterprise-Grade AI Without the Cloud

Choosing a local coding LLM in 2026 means balancing benchmark performance, hardware cost, and the compliance pressure to keep code off third-party servers. Here is what to run and on what hardware.

Local LLM Coding Tools Self-Hosting Privacy AI Tools Developer Workflow

Blog · Jun 8, 2026

Agent Config Files Are Executable Supply Chain

A Hacker News thread on config files that run code points at the next AI coding risk: agent hooks, skills, and editor rules need review like executable dependencies.

AI Coding Security Agent Skills Developer Workflow Claude Code

Blog · Jun 7, 2026

Harness Engineering Makes Tokens a Systems Budget

OpenAI's harness engineering post and new token-use research point to the same lesson: agentic coding teams need token budgets, receipts, and eval loops, not vibes.

AI Agents Codex Developer Workflow Agentic Coding AI Coding

Blog · Jun 6, 2026

AI Code Attribution Needs Defect Forensics, Not Vibes

The rsync Claude debate shows why teams need reproducible defect forensics before AI attribution becomes a public blame machine.

AI Coding Claude Code Open Source Code Review Developer Workflow
