45+ terms explained clearly. Written for developers building with AI, not researchers reading papers.
A development workflow where AI agents write, test, and iterate on code autonomously. Instead of suggesting completions, the agent plans multi-step tasks, runs commands, reads errors, and fixes them in a loop until the job is done.
Software that uses a large language model to reason about goals, break them into steps, call external tools, and act on results without human intervention at each step. Agents differ from chatbots because they maintain a plan and execute it across multiple turns.
A secret string that authenticates your application with an external service. AI providers like OpenAI, Anthropic, and Google issue API keys so your code can send prompts and receive completions programmatically.
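In practice the key is read from an environment variable so it never lands in source control. A minimal sketch of building (but not sending) an authenticated request - header names follow Anthropic's Messages API, and the model id is a placeholder to check against current docs:

```typescript
// Build a request to an LLM API, reading the key from the environment
// so it is never hard-coded or committed.
export function buildRequest(prompt: string, apiKey: string) {
  if (!apiKey) throw new Error("Missing API key - set ANTHROPIC_API_KEY");
  return {
    url: "https://api.anthropic.com/v1/messages",
    method: "POST" as const,
    headers: {
      "x-api-key": apiKey,             // the secret itself
      "anthropic-version": "2023-06-01",
      "content-type": "application/json",
    },
    body: JSON.stringify({
      model: "claude-sonnet-4-5",      // placeholder id - check provider docs
      max_tokens: 1024,
      messages: [{ role: "user", content: prompt }],
    }),
  };
}

// Usage: buildRequest("Hello", process.env.ANTHROPIC_API_KEY ?? "")
```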
Running a coding agent in a mode where it completes entire features without pausing for human approval. The agent reads the codebase, writes code, runs tests, and commits - all from a single prompt.
Standardized tests that measure model performance on tasks like code generation, math, reasoning, and instruction following. Common benchmarks include SWE-bench, HumanEval, MMLU, and GPQA. They help developers compare models, but real-world performance often differs from benchmark scores.
Software that compiles, bundles, or transforms source code into production-ready output. In the AI dev space, build tools like Turbopack, Vite, and esbuild handle TypeScript compilation, module bundling, and hot reload for the frameworks that power AI applications.
A markdown file placed in your project root that configures Claude Code's behavior. It defines project rules, coding conventions, file structure, and custom instructions that persist across sessions - acting as a project-specific system prompt for your AI coding agent.
Anthropic's official CLI tool for agentic coding. It runs in your terminal, reads your codebase, edits files, runs commands, and iterates on tasks autonomously. It uses CLAUDE.md for project config and supports sub-agents, hooks, and MCP integrations.
A text-based interface for interacting with software by typing commands. Many AI coding tools - Claude Code, Gemini CLI, Codex - run as CLIs because they integrate directly into the developer's terminal workflow alongside git, npm, and other standard tools.
The maximum amount of text (measured in tokens) that a model can process in a single request. Larger context windows let agents read more code at once. Modern models range from 128K to over 1M tokens, but effective use of context still matters more than raw size.
An AI agent that can perform extended, multi-step research or coding tasks by spawning sub-tasks, searching the web, reading documents, and synthesizing findings. Deep agents run for minutes or hours rather than seconds, tackling problems too complex for a single prompt-response cycle.
Numerical vector representations of text that capture semantic meaning. Similar concepts have vectors that are close together in high-dimensional space. Embeddings power semantic search, RAG systems, and recommendation engines by letting you find related content without exact keyword matches.
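"Close together" is usually measured with cosine similarity - the angle between two vectors, ignoring their length. A minimal sketch:

```typescript
// Cosine similarity between two embedding vectors: 1 means same direction
// (semantically similar), 0 means unrelated, -1 means opposite.
export function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Real embeddings have hundreds or thousands of dimensions, but the math is the same.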
Serverless functions that run on CDN nodes close to the user rather than in a central data center. Platforms like Vercel and Cloudflare Workers use edge functions to reduce latency for API routes, middleware, and AI inference endpoints.
The process of training a pre-existing model on a custom dataset to specialize its behavior. Fine-tuning adjusts model weights for specific tasks - like classifying support tickets or generating code in a particular framework - without training from scratch.
A model capability where the LLM outputs structured JSON describing which function to call and with what arguments, rather than plain text. This lets AI applications reliably trigger actions like database queries, API calls, or tool use based on natural language input.
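On the application side, that structured JSON gets parsed and dispatched to real code. The call shape below is a simplified sketch - each provider's exact format differs, and `getWeather` is a stub for illustration:

```typescript
// The model returns something like:
//   {"name": "getWeather", "arguments": {"city": "Tokyo"}}
// and the application routes it to an actual function.
type ToolCall = { name: string; arguments: Record<string, unknown> };

const tools: Record<string, (args: Record<string, unknown>) => string> = {
  getWeather: (args) => `Weather in ${args.city}: sunny`, // stub implementation
};

export function dispatch(modelOutput: string): string {
  const call = JSON.parse(modelOutput) as ToolCall;
  const fn = tools[call.name];
  if (!fn) throw new Error(`Unknown tool: ${call.name}`);
  return fn(call.arguments);
}
```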
AI systems that create new content - text, images, code, audio, video - rather than just classifying or analyzing existing data. LLMs like Claude and GPT are generative models trained to predict and produce sequences of tokens based on input prompts.
The practice of structuring web content so AI models cite and surface it in their responses. While SEO targets search engine rankings, GEO targets AI-generated answers by using clear definitions, structured data (JSON-LD), and authoritative formatting that models can extract.
Connecting a model's responses to verified, external data sources rather than relying solely on its training data. Grounding techniques include RAG, tool use, and web search - they reduce hallucinations by giving the model facts to reference instead of generating from memory alone.
When a model generates confident-sounding information that is factually incorrect or fabricated. Hallucinations happen because LLMs predict plausible-sounding text, not verified facts. Techniques like RAG, grounding, and structured output help reduce but do not eliminate hallucinations.
User-defined shell commands that run automatically at specific points in the Claude Code lifecycle - before a tool executes, after a tool completes, or when a notification fires. Hooks let you enforce project rules, run linters, or trigger custom workflows without modifying the agent itself.
A software application that combines a code editor, debugger, terminal, and tooling into one workspace. VS Code, Cursor, and Zed are popular IDEs. AI coding assistants increasingly integrate into IDEs or replace them entirely with CLI-based workflows.
The process of running input through a trained model to get a prediction or output. When you send a prompt to an API and get a response, that is inference. Inference cost, speed, and latency are key factors when choosing between AI providers and models.
A standardized format for embedding structured data in web pages using JSON syntax within a script tag. Search engines and AI models read JSON-LD to understand page content - common schemas include Article, FAQPage, DefinedTermSet, and BreadcrumbList.
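For example, a DefinedTerm entry for a glossary term like the ones on this page might be built and serialized like this (field values are illustrative); in a page it sits inside a `<script type="application/ld+json">` tag:

```typescript
// One glossary term expressed as schema.org JSON-LD.
const definedTerm = {
  "@context": "https://schema.org",
  "@type": "DefinedTerm",
  name: "Context window",
  description:
    "The maximum amount of text, measured in tokens, that a model can process in a single request.",
};

export const jsonLd = JSON.stringify(definedTerm, null, 2);
```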
A neural network trained on massive text datasets that can generate, summarize, translate, and reason about language. Models like Claude, GPT, Gemini, and Llama are LLMs. They power chatbots, coding agents, search tools, and most modern AI applications.
A parameter-efficient fine-tuning method that trains a small set of adapter weights instead of modifying the full model. LoRA makes fine-tuning practical on consumer hardware and is widely used in the open-source model community for creating specialized model variants.
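Concretely, LoRA freezes the original weight matrix and learns only a low-rank correction (notation follows the original LoRA formulation; alpha is a scaling hyperparameter and r the adapter rank):

```latex
W' = W_0 + \Delta W = W_0 + \frac{\alpha}{r} B A,
\qquad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},\quad r \ll \min(d, k)
```

Because only A and B are trained, the number of trainable parameters drops from d\*k to r\*(d+k).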
An open protocol created by Anthropic that standardizes how AI applications connect to external data sources and tools. MCP servers expose resources, tools, and prompts through a common interface so any MCP-compatible client can use them without custom integration code.
Architectures where multiple AI agents collaborate on a task, each handling a specialized role. One agent might research while another writes code and a third reviews it. Multi-agent patterns include orchestrator-worker, pipeline, and swarm topologies.
AI models that can process and generate more than one type of data - text, images, audio, video, or code. A multi-modal model can analyze a screenshot, read the text in it, and generate code that reproduces the UI, all in a single interaction.
AI models released with publicly available weights that anyone can download, run, fine-tune, and deploy. Models like Llama, Qwen, Mistral, and DeepSeek offer alternatives to closed APIs, enabling local inference, customization, and full control over your AI stack.
The coordination layer that manages how AI agents, tools, and data sources work together in a pipeline. Orchestration handles routing prompts to the right model, managing context across steps, retrying failures, and combining results from parallel sub-tasks.
The practice of designing and iterating on prompts to get consistent, high-quality outputs from AI models. Good prompt engineering involves clear instructions, examples (few-shot), structured output formats, and systematic testing - not guesswork.
A company or service that hosts AI models and exposes them through an API. Major providers include Anthropic (Claude), OpenAI (GPT), Google (Gemini), and Meta (Llama via third parties). Frameworks like the Vercel AI SDK abstract over providers so you can switch models without rewriting code.
A pattern that improves LLM responses by retrieving relevant documents from an external knowledge base and injecting them into the prompt before generation. RAG gives the model up-to-date, domain-specific context without fine-tuning, reducing hallucinations and keeping responses grounded in real data.
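A minimal sketch of the retrieve-then-inject step - scoring here is naive keyword overlap for brevity, where a real pipeline would use embeddings and a vector database:

```typescript
// Tiny in-memory "knowledge base".
const docs = [
  "Claude Code reads CLAUDE.md for project configuration.",
  "Git worktrees let agents work on branches in parallel.",
];

// Naive relevance score: how many query words appear in the document.
function score(query: string, doc: string): number {
  const haystack = doc.toLowerCase();
  return query
    .toLowerCase()
    .split(/\s+/)
    .map((w) => w.replace(/[^\w.]/g, "")) // strip punctuation like "?"
    .filter((w) => w.length > 0 && haystack.includes(w)).length;
}

// Retrieve the best document and inject it into the prompt.
export function buildRagPrompt(query: string): string {
  const best = [...docs].sort((a, b) => score(query, b) - score(query, a))[0];
  return `Answer using this context:\n${best}\n\nQuestion: ${query}`;
}
```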
Models specifically trained or prompted to show their step-by-step thinking process before producing a final answer. Models like o1, o3, and Claude with extended thinking use chain-of-thought reasoning to tackle complex math, logic, and coding problems more reliably.
Delivering model output token-by-token as it is generated rather than waiting for the full response. Streaming improves perceived latency in chat interfaces and coding agents. The Vercel AI SDK and most provider SDKs support streaming responses out of the box.
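The consumer side of streaming is just iteration over chunks. A self-contained sketch with a fake model stream standing in for a real SDK call:

```typescript
// Stand-in for a model: yields tokens one at a time via an async generator.
async function* fakeModelStream(text: string): AsyncGenerator<string> {
  for (const token of text.split(/(?<=\s)/)) {
    yield token; // a real SDK yields tokens as the model produces them
  }
}

// The consumer renders each chunk as it arrives instead of waiting
// for the full response.
export async function consume(text: string): Promise<string> {
  let output = "";
  for await (const chunk of fakeModelStream(text)) {
    output += chunk; // in a UI, append to the screen here
  }
  return output;
}
```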
Constraining a model to respond in a specific format - typically JSON matching a defined schema. Structured outputs eliminate parsing failures and make AI responses reliable enough to pipe directly into application logic. Zod schemas are commonly used to define the expected shape.
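The validation step looks like this - Zod is the usual tool, but a dependency-free type guard shows the same idea: parse, check, and only then hand the data to application code:

```typescript
type Ticket = { title: string; priority: "low" | "medium" | "high" };

// Runtime check that unknown JSON actually matches the expected shape.
function isTicket(value: unknown): value is Ticket {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.title === "string" &&
    (v.priority === "low" || v.priority === "medium" || v.priority === "high")
  );
}

export function parseTicket(modelOutput: string): Ticket {
  const parsed: unknown = JSON.parse(modelOutput);
  if (!isTicket(parsed)) throw new Error("Model output did not match schema");
  return parsed; // safe to pipe into application logic from here
}
```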
Lightweight AI agents spawned by a parent agent to handle a specific sub-task in parallel. Claude Code uses sub-agents (via the Task tool) to divide work - one sub-agent researches while another writes code - then the parent synthesizes the results.
Instructions prepended to every conversation, usually invisible to the end user, that define an AI model's behavior, personality, and constraints. System prompts set the rules that the model follows - like response format, tone, and what topics to avoid. CLAUDE.md files serve a similar purpose for coding agents.
A parameter (typically 0 to 2) that controls how random or creative a model's output is. Low temperature (0-0.3) produces focused, deterministic responses ideal for code generation. High temperature (0.7-1.5) produces more varied, creative outputs better for brainstorming.
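Under the hood, temperature divides the model's logits before the softmax, which is why low values sharpen the distribution toward the top token and high values flatten it:

```typescript
// Temperature-scaled softmax over next-token logits.
export function softmaxWithTemperature(
  logits: number[],
  temperature: number,
): number[] {
  const scaled = logits.map((l) => l / temperature);
  const max = Math.max(...scaled); // subtract max for numerical stability
  const exps = scaled.map((s) => Math.exp(s - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// With logits [2, 1, 0]: temperature 0.2 puts almost all probability on
// the first token; temperature 1.5 spreads it far more evenly.
```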
The basic unit of text that LLMs process. A token is roughly 3-4 characters or about 0.75 words in English. Models have token limits for input (context window) and output (max completion). API pricing is typically measured per million tokens.
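The back-of-envelope math is simple enough to sketch - the per-million rate below is a made-up placeholder, so check your provider's pricing page:

```typescript
// Rough token estimate: ~4 characters per token in English.
export function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// API pricing is quoted per million tokens.
export function estimateCostUsd(
  tokens: number,
  pricePerMillionUsd: number,
): number {
  return (tokens / 1_000_000) * pricePerMillionUsd;
}

// A 400-character prompt is ~100 tokens; at a hypothetical $3 per million
// input tokens, that is $0.0003.
```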
A model capability where the LLM can invoke external tools - running code, searching the web, reading files, calling APIs - as part of generating a response. Tool use turns a passive text generator into an active agent that can interact with the real world.
A database optimized for storing and querying high-dimensional vectors (embeddings). Vector databases like Pinecone, Weaviate, and pgvector enable fast similarity search, powering RAG pipelines, semantic search, and recommendation systems at scale.
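At its core the operation is nearest-neighbor search over embeddings. This naive in-memory version scans every vector; the systems named above use specialized indexes to do the same thing at scale:

```typescript
type Entry = { id: string; vector: number[] };

function dot(a: number[], b: number[]): number {
  return a.reduce((sum, v, i) => sum + v * b[i], 0);
}

function cosine(a: number[], b: number[]): number {
  return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

// Return the k entries whose vectors are most similar to the query.
export function topK(query: number[], entries: Entry[], k: number): Entry[] {
  return [...entries]
    .sort((a, b) => cosine(query, b.vector) - cosine(query, a.vector))
    .slice(0, k);
}
```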
A development approach where you describe what you want in natural language and let an AI agent handle the implementation details. The developer focuses on the overall direction and feel of the project rather than writing every line. The term was coined by Andrej Karpathy.
A git feature that creates multiple working directories linked to the same repository. Claude Code uses worktrees to run multiple agents on separate branches simultaneously without conflicts, letting each agent work on a different feature in parallel and merge results back.
A TypeScript-first schema declaration and validation library. In AI development, Zod defines the shape of structured outputs from LLMs, validates API payloads, and ensures type safety at runtime. It is the standard schema tool in the Vercel AI SDK and most TypeScript AI stacks.