Claude Skills: A technical deep dive into Anthropic's new approach to AI context management

Anthropic announced Agent Skills (commonly called Claude Skills) on October 16, 2025, introducing a new way for developers to extend Claude’s capabilities. Skills are modular folders containing instructions, scripts, and resources that Claude loads on demand; each skill costs only about 30-50 tokens of metadata until it becomes relevant to a task. This progressive disclosure architecture eases the persistent context window limitation while letting organizations package domain expertise into composable, version-controlled units. Early developer feedback suggests Skills may be “a bigger deal than MCP,” with particular enthusiasm for their simplicity and their fit for production workflows.


Understanding the context problem Skills solve

LLMs are powerful, but specialized high-quality output has repeatedly hit a wall: context management. AI models need rich context to perform expert tasks, but stuffing system prompts or reference documents into every request quickly becomes unsustainable and brittle. Embedding-based retrieval (RAG) introduces complexity and indirection, while fine-tuning is slow, costly, and often rigid.

Anthropic’s engineering insight: if AI agents can discover and load instructions and resources progressively, the context need only be as large as the immediate task requires. Rather than cramming everything into the prompt window, Skills function like a continually refreshed index of available capabilities. At startup, Claude reads only minimal metadata (names and descriptions), using ~30-50 tokens per skill. When a request matches a relevant skill (through LLM reasoning, not pattern matching), Claude loads the skill’s full instructions and only then pulls in any associated scripts, references, or assets directly from the filesystem. For practical purposes, this makes the amount of task-specific knowledge available to Claude unbounded.

“The amount of context that can be bundled into a skill is effectively unbounded, because agents intelligently navigate filesystems rather than stuffing everything into prompts.”

  • Mahesh Murag, Anthropic technical staff

The payoff: A library of 20 skills consumes only ~1,000 tokens until any skill is loaded, versus tens of thousands for equivalent system prompts. Skill content is versioned, composable, and persists across all sessions, so “copy/paste prompt rot” is replaced by reusable infrastructure.
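The arithmetic behind that figure is easy to check. The following is a back-of-the-envelope sketch that assumes the ~30-50 tokens-per-skill range quoted above; metadata_budget is a hypothetical helper, not part of any SDK.

```python
def metadata_budget(num_skills, tokens_per_skill=(30, 50)):
    """Return the (low, high) token cost of advertising skills that are not yet loaded."""
    low, high = tokens_per_skill
    return num_skills * low, num_skills * high


low, high = metadata_budget(20)
print(f"20 idle skills: {low}-{high} tokens")  # prints "20 idle skills: 600-1000 tokens"
```

The upper bound matches the ~1,000-token figure; the real cost depends on how long each name and description is.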


Technical architecture: how Skills actually work

Skills are implemented as a meta-tool called “Skill” that lives beside other Claude tools like Read, Write, and Bash. Every skill is a folder with a required SKILL.md (YAML frontmatter and Markdown instructions), optional scripts (scripts/), references, and assets.

Technical flow:

  1. Discovery: At chat or agent startup, Claude recursively scans sources:

    • ~/.claude/skills/ (personal),
    • .claude/skills/ (per-project, version-controlled),
    • plugin and built-in skills

    Skills discovered are declared in a lightweight XML list within the tools array: <available_skills><skill name="pdf" .../></available_skills>, keeping context cost minimal.

  2. Selection: When a user message arrives, Claude uses LLM reasoning (not pattern matching or routing logic) to select matching skills based on names/descriptions.

  3. Loading: When a skill is used, two user messages are injected:

    • One visible in the user-facing UI (“Loading ‘pdf’ skill with arguments ...”)
    • One (isMeta: true) long-form message containing the full instructions, examples, and any procedural guidance from the skill

  4. Scoped context modification: Skills can adjust the model, tool permissions (e.g., allow Bash(pdftotext:*)), or the execution environment through a skill-specific contextModifier. All such changes are scoped and temporary, which keeps capabilities tightly controlled.
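The discovery step can be approximated in a few lines of Python. This is a minimal sketch, not Claude's actual implementation: the function names are hypothetical, the frontmatter parser handles only flat `key: value` pairs rather than full YAML, and the `description` attribute on each `<skill>` element is an assumption, since the listing format above is only shown in abbreviated form.

```python
from pathlib import Path
from xml.sax.saxutils import quoteattr


def read_frontmatter(skill_md):
    """Parse the simple `key: value` pairs between the leading '---' lines of a SKILL.md."""
    lines = skill_md.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":  # end of frontmatter
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def discover_skills(roots):
    """Recursively find SKILL.md files under each root and return their metadata."""
    skills = []
    for root in roots:
        root = Path(root).expanduser()
        if not root.is_dir():
            continue  # a missing source is simply skipped
        for skill_md in sorted(root.rglob("SKILL.md")):
            meta = read_frontmatter(skill_md)
            if "name" in meta and "description" in meta:
                skills.append({"name": meta["name"], "description": meta["description"]})
    return skills


def render_available_skills(skills):
    """Render only names and descriptions, which keeps the listing cheap."""
    items = "".join(
        f"<skill name={quoteattr(s['name'])} description={quoteattr(s['description'])}/>"
        for s in skills
    )
    return f"<available_skills>{items}</available_skills>"


if __name__ == "__main__":
    found = discover_skills(["~/.claude/skills", ".claude/skills"])
    print(render_available_skills(found))
```

Only the frontmatter is read here; the Markdown body of each SKILL.md stays on disk until a skill is selected, which is the point of progressive disclosure.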

This meta-tool design enables stacking, composition, and extensibility: Claude can load and coordinate multiple skills in response to complex requests.
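The "scoped and temporary" property of step 4 is the part most worth picturing. The sketch below is a toy model only: AgentContext and skill_scope are hypothetical names, and nothing here reflects how contextModifier is actually implemented. It simply shows overrides that apply while a skill is active and are rolled back afterwards.

```python
from contextlib import contextmanager


class AgentContext:
    """Toy stand-in for per-conversation state (hypothetical, for illustration only)."""

    def __init__(self, model, allowed_tools=()):
        self.model = model
        self.allowed_tools = set(allowed_tools)


@contextmanager
def skill_scope(ctx, extra_tools=(), model=None):
    """Temporarily apply a skill's overrides, then restore the previous state."""
    saved_model, saved_tools = ctx.model, set(ctx.allowed_tools)
    ctx.allowed_tools |= set(extra_tools)
    if model is not None:
        ctx.model = model
    try:
        yield ctx
    finally:
        # Restore even if the skill's work raised an exception.
        ctx.model, ctx.allowed_tools = saved_model, saved_tools


ctx = AgentContext(model="claude-sonnet-4-5-20250929", allowed_tools={"Read"})
with skill_scope(ctx, extra_tools={"Bash(pdftotext:*)"}):
    print(sorted(ctx.allowed_tools))  # ['Bash(pdftotext:*)', 'Read']
print(sorted(ctx.allowed_tools))      # ['Read']
```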


Anatomy of a Skill: SKILL.md format and best practices

Every skill contains a SKILL.md with YAML frontmatter and actionable instructions. Example minimal template:

---
name: project-conventions
description: Apply project-specific coding conventions. Use when writing, reviewing, or refactoring code in this project.
---

# Coding Conventions

## Principles
- Use functional React components with hooks for state
- Co-locate tests with components (Button.tsx → Button.test.tsx)
- Types must be declared for all exported props

## Directory Structure

src/
├── components/
├── hooks/
├── utils/
└── types/

## Examples

User: “Refactor dashboard for consistency.”  
*Claude: Applies rules above and outputs PR-ready code changes.*

Frontmatter tips:

  • name is lowercase, 64 characters max, and becomes the skill’s command/identifier.
  • description is critical: it must say both what the skill does and when to use it (“Generate Excel reports from tabular data. Use when analyzing or exporting Excel files.”)
  • Optional: allowed-tools, model, version, license. Scoping tool permissions is strongly encouraged for security.
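These rules are simple enough to check before a skill is committed. The validator below is a minimal sketch: validate_frontmatter is a hypothetical helper, the allowed character set (lowercase letters, digits, hyphens) is an assumption beyond the "lowercase, 64 characters" rule stated above, and the "use when" check is a crude heuristic for the what-and-when guidance.

```python
import re

MAX_NAME_LENGTH = 64  # limit stated in the frontmatter tips
# Assumption: names use lowercase letters, digits, and hyphens only.
NAME_PATTERN = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def validate_frontmatter(meta):
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []
    name = meta.get("name", "")
    description = meta.get("description", "")

    if not name:
        problems.append("name is missing")
    elif len(name) > MAX_NAME_LENGTH:
        problems.append(f"name exceeds {MAX_NAME_LENGTH} characters")
    elif not NAME_PATTERN.match(name):
        problems.append("name must be lowercase (letters, digits, hyphens)")

    if not description:
        problems.append("description is missing")
    elif "use when" not in description.lower():
        # Heuristic only: looks for an explicit trigger phrase.
        problems.append("description does not say when to use the skill")

    return problems


print(validate_frontmatter({
    "name": "project-conventions",
    "description": "Apply project-specific coding conventions. "
                   "Use when writing, reviewing, or refactoring code in this project.",
}))  # []
```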

Recommended folders:

  • scripts/: Python or Bash scripts, invoked via allowed tools
  • references/: Extra context and documentation (loaded only if referenced)
  • assets/: Templates and binaries, used by reference

Advanced: Skills can include structured directories for deterministic operations, code generation templates, or API references.
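To make the layout concrete, here is a small scaffolding sketch that creates a skill folder with a SKILL.md and the three recommended subfolders. scaffold_skill and SKILL_TEMPLATE are hypothetical names, and the demo writes into a temporary directory rather than a real skills location.

```python
import tempfile
from pathlib import Path

SKILL_TEMPLATE = """---
name: {name}
description: {description}
---

# {title}
"""


def scaffold_skill(parent, name, description):
    """Create <parent>/<name>/ with SKILL.md plus scripts/, references/, and assets/."""
    skill_dir = Path(parent) / name
    for sub in ("scripts", "references", "assets"):
        (skill_dir / sub).mkdir(parents=True, exist_ok=True)
    title = name.replace("-", " ").title()
    (skill_dir / "SKILL.md").write_text(
        SKILL_TEMPLATE.format(name=name, description=description, title=title),
        encoding="utf-8",
    )
    return skill_dir


with tempfile.TemporaryDirectory() as tmp:
    created = scaffold_skill(
        tmp,
        "project-conventions",
        "Apply project-specific coding conventions. Use when writing code in this project.",
    )
    print(sorted(p.name for p in created.iterdir()))
    # ['SKILL.md', 'assets', 'references', 'scripts']
```

Pointing parent at .claude/skills/ inside a repository would place the result where project skills are discovered.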


API integration and code patterns

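Skills are available through the Claude API, web app, and Claude Code. API usage requires enabling the skills beta and, for Skills that execute code, the code-execution beta; the example below also enables the Files API beta so that generated files can be retrieved: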

import anthropic
client = anthropic.Anthropic()

response = client.beta.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=4096,
    betas=[
        "code-execution-2025-08-25",
        "skills-2025-10-02",
        "files-api-2025-04-14"
    ],
    container={"skills": [
        {"type": "anthropic", "skill_id": "pptx", "version": "latest"}
    ]},
    messages=[{"role": "user", "content": "Create a presentation about renewable energy"}],
    tools=[{"type": "code_execution_20250825", "name": "code_execution"}]
)

  • The container parameter can specify up to 8 Anthropic or custom Skills per request.
  • Multi-turn conversations reuse the container by ID, maintaining skill inclusion and filesystem state.
  • Built-in Skills cover pptx/xlsx/docx/pdf; custom Skills are uploaded via the Skills Management API and get a generated ID.
  • Skills producing files return file_ids retrievable via the Files API.
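Two of these points lend themselves to small helpers. The sketch below is illustrative only: build_container and collect_file_ids are hypothetical names, the "id" key used for container reuse is an assumption to verify against the API reference, and the file-ID collector deliberately avoids depending on the exact response layout by walking a response that has already been converted to plain dicts and lists.

```python
MAX_SKILLS_PER_REQUEST = 8  # limit stated in the list above


def build_container(skills, container_id=None):
    """Build the `container` request parameter from (type, skill_id, version) tuples."""
    if len(skills) > MAX_SKILLS_PER_REQUEST:
        raise ValueError(f"at most {MAX_SKILLS_PER_REQUEST} skills per request")
    container = {
        "skills": [
            {"type": kind, "skill_id": skill_id, "version": version}
            for kind, skill_id, version in skills
        ]
    }
    if container_id is not None:
        # Assumption: a follow-up turn reuses the container by passing its ID here.
        container["id"] = container_id
    return container


def collect_file_ids(node):
    """Recursively gather every 'file_id' string from nested dicts and lists."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "file_id" and isinstance(value, str):
                found.append(value)
            else:
                found.extend(collect_file_ids(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_file_ids(item))
    return found


print(build_container([("anthropic", "pptx", "latest")]))
# {'skills': [{'type': 'anthropic', 'skill_id': 'pptx', 'version': 'latest'}]}
```

Each collected ID can then be handed to the Files API to download the generated file.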

Skill upload:

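from pathlib import Path

skill_dir = Path('skill-folder')
# read_bytes() opens and closes each file, so no file handles are leaked.
# The Skills API is in beta: confirm the exact method path and file format
# against the current SDK reference before relying on this call shape.
response = client.skills.create(
    files=[
        {"path": "SKILL.md", "content": (skill_dir / "SKILL.md").read_bytes()},
        {"path": "scripts/helper.py", "content": (skill_dir / "scripts" / "helper.py").read_bytes()},
    ]
)
skill_id = response.id  # generated ID, used as skill_id in later requests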

Listing, versioning, and deleting skills are also supported via the Management API.
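Listing each file by hand gets tedious once a skill has several scripts and references. The helper below is a sketch that walks a skill folder and produces the same path/content entries used in the upload example above; collect_skill_files is a hypothetical name, and the payload shape is taken from this article rather than verified against the SDK.

```python
from pathlib import Path


def collect_skill_files(folder):
    """Return [{"path": ..., "content": ...}] for every file under a skill folder."""
    folder = Path(folder)
    if not (folder / "SKILL.md").is_file():
        raise FileNotFoundError(f"{folder} has no SKILL.md")
    # Sort by POSIX-style relative path so the order is stable across platforms.
    relative_paths = sorted(
        p.relative_to(folder).as_posix() for p in folder.rglob("*") if p.is_file()
    )
    return [
        {"path": rel, "content": (folder / rel).read_bytes()}
        for rel in relative_paths
    ]
```

The result can be passed as the files argument in place of a handwritten list.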


Claude Code: Real-world developer workflows

Claude Code, Anthropic’s agentic coding tool for the terminal and IDE, is where Skills are most directly useful for software teams:

  • Discovery: Skills are loaded from personal (~/.claude/skills/), project (.claude/skills/), or plugin sources, supporting both individual setups and version-controlled, team-wide patterns.
  • Autonomous activation: When engineers run claude commit, the “generating-commit-messages” skill can trigger, analyze the git diff, and return a message formatted to the team’s conventions, with no prompt engineering or style rules to remember.
  • Stacking: Multiple skills (testing methodology, linting rules, database integration) compose on the fly as Claude completes tasks, interprets context, or executes migrations.
  • Procedural documentation: Teams package institutional knowledge and SOPs, from bug triage to onboarding checklists, into instantly reusable, discoverable Skill libraries.
  • Vendor and stack patterns: Skills like “google-adk” or “stripe-integration” encode company-approved integration steps, error handling, and best practices.

A real project conventions skill might encode file/folder layout, coding style rules, commit templates, testing requirements, and review checklists, all in readable Markdown.

Example: test-driven-development Skill

---
name: test-driven-development
description: Implement features using test-driven development. Activates when adding features.
---

# Test-Driven Development

## Workflow

1. Write a failing test for new functionality
2. Implement minimal code to pass test
3. Refactor while tests remain green

## Example Test

```typescript
describe('authenticateUser', () => {
  it('returns true for valid credentials', () => {
    const user = { username: 'test', password: 'pw' }
    expect(authenticateUser(user, 'test', 'pw')).toBe(true)
  });
});
```


Advanced usage: Code, scripts, and deterministic operations

Skills can bundle scripts for tasks requiring precision or speed (e.g., PDF form extraction, data processing):

pdf-form-extractor skill:

---
name: pdf-form-extractor
description: Extract and analyze form fields from PDFs. Use when working with fillable PDF forms.
allowed-tools: Bash(python:*)
---

# Extraction Steps

1. Ensure PDF is accessible
2. Run extraction: `python {baseDir}/scripts/extract_fields.py "$filepath"`
3. Parse resulting JSON for field analysis

Invoked script:

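import json, sys
import PyPDF2  # third-party: pip install PyPDF2

def extract_form_fields(pdf_path):
    # Map each form field name to its current value; get_fields() returns None for PDFs without forms
    fields = PyPDF2.PdfReader(pdf_path).get_fields() or {}
    return {name: str(field.get("/V", "")) for name, field in fields.items()}

if __name__ == '__main__':
    print(json.dumps(extract_form_fields(sys.argv[1]), indent=2))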
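Steps 2 and 3 of the skill (run the script, then parse its JSON) amount to a subprocess call followed by json.loads. The sketch below shows that round trip; run_extraction is a hypothetical helper, and a tiny stand-in script replaces extract_fields.py so the example runs without PyPDF2 or a real PDF.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path


def run_extraction(script_path, pdf_path):
    """Run the bundled script as a subprocess and parse the JSON it prints to stdout."""
    result = subprocess.run(
        [sys.executable, str(script_path), str(pdf_path)],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the script exits non-zero
    )
    return json.loads(result.stdout)


# Stand-in for scripts/extract_fields.py so the sketch runs without a real PDF.
STAND_IN_SCRIPT = (
    "import json, sys\n"
    "print(json.dumps({'source': sys.argv[1], 'fields': {'full_name': 'text'}}))\n"
)

with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "extract_fields.py"
    script.write_text(STAND_IN_SCRIPT, encoding="utf-8")
    report = run_extraction(script, "form.pdf")
    print(report["fields"])  # {'full_name': 'text'}
```

Keeping the parsing in a script rather than in model output is what makes the operation deterministic: the same PDF yields the same JSON every time.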

Skills vs. other approaches: prompts, RAG, MCP, and Projects

  • System prompts: Large, brittle, context-hungry and hard to update or version.
  • Skills: Composable, persistent, and progressive. You load only what’s needed, when it’s needed, and each unit is versioned and tested separately.
  • RAG: Best for factual retrieval and dynamic, external, fresh content, whereas Skills are best for procedural and repeatable workflows.
  • MCP: Connects Claude to external APIs, servers, and live data, but is more complex to set up. Skills are simpler and more portable; they can teach Claude how to use MCP connections through repeatable workflows.
  • Projects/Context Stuffing: Useful for iterative context accretion, but not persistent, composable, or universally available.

Real hybrid workflows combine a stable short system prompt, high-ROI skills, and RAG for dynamic data.


Developer benefits: from efficiency to consistency

  • Persistence: Skills live across chats, projects, and API requests: install once, use anywhere.
  • Repeatability: Document once, deploy anywhere. Teams report saving dozens of hours and gaining consistency (e.g., an “authentication-setup” skill rolled out across 6 projects, saving 14 hours).
  • Cost savings: Each skill uses ~30-50 tokens until loaded; even large libraries have negligible context cost until activation, saving on inference cost and latency.
  • Sharing & portability: Skills are plain folders that live in git, so you can version, distribute, and roll them out across teams or the whole organization.
  • Velocity and onboarding: Skills lower the barrier for new team members, codify best practices, accelerate prototyping, and help raise output quality.

Real-world impact & user stories

  • Engineering teams: Some report automating 90%+ of git interactions via Claude Code and Skills, from commit message generation and bugfix branches to migration scripts.
  • Productivity: Non-engineers automate workflows (e.g. creating Office docs from templates), consistently apply brand guidelines, or execute complex data analysis.
  • Rapid prototyping: Apps like webcam background removers or Stripe payment integrations built in under an hour using pre-written Skills.
  • Emergencies: One user used Skills to research, compose, and coordinate a successful hospital policy appeal in a single evening. Others report hours saved on spreadsheets, reporting, and formatting.
  • Business workflows: Marketing teams process and improve ad creatives using Skills encoding guidelines and optimization recipes.

Security, limits, and best practices

  • Security: Carefully scope tool permissions in allowed-tools, and avoid broad wildcards for Bash or network operations in production (prefer a narrow pattern such as Bash(pdftotext:*) over unrestricted Bash). Review all community skills before use; don’t install untrusted skills.
  • Description quality: Skill triggering depends on high-quality, specific descriptions. Include task, target file types, and usage triggers (“Use when analyzing .xlsx spreadsheets”).
  • Token cost: While Skills only use ~30-50 tokens each until loaded, activation can inject 1,500+ tokens per turn. Stack skills judiciously and measure cost in large workflows.
  • Structure and size: Keep SKILL.md focused (<5,000 words), move bulk into references/, assets/, and scripts/, and test edge cases.
  • Distribution: Use personal ~/.claude/skills/ for experiments, .claude/skills/ for team standards, and marketplace skills (coming soon) for broader distribution.
  • Tool permissions: Grant only the Bash commands and APIs needed for the task at hand. Fail safe by denying excess permissions rather than risking privilege escalation.
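The permission guidance above can be enforced mechanically in code review or CI. The linter below is a sketch under stated assumptions: lint_allowed_tools is a hypothetical helper, allowed-tools is treated as a comma-separated string, and "too broad" is defined by a local policy (bare Bash, or Bash with only a wildcard) rather than by anything Anthropic specifies.

```python
import re

# Hypothetical policy: flag Bash entries that are not limited to a named command.
BROAD_BASH = re.compile(r"^Bash(\(\s*(\*|:\*)?\s*\))?$")


def lint_allowed_tools(value):
    """Return the entries of a comma-separated allowed-tools value that look too broad."""
    entries = [entry.strip() for entry in value.split(",") if entry.strip()]
    return [entry for entry in entries if BROAD_BASH.match(entry)]


print(lint_allowed_tools("Read, Bash(pdftotext:*)"))  # []
print(lint_allowed_tools("Read, Bash, Bash(*)"))      # ['Bash', 'Bash(*)']
```

A scoped pattern such as Bash(pdftotext:*) passes because it names a specific command, while entries that would allow any shell command are reported.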

What’s next? Future directions for Skills

Anthropic aims to streamline skill creation, introduce centralized management and distribution (enterprise/team skill rollout), and foster an ecosystem for sharing and improvement. Skills may soon orchestrate Model Context Protocol integrations, enabling rich workflows across heterogeneous data sources or APIs using a combination of procedural knowledge (in Skills) and dynamic access (via MCP).

"The Cambrian explosion of Skills will make this year’s MCP rush look pedestrian by comparison."

  • Simon Willison

Teams that invest in building out skill libraries as tested, documented infrastructure, rather than one-off prompts, will realize the largest benefits: consistency, velocity, onboarding, and quality across every aspect of AI-powered workflows.


Skills don’t just add features; they’re infrastructure for reusable, compounding organizational knowledge. Treat them like code: versioned, documented, reviewed, maintained. The returns in cost, output quality, and velocity can become a core competitive advantage in the agentic AI era.