Understanding Anthropic Agent Skills: From Tool-Calling to Expertise Modules

1. The Problem: Why Agents Can't Just Use More Tools

Before explaining what Skills are, let's understand the problem they solve.

The Traditional Agent Architecture

When you build an AI agent today (with Claude, GPT-4, or similar), you typically give it:

  1. A system prompt - Instructions on how to behave
  2. Tools - Functions it can call (search the web, query a database, run code)
  3. Context - Previous conversation history and relevant documents

Here's what that looks like in practice:

# Traditional approach
system_prompt = """
You are a financial analyst assistant. When analyzing stocks:
1. Always check the latest price using get_stock_price()
2. Calculate the PE ratio using company financials
3. Compare to industry average
4. Format your response as a structured report
...
"""

tools = [
    {"name": "get_stock_price", "description": "Get current stock price", ...},
    {"name": "get_financials", "description": "Get company financials", ...},
    {"name": "calculate_ratio", "description": "Calculate financial ratios", ...},
    # ... 50 more tools
]

agent.run(system_prompt, tools, user_query)

The Scaling Problem

This works fine with 5-10 tools. But what happens when you need 100 tools? Or 500?

Problem 1: Context Bloat
Every tool definition must be sent to the model on every request. If each tool takes 100 tokens to describe, 500 tools = 50,000 tokens of context used before the user even says anything.

Problem 2: Decision Paralysis
When an agent sees 500 possible tools, it struggles to pick the right ones. It's like asking someone to choose dinner from a menu with 500 items—they'll either freeze or pick randomly.

Problem 3: Procedural Knowledge
Tools tell the agent what it can do, but not how to do it well. You can give an agent a `create_excel_file()` function, but that doesn't teach it:

  • How to structure a financial report
  • What formatting standards to follow
  • In what order to perform calculations
  • How to handle edge cases

You end up encoding all this in the system prompt, which makes it even longer.

The Repetition Problem

Every time a user needs a financial analysis:

  • You send the same 2,000-token instructions
  • You send the same 100 tool definitions
  • You hope the agent remembers all the procedural steps
  • You pay for those tokens every single time

And if you need a different specialized task (legal document review, code analysis, SEO auditing), you need to either:

  • Swap out the entire system prompt (breaking context)
  • Cram multiple specialties into one massive prompt (making it unreliable)
  • Build separate agents for each specialty (duplicating infrastructure)

This is the problem Skills solve.

2. What Agent Skills Actually Are

Here's the simplest possible definition:

Agent Skills are self-contained folders that teach an AI agent how to perform a specific type of work, and the agent only loads them when needed.

Think of Skills as:

  • Not tools - They don't give new capabilities; they teach how to use existing capabilities
  • Not prompts - They're modular, reusable, and loaded on-demand
  • Not RAG documents - They contain procedural instructions, not just knowledge
  • Not fine-tuning - They modify behavior without changing model weights

The closest analogy: Skills are Standard Operating Procedures (SOPs) for AI agents.

The Key Insight: Progressive Disclosure

The most important technical feature of Skills is progressive disclosure:

  1. Initially: The agent only sees skill metadata (name + brief description)
  2. When needed: The agent loads the full skill instructions
  3. After use: The skill is no longer in active context

This means:

  • An agent can have access to 1,000 skills
  • But only "opens the manual" for 2-3 at a time
  • Keeping context usage minimal

graph TD
    A[User Query: Analyze this spreadsheet] --> B{Agent scans skill metadata}
    B --> C[Finds: excel-analysis skill]
    C --> D[Loads full SKILL.md instructions]
    D --> E[Executes analysis following procedures]
    E --> F[Returns result]
    F --> G[Unloads skill from active context]
    
    style B fill:#e1f5ff
    style D fill:#fff4e1
    style G fill:#f0f0f0
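The loop in the diagram can be sketched in plain Python. Everything here is illustrative: the `skills/` directory layout, the metadata format, and the keyword match are assumptions standing in for the real agent runtime and the model's own judgment.

```python
from pathlib import Path

SKILLS_DIR = Path("skills")  # hypothetical root folder: one directory per skill


def scan_metadata() -> dict[str, str]:
    """Layer 1: read only each skill's description line, never the full manual."""
    index = {}
    for skill_md in SKILLS_DIR.glob("*/SKILL.md"):
        for line in skill_md.read_text().splitlines():
            if line.startswith("description:"):
                index[skill_md.parent.name] = line.split(":", 1)[1].strip()
                break
    return index


def handle(query: str) -> str:
    index = scan_metadata()  # cheap: metadata only
    # crude keyword match; in practice the model itself picks the skill
    match = next((name for name, desc in index.items()
                  if any(word in desc.lower() for word in query.lower().split())),
                 None)
    if match is None:
        return "no matching skill; answer directly"
    instructions = (SKILLS_DIR / match / "SKILL.md").read_text()  # load full manual
    result = f"followed {len(instructions)} chars of '{match}' instructions"
    del instructions  # after use, the manual leaves active context
    return result
```

The point of the sketch is the shape of the loop, not the matching logic: the index stays small and resident, while full instructions are read, used, and discarded per request.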

3. Anatomy of a Skill

A Skill is just a folder with a specific structure. Let's break down what's inside.

The Folder Structure

financial-analysis-skill/
├── SKILL.md               # The core instruction file
├── report_template.xlsx   # Supporting resources
├── calculation_script.py  # Optional executable code
└── examples/
    └── sample_analysis.md

The SKILL.md File

This is the heart of a Skill. It has two parts:

1. YAML Frontmatter (Metadata)

---
name: financial-analysis
description: Analyze company financials, calculate ratios, and generate investment reports following CFA Institute standards
version: 1.2.0
author: Finance Team
tags: [finance, analysis, reporting]
---
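Parsing that frontmatter takes only a few lines. The sketch below handles just flat `key: value` pairs, which is all the metadata layer needs; a production loader would use a real YAML parser such as PyYAML instead.

```python
def parse_frontmatter(skill_md_text: str) -> dict:
    """Split the frontmatter off a SKILL.md file and return it as a dict.

    Handles only flat `key: value` pairs; nested YAML (e.g. the tags list)
    comes back as a raw string.
    """
    if not skill_md_text.startswith("---"):
        return {}
    header, _, body = skill_md_text[3:].partition("\n---")
    meta = {}
    for line in header.strip().splitlines():
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


doc = """---
name: financial-analysis
description: Analyze company financials
version: 1.2.0
---
# Financial Analysis Skill
"""
print(parse_frontmatter(doc)["name"])  # financial-analysis
```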

2. Markdown Instructions (The "Manual")

# Financial Analysis Skill

## When to Use This Skill

Use this skill when the user asks to:
- Analyze a company's financial health
- Calculate financial ratios (PE, PB, ROE, debt ratios)
- Generate an investment report
- Compare companies financially

## Prerequisites

Before running analysis, ensure you have:
- Company name or ticker symbol
- Access to financial data (balance sheet, income statement, cash flow)

## Step-by-Step Procedure

### Step 1: Gather Financial Data

Use the `get_financial_data()` tool to retrieve:
- Latest annual report data
- Last 5 years of key metrics
- Industry peer data for comparison

### Step 2: Calculate Key Ratios

Calculate these ratios in order:
1. **Profitability**: ROE, ROA, Profit Margin
2. **Liquidity**: Current Ratio, Quick Ratio
3. **Leverage**: Debt-to-Equity, Interest Coverage
4. **Valuation**: PE Ratio, PB Ratio, PEG Ratio

Use `calculation_script.py` for complex calculations.

### Step 3: Comparative Analysis

Compare each ratio to:
- Industry average (from peer data)
- Company's historical performance (5-year trend)
- Benchmark indices (S&P 500 if US company)

### Step 4: Generate Report

Use `report_template.xlsx` to structure the output:
- Executive Summary (1 paragraph)
- Key Metrics Table
- Ratio Analysis Section
- Peer Comparison Chart
- Investment Recommendation (Buy/Hold/Sell with rationale)

## Formatting Standards

- Use 2 decimal places for percentages
- Currency in millions (M) or billions (B)
- Always include data source and date
- Cite assumptions explicitly

## Error Handling

If financial data is incomplete:
- State which metrics are missing
- Use industry averages as proxies (mark as estimated)
- Adjust confidence level in recommendation

## Example Output

See `examples/sample_analysis.md` for reference format.

Supporting Files

calculation_script.py

def calculate_roe(net_income, shareholders_equity):
    """Calculate Return on Equity"""
    if shareholders_equity == 0:
        return None
    return (net_income / shareholders_equity) * 100

def calculate_debt_to_equity(total_debt, total_equity):
    """Calculate Debt-to-Equity Ratio"""
    if total_equity == 0:
        return None
    return total_debt / total_equity

# ... more calculation functions

The agent can execute this script when performing analysis.
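One cautious way a runtime can do that is to run the script in a subprocess rather than importing it into the agent's own process. The temp file below simply stands in for `calculation_script.py` on disk, and the input numbers are made up.

```python
import subprocess
import sys
import tempfile

# Stand-in for calculation_script.py; a real runtime would point at the
# file inside the skill folder instead of writing a temp copy.
script = """
def calculate_roe(net_income, shareholders_equity):
    if shareholders_equity == 0:
        return None
    return (net_income / shareholders_equity) * 100

print(calculate_roe(25_000_000, 100_000_000))
"""

with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(script)
    path = f.name

# Run isolated from the agent process, with a timeout as a basic safeguard.
result = subprocess.run([sys.executable, path],
                        capture_output=True, text=True, timeout=30)
print(result.stdout.strip())  # 25.0
```

Keeping execution out-of-process means a buggy or slow skill script can be killed without taking the agent down with it.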

4. How Skills Work in Practice: A Complete Example

Let's trace exactly what happens when an agent uses a Skill.

Scenario: User Asks for Financial Analysis

User: "Can you analyze Tesla's financials and tell me if it's a good buy?"

Step 1: Initial State

The agent starts with:

  • System prompt: "You are a helpful assistant..."
  • Available tools: execute_code, read_file, search_web, etc.
  • Available skills (metadata only):
    [
      {"name": "financial-analysis", "description": "Analyze company financials..."},
      {"name": "legal-review", "description": "Review legal documents..."},
      {"name": "code-audit", "description": "Audit code for security..."},
      // ... 100 more skill metadata entries
    ]
    

Context usage so far: a couple thousand tokens (system prompt + tool defs + ~100 compact skill metadata entries)

Step 2: Skill Selection

The agent reasons:

"The user wants financial analysis. I see a skill called 'financial-analysis' that matches this task. I should load it."

It uses the read_file tool to read financial-analysis-skill/SKILL.md.

Context usage now: that baseline plus ~1,200 tokens (the loaded skill instructions)

Step 3: Skill Execution

Following the skill's instructions:

  1. Agent uses search_web tool to get Tesla's latest financial data
  2. Loads calculation_script.py and executes ratio calculations
  3. Uses read_file to access report_template.xlsx
  4. Generates analysis following the exact format specified

Step 4: Response

# Tesla Inc. (TSLA) Financial Analysis
*Data as of Q4 2025*

## Executive Summary
Tesla shows strong revenue growth (+15% YoY) but declining margins (18.2% vs 21.1% prior year). 
Liquidity is solid with current ratio of 1.4x, but increasing debt levels warrant monitoring.

## Key Metrics
| Metric | Value | Industry Avg | Status |
|--------|-------|--------------|--------|
| ROE | 24.3% | 18.5% | 🟢 Above |
| Debt/Equity | 0.65 | 0.42 | 🟡 Elevated |
| PE Ratio | 58.2x | 24.3x | 🔴 High |
...

## Recommendation: HOLD
*Reasoning: Strong operational performance offset by valuation concerns...*

Context usage: a few thousand tokens in total during execution. After the response, the skill is unloaded.

What Just Happened?

  • The agent had access to 100+ skills but only loaded 1
  • It followed a structured, repeatable procedure
  • It used the skill's templates and scripts
  • The output matched professional standards

Compare this to the traditional approach:

Without Skills, you'd need a 3,000+ token system prompt containing:

  • All financial analysis steps
  • All formatting rules
  • All calculation formulas
  • All error handling procedures

And you'd send this every time, for every analysis.

5. Skills vs Everything Else You've Used

Skills vs Tools: Tools give capabilities (functions to call), Skills give expertise (how to use those functions properly). Tools are like a hammer, Skills are like carpentry training. All tools load upfront; Skills load on-demand.

Skills vs Prompts: Prompts are sent every message (expensive, inflexible). Skills are loaded only when needed (efficient, modular). Prompts are repeated instructions; Skills are reference manuals.

Skills vs RAG: RAG retrieves knowledge documents to answer "What is X?" Skills provide procedural instructions to answer "How do I do X?" RAG helps agents know things; Skills help agents do things.

Skills vs Fine-Tuning: Fine-tuning rewires the model (slow, expensive, permanent). Skills are external instructions (instant, free, reversible). Use fine-tuning for personality changes; use Skills for teaching workflows.

6. The Progressive Disclosure Mechanism: How It Really Works

This is the technical heart of Skills. Let's understand it precisely.

The Three-Layer Loading Strategy

Layer 1: Skill Index (Always Loaded)
~10-50 tokens per skill, just metadata:

{
  "name": "financial-analysis",
  "description": "Analyze company financials and generate investment reports"
}

If you have 200 skills, that's ~2,000-10,000 tokens. Loaded once at the start.

Layer 2: Skill Instructions (Loaded On-Demand)
~1,000-5,000 tokens for full SKILL.md:

  • When to use this skill
  • Step-by-step procedures
  • Templates and formatting rules
  • Error handling

Loaded when the agent decides it needs this skill.

Layer 3: Supporting Resources (Loaded As-Needed)
Variable size:

  • Scripts (100-1,000 tokens)
  • Templates (500-2,000 tokens)
  • Examples (1,000-5,000 tokens)

Loaded during execution, often just-in-time.

Why This Matters: The Math

Traditional approach (all in system prompt):

System prompt: 10,000 tokens (includes all procedures)
Every message: 10,000 tokens used
100 messages: 1,000,000 tokens

Skills approach:

Initial load: 2,000 tokens (skill index)
Skill load when needed: +2,000 tokens (1 skill)
Total per message: ~4,000 tokens
100 messages: 400,000 tokens (60% savings)
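That arithmetic, checked in code with the same illustrative numbers:

```python
MESSAGES = 100

# Traditional: the full 10,000-token prompt rides along on every message.
traditional = 10_000 * MESSAGES

# Skills: a 2,000-token index plus one ~2,000-token loaded skill per message.
skills = (2_000 + 2_000) * MESSAGES

savings = 1 - skills / traditional
print(traditional, skills, f"{savings:.0%}")  # 1000000 400000 60%
```

The real ratio depends on how big your procedures are and how often a skill is actually loaded, but the structure of the saving is the same: fixed small index cost, variable per-use cost.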

The Selection Process

How does the agent decide which skill to load?

  1. Query analysis: Agent reads user request
  2. Metadata scan: Agent reviews skill names and descriptions
  3. Relevance ranking: Agent identifies potentially useful skills
  4. Explicit loading: Agent uses read_file tool to load SKILL.md
  5. Execution: Agent follows loaded instructions

sequenceDiagram
    participant User
    participant Agent
    participant Skills Index
    participant Skill File
    
    User->>Agent: "Analyze this company financially"
    Agent->>Skills Index: Scan metadata for relevant skills
    Skills Index-->>Agent: Returns ["financial-analysis", "market-research"]
    Agent->>Skill File: read_file("financial-analysis/SKILL.md")
    Skill File-->>Agent: Returns full instructions
    Agent->>Agent: Execute procedures from skill
    Agent->>User: Returns formatted analysis
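As a toy stand-in for step 3 (relevance ranking), even word overlap against the metadata descriptions reproduces the shortlist in the sequence above. A real agent lets the model reason over the descriptions rather than counting shared words; this is only meant to show that ranking happens against metadata, not full skill bodies.

```python
def rank_skills(query: str, index: dict[str, str], top_k: int = 2) -> list[str]:
    """Rank skill metadata by word overlap with the query (crude heuristic)."""
    q_words = set(query.lower().split())
    scored = []
    for name, description in index.items():
        overlap = len(q_words & set(description.lower().split()))
        if overlap:
            scored.append((overlap, name))
    return [name for _, name in sorted(scored, reverse=True)[:top_k]]


index = {
    "financial-analysis": "Analyze company financials and generate investment reports",
    "market-research": "Research markets and analyze competitor positioning",
    "legal-review": "Review legal documents for risk",
}
print(rank_skills("analyze this company financially", index))
```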

Why This Is Different from RAG

People often confuse Skills with RAG because both involve "loading information on-demand." Here's the key distinction:

RAG:

  • Semantic search: "Find documents similar to this query"
  • Chunks: Returns snippets from knowledge base
  • Purpose: Provide factual information

Skills:

  • Explicit selection: "Load the financial-analysis skill"
  • Whole files: Returns complete procedural instructions
  • Purpose: Provide structured workflow
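The call shapes make the contrast concrete: RAG returns the chunks most similar to a query, while a skill load is an explicit fetch of one whole named file. Both functions below are illustrative stubs, not real APIs.

```python
def rag_retrieve(query: str, chunks: list[str], top_k: int = 3) -> list[str]:
    """RAG-style: rank knowledge chunks by similarity (stub: word overlap)."""
    q = set(query.lower().split())
    return sorted(chunks, key=lambda c: -len(q & set(c.lower().split())))[:top_k]


def load_skill(name: str, skill_files: dict[str, str]) -> str:
    """Skill-style: fetch one complete, explicitly named instruction file."""
    return skill_files[name]


chunks = [
    "Tesla revenue grew 15 percent year over year",
    "The current ratio measures short-term liquidity",
]
print(rag_retrieve("what is the current ratio", chunks, top_k=1))
print(load_skill("financial-analysis",
                 {"financial-analysis": "# Financial Analysis Skill\n..."}).splitlines()[0])
```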

7. Creating Your Own Skills

Let's walk through building a Skill from scratch.

Example: SEO Audit Skill

Step 1: Create the folder structure

mkdir seo-audit-skill
cd seo-audit-skill
touch SKILL.md

Step 2: Write the SKILL.md file

---
name: seo-audit
description: Audit web pages for SEO issues and provide recommendations
version: 1.0.0
---

# SEO Audit Skill

## When to Use This Skill
Use when the user asks to audit a webpage for SEO or identify search ranking issues.

## Step-by-Step Procedure

### Step 1: Fetch Page Content
- HTML source code
- Page metadata (title, description)
- Links and images

### Step 2: Check Technical SEO
- Title tag: 50-60 characters, contains keyword
- Meta description: 150-160 characters
- H1 tag: Exactly one, contains main keyword
- URL structure: Clean, readable, uses hyphens

### Step 3: Analyze Content
- Keyword usage (avoid stuffing, <2% density)
- Content depth (1,500+ words for competitive topics)
- Readability (short paragraphs, bullet points)

### Step 4: Check User Experience
- Mobile-friendly
- Fast load time (<3 seconds)
- Clear call-to-action

## Output Format

# SEO Audit Report
*Page: [URL] | Date: [Date]*

## Overall Score: [X/100]

### Critical Issues
- [Issue with impact]

### Recommendations
- [Actionable recommendation]

### What's Working Well
- [Positive aspects]

Step 3: Test the skill

agent = AnthropicAgent()
agent.add_skill("./seo-audit-skill")
response = agent.run("Audit https://example.com for SEO")

8. When to Use Skills (and When Not To)

Use Skills When:

  • You have repeatable procedures - Financial analysis, code reviews, document formatting
  • You need consistency - Every audit should check the same things
  • You're hitting context limits - System prompts growing too large
  • You need multiple specialties - Finance, legal, engineering, marketing tasks

Don't Use Skills When:

  • The task is simple and one-off - "Translate this text" doesn't need a skill
  • You need real-time dynamic behavior - Use tools for dynamic data
  • The procedure changes constantly - Skills add overhead for rapid changes

Skills vs Alternatives: Quick Guide

| Your Need | Best Solution |
|-----------|---------------|
| Agent needs to call external APIs | Tools |
| Agent needs to follow a complex workflow | Skills |
| Agent needs to answer questions with company docs | RAG |
| Agent needs a different personality/writing style | Fine-tuning |
| Agent needs to switch between multiple expert modes | Skills |

9. Common Misconceptions Clarified

Misconception 1: "Skills are just better prompts"

Reality: Prompts are sent with every message. Skills are loaded on-demand. This isn't just a technical difference—it's an architectural one that enables scaling to hundreds of specialties.

Misconception 2: "Skills replace tools"

Reality: Skills use tools. They teach the agent when and how to use tools effectively. They're complementary, not competing.

Misconception 3: "Skills make the agent smarter"

Reality: Skills don't change the model's capabilities. They provide structured procedures that make the agent more reliable and consistent, not more inherently intelligent.

Misconception 4: "You need to be a programmer to create skills"

Reality: Skills are written in Markdown. If you can write a well-organized procedure document, you can create a skill. The optional code is just that—optional.

Misconception 5: "Skills only work with Anthropic models"

Reality: Anthropic released Skills as an open standard. Any AI system that implements the agent skills protocol can use them.

10. Key Takeaways

  • Skills are procedural memory modules that agents load on-demand
  • Progressive disclosure is the key mechanism: metadata first, full instructions when needed
  • Skills are not prompts, tools, RAG, or fine-tuning—they're a distinct pattern
  • The folder structure is simple: Just a SKILL.md file with optional supporting resources
  • Skills work best for repeatable procedures that need consistency
  • They scale because you can have 1,000 skills but only load 2-3 at a time
  • The open standard means skills are portable across AI platforms
  • The main benefit is architectural: maintainable, composable, shareable expertise

References