Understanding Anthropic Agent Skills: From Tool-Calling to Expertise Modules
1. The Problem: Why Agents Can't Just Use More Tools
Before explaining what Skills are, let's understand the problem they solve.
The Traditional Agent Architecture
When you build an AI agent today (with Claude, GPT-4, or similar), you typically give it:
- A system prompt - Instructions on how to behave
- Tools - Functions it can call (search the web, query a database, run code)
- Context - Previous conversation history and relevant documents
Here's what that looks like in practice:
# Traditional approach (illustrative pseudocode: agent and user_query are placeholders)
system_prompt = """
You are a financial analyst assistant. When analyzing stocks:
1. Always check the latest price using get_stock_price()
2. Calculate the PE ratio using company financials
3. Compare to industry average
4. Format your response as a structured report
...
"""
tools = [
{"name": "get_stock_price", "description": "Get current stock price", ...},
{"name": "get_financials", "description": "Get company financials", ...},
{"name": "calculate_ratio", "description": "Calculate financial ratios", ...},
# ... 50 more tools
]
agent.run(system_prompt, tools, user_query)
The Scaling Problem
This works fine with 5-10 tools. But what happens when you need 100 tools? Or 500?
Problem 1: Context Bloat
Every tool definition must be sent to the model on every request. If each tool takes 100 tokens to describe, 500 tools = 50,000 tokens of context used before the user even says anything.
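To make that arithmetic concrete, here's a tiny sketch. The function name is made up for illustration, and the 100-tokens-per-tool figure is the same rough assumption used above, not a measured value.

```python
def tool_definition_overhead(num_tools: int, tokens_per_tool: int) -> int:
    """Tokens spent on tool definitions in a single request."""
    return num_tools * tokens_per_tool


# Overhead grows linearly with the number of tools, on every request.
for count in (10, 100, 500):
    overhead = tool_definition_overhead(count, 100)
    print(f"{count} tools -> {overhead:,} tokens per request")
```

The cost is paid per request, so it is multiplied again by the number of messages in a conversation.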
Problem 2: Decision Paralysis
When an agent sees 500 possible tools, it struggles to pick the right ones. It's like asking someone to choose dinner from a menu with 500 items—they'll either freeze or pick randomly.
Problem 3: Procedural Knowledge
Tools tell the agent what it can do, but not how to do it well. You can give an agent a create_excel_file() function, but that doesn't teach it:
- How to structure a financial report
- What formatting standards to follow
- In what order to perform calculations
- How to handle edge cases
You end up encoding all this in the system prompt, which makes it even longer.
The Repetition Problem
Every time a user needs a financial analysis:
- You send the same 2,000-token instructions
- You send the same 100 tool definitions
- You hope the agent remembers all the procedural steps
- You pay for those tokens every single time
And if you need a different specialized task (legal document review, code analysis, SEO auditing), you need to either:
- Swap out the entire system prompt (breaking context)
- Cram multiple specialties into one massive prompt (making it unreliable)
- Build separate agents for each specialty (duplicating infrastructure)
This is the problem Skills solve.
2. What Agent Skills Actually Are
Here's the simplest possible definition:
Agent Skills are self-contained folders that teach an AI agent how to perform a specific type of work, and the agent only loads them when needed.
Think of Skills as:
- Not tools - They don't give new capabilities; they teach how to use existing capabilities
- Not prompts - They're modular, reusable, and loaded on-demand
- Not RAG documents - They contain procedural instructions, not just knowledge
- Not fine-tuning - They modify behavior without changing model weights
The closest analogy: Skills are Standard Operating Procedures (SOPs) for AI agents.
The Key Insight: Progressive Disclosure
The most important technical feature of Skills is progressive disclosure:
- Initially: The agent only sees skill metadata (name + brief description)
- When needed: The agent loads the full skill instructions
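- After use: The skill's instructions are not carried into later, unrelated requests (within a single conversation, text that has already been read typically stays in context until the history is trimmed)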
This means:
- An agent can have access to 1,000 skills
- But only "opens the manual" for 2-3 at a time
- Keeping context usage minimal
graph TD
A[User Query: Analyze this spreadsheet] --> B{Agent scans skill metadata}
B --> C[Finds: excel-analysis skill]
C --> D[Loads full SKILL.md instructions]
D --> E[Executes analysis following procedures]
E --> F[Returns result]
F --> G[Unloads skill from active context]
style B fill:#e1f5ff
style D fill:#fff4e1
style G fill:#f0f0f0
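Here's a minimal sketch of progressive disclosure as a data structure. The SkillRegistry class and its method names are hypothetical, not part of any Anthropic SDK, and the skill descriptions are placeholders. The point is only that index() exposes metadata, while load() is the single place a full body enters context.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    description: str
    body: str  # full SKILL.md instructions; only enters context on demand


@dataclass
class SkillRegistry:
    """Hypothetical registry: metadata is always visible, bodies are loaded lazily."""
    skills: dict[str, Skill] = field(default_factory=dict)
    loaded: set[str] = field(default_factory=set)

    def register(self, skill: Skill) -> None:
        self.skills[skill.name] = skill

    def index(self) -> list[dict[str, str]]:
        """Metadata only: this is what the agent sees up front."""
        return [{"name": s.name, "description": s.description}
                for s in self.skills.values()]

    def load(self, name: str) -> str:
        """Return the full instructions and record that they are now in context."""
        self.loaded.add(name)
        return self.skills[name].body


registry = SkillRegistry()
registry.register(Skill("excel-analysis", "Analyze spreadsheets",
                        "# Excel Analysis\n1. Inspect sheets..."))
registry.register(Skill("legal-review", "Review legal documents",
                        "# Legal Review\n1. Identify parties..."))

print(registry.index())    # two short metadata entries, no bodies
instructions = registry.load("excel-analysis")
print(registry.loaded)     # only the skill that was actually needed
```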
3. Anatomy of a Skill
A Skill is just a folder with a specific structure. Let's break down what's inside.
The Folder Structure
financial-analysis-skill/
├── SKILL.md # The core instruction file
├── report_template.xlsx # Supporting resources
├── calculation_script.py # Optional executable code
└── examples/
    └── sample_analysis.md
The SKILL.md File
This is the heart of a Skill. It has two parts:
1. YAML Frontmatter (Metadata)
---
name: financial-analysis
description: Analyze company financials, calculate ratios, and generate investment reports following CFA Institute standards
version: 1.2.0
author: Finance Team
tags: [finance, analysis, reporting]
---
2. Markdown Instructions (The "Manual")
# Financial Analysis Skill
## When to Use This Skill
Use this skill when the user asks to:
- Analyze a company's financial health
- Calculate financial ratios (PE, PB, ROE, debt ratios)
- Generate an investment report
- Compare companies financially
## Prerequisites
Before running analysis, ensure you have:
- Company name or ticker symbol
- Access to financial data (balance sheet, income statement, cash flow)
## Step-by-Step Procedure
### Step 1: Gather Financial Data
Use the `get_financial_data()` tool to retrieve:
- Latest annual report data
- Last 5 years of key metrics
- Industry peer data for comparison
### Step 2: Calculate Key Ratios
Calculate these ratios in order:
1. **Profitability**: ROE, ROA, Profit Margin
2. **Liquidity**: Current Ratio, Quick Ratio
3. **Leverage**: Debt-to-Equity, Interest Coverage
4. **Valuation**: PE Ratio, PB Ratio, PEG Ratio
Use `calculation_script.py` for complex calculations.
### Step 3: Comparative Analysis
Compare each ratio to:
- Industry average (from peer data)
- Company's historical performance (5-year trend)
- Benchmark indices (S&P 500 if US company)
### Step 4: Generate Report
Use `report_template.xlsx` to structure the output:
- Executive Summary (1 paragraph)
- Key Metrics Table
- Ratio Analysis Section
- Peer Comparison Chart
- Investment Recommendation (Buy/Hold/Sell with rationale)
## Formatting Standards
- Use 2 decimal places for percentages
- Currency in millions (M) or billions (B)
- Always include data source and date
- Cite assumptions explicitly
## Error Handling
If financial data is incomplete:
- State which metrics are missing
- Use industry averages as proxies (mark as estimated)
- Adjust confidence level in recommendation
## Example Output
See `examples/sample_analysis.md` for reference format.
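To see how the two parts of SKILL.md separate, here's a rough sketch of a loader. The function names are hypothetical, and the parser is deliberately simplified: it only understands flat `key: value` lines, so a real implementation would more likely use a proper YAML library. Treating name and description as the required fields is an assumption based on their being the metadata the agent sees first.

```python
def split_skill_file(text: str) -> tuple[dict[str, str], str]:
    """Split SKILL.md text into (frontmatter fields, Markdown body).

    Simplified: handles flat `key: value` lines only, not full YAML.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter block")
    # Find the closing '---' of the frontmatter block.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        raise ValueError("frontmatter block is never closed")
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


def missing_fields(meta: dict[str, str], required=("name", "description")) -> list[str]:
    """List required metadata fields that are absent or empty."""
    return [name for name in required if not meta.get(name)]


sample = """---
name: financial-analysis
description: Analyze company financials, calculate ratios, and generate investment reports following CFA Institute standards
version: 1.2.0
---
# Financial Analysis Skill
## When to Use This Skill
"""

meta, body = split_skill_file(sample)
print(meta["name"])           # financial-analysis
print(missing_fields(meta))   # []
print(body.splitlines()[0])   # # Financial Analysis Skill
```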
Supporting Files
calculation_script.py
def calculate_roe(net_income, shareholders_equity):
    """Calculate Return on Equity"""
    if shareholders_equity == 0:
        return None
    return (net_income / shareholders_equity) * 100

def calculate_debt_to_equity(total_debt, total_equity):
    """Calculate Debt-to-Equity Ratio"""
    if total_equity == 0:
        return None
    return total_debt / total_equity

# ... more calculation functions
The agent can execute this script when performing analysis.
4. How Skills Work in Practice: A Complete Example
Let's trace exactly what happens when an agent uses a Skill.
Scenario: User Asks for Financial Analysis
User: "Can you analyze Tesla's financials and tell me if it's a good buy?"
Step 1: Initial State
The agent starts with:
- System prompt: "You are a helpful assistant..."
- Available tools: execute_code, read_file, search_web, etc.
- Available skills (metadata only):
[
  {"name": "financial-analysis", "description": "Analyze company financials..."},
  {"name": "legal-review", "description": "Review legal documents..."},
  {"name": "code-audit", "description": "Audit code for security..."},
  // ... 100 more skill metadata entries
]
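Context usage so far: ~500 tokens in this simplified illustration (system prompt + tool defs + skill metadata). With 100+ metadata entries at 10-50 tokens each, a real skill index alone would take roughly 1,000-5,000 tokens.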
Step 2: Skill Selection
The agent reasons:
"The user wants financial analysis. I see a skill called 'financial-analysis' that matches this task. I should load it."
It uses the read_file tool to read financial-analysis-skill/SKILL.md.
Context usage now: ~500 + 1,200 tokens (loaded the skill instructions)
Step 3: Skill Execution
Following the skill's instructions:
- Agent uses search_web tool to get Tesla's latest financial data
- Loads calculation_script.py and executes ratio calculations
- Uses read_file to access report_template.xlsx
- Generates analysis following the exact format specified
Step 4: Response
# Tesla Inc. (TSLA) Financial Analysis
*Data as of Q4 2025*
## Executive Summary
Tesla shows strong revenue growth (+15% YoY) but declining margins (18.2% vs 21.1% prior year).
Liquidity is solid with current ratio of 1.4x, but increasing debt levels warrant monitoring.
## Key Metrics
| Metric | Value | Industry Avg | Status |
|--------|-------|--------------|--------|
| ROE | 24.3% | 18.5% | 🟢 Above |
| Debt/Equity | 0.65 | 0.42 | 🟡 Elevated |
| PE Ratio | 58.2x | 24.3x | 🔴 High |
...
## Recommendation: HOLD
*Reasoning: Strong operational performance offset by valuation concerns...*
Context usage: ~2,000 tokens total during execution. After response, the skill is unloaded.
What Just Happened?
- The agent had access to 100+ skills but only loaded 1
- It followed a structured, repeatable procedure
- It used the skill's templates and scripts
- The output matched professional standards
Compare this to the traditional approach:
Without Skills, you'd need a 3,000+ token system prompt containing:
- All financial analysis steps
- All formatting rules
- All calculation formulas
- All error handling procedures
And you'd send this every time, for every analysis.
5. Skills vs Everything Else You've Used
Skills vs Tools: Tools give capabilities (functions to call), Skills give expertise (how to use those functions properly). Tools are like a hammer, Skills are like carpentry training. All tools load upfront; Skills load on-demand.
Skills vs Prompts: Prompts are sent every message (expensive, inflexible). Skills are loaded only when needed (efficient, modular). Prompts are repeated instructions; Skills are reference manuals.
Skills vs RAG: RAG retrieves knowledge documents to answer "What is X?" Skills provide procedural instructions to answer "How do I do X?" RAG helps agents know things; Skills help agents do things.
Skills vs Fine-Tuning: Fine-tuning rewires the model (slow, expensive, permanent). Skills are external instructions (instant, no training cost, reversible), although a loaded skill still consumes context tokens. Use fine-tuning for personality changes; use Skills for teaching workflows.
6. The Progressive Disclosure Mechanism: How It Really Works
This is the technical heart of Skills. Let's understand it precisely.
The Three-Layer Loading Strategy
Layer 1: Skill Index (Always Loaded)
~10-50 tokens per skill, just metadata:
{
"name": "financial-analysis",
"description": "Analyze company financials and generate investment reports"
}
If you have 200 skills, that's ~2,000-10,000 tokens. Loaded once at the start.
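As a rough sketch of how such an index could be assembled, the snippet below scans a directory for `<folder>/SKILL.md` files and keeps only name and description. The helper names are hypothetical, the frontmatter handling is simplified to flat `key: value` lines, and the 4-characters-per-token estimate is a crude assumption rather than a real tokenizer.

```python
import json
import tempfile
from pathlib import Path


def read_metadata(skill_md: Path) -> dict[str, str]:
    """Pull name/description from flat `key: value` frontmatter lines."""
    meta = {}
    lines = skill_md.read_text(encoding="utf-8").splitlines()
    if lines and lines[0].strip() == "---":
        for line in lines[1:]:
            if line.strip() == "---":
                break  # end of frontmatter; the body is ignored
            key, sep, value = line.partition(":")
            if sep and key.strip() in ("name", "description"):
                meta[key.strip()] = value.strip()
    return meta


def build_index(skills_root: Path) -> list[dict[str, str]]:
    """One metadata entry per <folder>/SKILL.md; bodies stay out of the index."""
    return [read_metadata(p) for p in sorted(skills_root.glob("*/SKILL.md"))]


def rough_tokens(text: str) -> int:
    """Crude estimate (about 4 characters per token); an assumption, not a tokenizer."""
    return max(1, len(text) // 4)


# Demo with two throwaway skill folders whose bodies are deliberately long.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    demo_skills = [
        ("financial-analysis", "Analyze company financials and generate investment reports"),
        ("seo-audit", "Audit web pages for SEO issues and provide recommendations"),
    ]
    for name, desc in demo_skills:
        folder = root / f"{name}-skill"
        folder.mkdir()
        content = f"---\nname: {name}\ndescription: {desc}\n---\n# Body\n" + "step\n" * 500
        (folder / "SKILL.md").write_text(content, encoding="utf-8")
    index = build_index(root)

index_json = json.dumps(index)
print(index_json)
print(rough_tokens(index_json), "tokens (rough) for", len(index), "skills")
```

Even though each demo SKILL.md has a 500-line body, the index stays a few dozen tokens because only the two metadata fields are copied into it.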
Layer 2: Skill Instructions (Loaded On-Demand)
~1,000-5,000 tokens for full SKILL.md:
- When to use this skill
- Step-by-step procedures
- Templates and formatting rules
- Error handling
Loaded when the agent decides it needs this skill.
Layer 3: Supporting Resources (Loaded As-Needed)
Variable size:
- Scripts (100-1,000 tokens)
- Templates (500-2,000 tokens)
- Examples (1,000-5,000 tokens)
Loaded during execution, often just-in-time.
Why This Matters: The Math
Traditional approach (all in system prompt):
System prompt: 10,000 tokens (includes all procedures)
Every message: 10,000 tokens used
100 messages: 1,000,000 tokens
Skills approach:
Initial load: 2,000 tokens (skill index)
Skill load when needed: +2,000 tokens (1 skill)
Total per message: ~4,000 tokens
100 messages: 400,000 tokens (60% savings)
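The same comparison as a few lines of Python, using only the figures above. Note the built-in assumption: one 2,000-token skill is loaded on every message, so messages that need no skill would cost less.

```python
MESSAGES = 100

# Traditional: the full 10,000-token prompt rides along with every message.
traditional_total = 10_000 * MESSAGES

# Skills: 2,000-token index plus one 2,000-token skill per message.
skills_per_message = 2_000 + 2_000
skills_total = skills_per_message * MESSAGES

savings = 1 - skills_total / traditional_total
print(f"traditional: {traditional_total:,} tokens")
print(f"skills:      {skills_total:,} tokens")
print(f"savings:     {savings:.0%}")
```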
The Selection Process
How does the agent decide which skill to load?
- Query analysis: Agent reads user request
- Metadata scan: Agent reviews skill names and descriptions
- Relevance ranking: Agent identifies potentially useful skills
- Explicit loading: Agent uses read_file tool to load SKILL.md
- Execution: Agent follows loaded instructions
sequenceDiagram
participant User
participant Agent
participant Skills Index
participant Skill File
User->>Agent: "Analyze this company financially"
Agent->>Skills Index: Scan metadata for relevant skills
Skills Index-->>Agent: Returns ["financial-analysis", "market-research"]
Agent->>Skill File: read_file("financial-analysis-skill/SKILL.md")
Skill File-->>Agent: Returns full instructions
Agent->>Agent: Execute procedures from skill
Agent->>User: Returns formatted analysis
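The explicit-loading step depends on a read_file tool. Here's one way such a tool handler might look, assuming the host application keeps every skill under a single root directory. The signature is hypothetical; the path check is a common precaution so that a tool call cannot read files outside the skills folder.

```python
import tempfile
from pathlib import Path


def read_file(skills_root: Path, relative_path: str) -> str:
    """Return a file's text, refusing paths that resolve outside skills_root."""
    root = skills_root.resolve()
    target = (root / relative_path).resolve()
    if not target.is_relative_to(root):  # Path.is_relative_to needs Python 3.9+
        raise PermissionError(f"{relative_path!r} is outside the skills directory")
    return target.read_text(encoding="utf-8")


blocked = False
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    skill_dir = root / "financial-analysis-skill"
    skill_dir.mkdir()
    (skill_dir / "SKILL.md").write_text("# Financial Analysis Skill\n", encoding="utf-8")

    # The call the agent makes once it has picked a skill from the index.
    loaded = read_file(root, "financial-analysis-skill/SKILL.md")
    print(loaded.strip())

    # A path that climbs out of the skills root is rejected.
    try:
        read_file(root, "../outside.txt")
    except PermissionError as exc:
        blocked = True
        print("blocked:", exc)
```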
Why This Is Different from RAG
People often confuse Skills with RAG because both involve "loading information on-demand." Here's the key distinction:
RAG:
- Semantic search: "Find documents similar to this query"
- Chunks: Returns snippets from knowledge base
- Purpose: Provide factual information
Skills:
- Explicit selection: "Load the financial-analysis skill"
- Whole files: Returns complete procedural instructions
- Purpose: Provide structured workflow
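The contrast can be sketched in a few lines. Both functions are hypothetical, and the word-overlap scoring is only a stand-in for the embedding similarity a real RAG system would use. What matters is the shape of the result: ranked fragments versus one complete procedure fetched by name.

```python
def rag_retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by words shared with the query (stand-in for embedding similarity)."""
    query_words = set(query.lower().split())
    ranked = sorted(
        chunks,
        key=lambda chunk: len(query_words & set(chunk.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def load_skill(name: str, skills: dict[str, str]) -> str:
    """Exact lookup by name: returns the whole procedure or raises KeyError."""
    return skills[name]


chunks = [
    "The PE ratio divides share price by earnings per share.",
    "Current ratio is current assets divided by current liabilities.",
    "ROE is net income divided by shareholders equity.",
]
skills = {
    "financial-analysis": "Step 1: Gather data\nStep 2: Calculate ratios\n"
                          "Step 3: Compare\nStep 4: Report",
}

# RAG: fuzzy match, returns the best-scoring fragment(s).
print(rag_retrieve("what is the pe ratio", chunks, k=1))

# Skills: explicit name, returns every step of the workflow.
print(load_skill("financial-analysis", skills))
```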
7. Creating Your Own Skills
Let's walk through building a Skill from scratch.
Example: SEO Audit Skill
Step 1: Create the folder structure
mkdir seo-audit-skill
cd seo-audit-skill
touch SKILL.md
Step 2: Write the SKILL.md file
---
name: seo-audit
description: Audit web pages for SEO issues and provide recommendations
version: 1.0.0
---
# SEO Audit Skill
## When to Use This Skill
Use when the user asks to audit a webpage for SEO or identify search ranking issues.
## Step-by-Step Procedure
### Step 1: Fetch Page Content
- HTML source code
- Page metadata (title, description)
- Links and images
### Step 2: Check Technical SEO
- Title tag: 50-60 characters, contains keyword
- Meta description: 150-160 characters
- H1 tag: Exactly one, contains main keyword
- URL structure: Clean, readable, uses hyphens
### Step 3: Analyze Content
- Keyword usage (avoid stuffing, <2% density)
- Content depth (1,500+ words for competitive topics)
- Readability (short paragraphs, bullet points)
### Step 4: Check User Experience
- Mobile-friendly
- Fast load time (<3 seconds)
- Clear call-to-action
## Output Format
# SEO Audit Report
*Page: [URL] | Date: [Date]*
## Overall Score: [X/100]
### Critical Issues
- [Issue with impact]
### Recommendations
- [Actionable recommendation]
### What's Working Well
- [Positive aspects]
Step 3: Test the skill
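# Illustrative pseudocode: AnthropicAgent and add_skill are placeholder names
# for whatever agent framework you use, not an actual SDK API.
agent = AnthropicAgent()
agent.add_skill("./seo-audit-skill")
response = agent.run("Audit https://example.com for SEO")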
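The Step 2 checks in the SEO skill are mechanical enough to live in an optional supporting script, much like calculation_script.py in the finance example. Below is one possible sketch using only the standard library. The class and function names are made up for illustration, and the thresholds are copied straight from the SKILL.md above.

```python
from html.parser import HTMLParser


class SeoTagCollector(HTMLParser):
    """Collects the <title> text, meta description, and number of <h1> tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta_description = ""
        self.h1_count = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "h1":
            self.h1_count += 1
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.meta_description = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def technical_seo_issues(html: str) -> list[str]:
    """Apply the Step 2 length and H1 rules; returns human-readable issues."""
    page = SeoTagCollector()
    page.feed(html)
    title_length = len(page.title.strip())
    description_length = len(page.meta_description)
    issues = []
    if not 50 <= title_length <= 60:
        issues.append(f"Title is {title_length} characters (target 50-60)")
    if not 150 <= description_length <= 160:
        issues.append(f"Meta description is {description_length} characters (target 150-160)")
    if page.h1_count != 1:
        issues.append(f"Found {page.h1_count} H1 tags (expected exactly one)")
    return issues


sample_html = "<html><head><title>Home</title></head><body><h1>A</h1><h1>B</h1></body></html>"
for issue in technical_seo_issues(sample_html):
    print(issue)
```

Keeping deterministic checks like these in a script leaves the Markdown instructions free to focus on judgment calls such as content depth and readability.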
8. When to Use Skills (and When Not To)
Use Skills When:
- You have repeatable procedures - Financial analysis, code reviews, document formatting
- You need consistency - Every audit should check the same things
- You're hitting context limits - System prompts growing too large
- You need multiple specialties - Finance, legal, engineering, marketing tasks
Don't Use Skills When:
- The task is simple and one-off - "Translate this text" doesn't need a skill
- You need real-time dynamic behavior - Use tools for dynamic data
- The procedure changes constantly - Skills add overhead for rapid changes
Skills vs Alternatives: Quick Guide
| Your Need | Best Solution |
|---|---|
| Agent needs to call external APIs | Tools |
| Agent needs to follow a complex workflow | Skills |
| Agent needs to answer questions with company docs | RAG |
| Agent needs a different personality/writing style | Fine-tuning |
| Agent needs to switch between multiple expert modes | Skills |
9. Common Misconceptions Clarified
Misconception 1: "Skills are just better prompts"
Reality: Prompts are sent with every message. Skills are loaded on-demand. This isn't just a technical difference—it's an architectural one that enables scaling to hundreds of specialties.
Misconception 2: "Skills replace tools"
Reality: Skills use tools. They teach the agent when and how to use tools effectively. They're complementary, not competing.
Misconception 3: "Skills make the agent smarter"
Reality: Skills don't change the model's capabilities. They provide structured procedures that make the agent more reliable and consistent, not more inherently intelligent.
Misconception 4: "You need to be a programmer to create skills"
Reality: Skills are written in Markdown. If you can write a well-organized procedure document, you can create a skill. The optional code is just that—optional.
Misconception 5: "Skills only work with Anthropic models"
Reality: Anthropic released Skills as an open standard. Any AI system that implements the Agent Skills specification can use them.
10. Key Takeaways
- Skills are procedural memory modules that agents load on-demand
- Progressive disclosure is the key mechanism: metadata first, full instructions when needed
- Skills are not prompts, tools, RAG, or fine-tuning—they're a distinct pattern
- The folder structure is simple: Just a SKILL.md file with optional supporting resources
- Skills work best for repeatable procedures that need consistency
- They scale because you can have 1,000 skills but only load 2-3 at a time
- The open standard means skills are portable across AI platforms
- The main benefit is architectural: maintainable, composable, shareable expertise