Agent skills
analyzing-skill-usage

Agent skill

analyzing-skill-usage

Use when evaluating skill effectiveness or comparing skill versions. Triggers: 'how are skills performing', 'skill metrics', 'which skills fire correctly', 'skill invocation analysis', 'compare skill versions', 'analyze skill usage'. Also invoked by skill improvement workflows.

View SKILL.md on GitHub Repository

Stars 5

Forks 2

Install this agent skill to your Project

npx add-skill https://github.com/axiomantic/spellbook/tree/main/skills/analyzing-skill-usage

SKILL.md

Analyzing Skill Usage

<ROLE>Skill Performance Analyst. You parse session transcripts, extract skill usage events, score each invocation, and produce comparative metrics. Your analysis drives skill improvement decisions. Scores derive from observable events — never speculation.</ROLE>

Before analysis: clarify session scope, skills of interest, and comparison criteria. After analysis: summarize patterns observed, statistical confidence, and actionable findings.

Invariant Principles

Evidence Over Intuition: Scores derive from observable session events, not speculation
Context Matters: Correction after skill completion differs from mid-workflow abandonment
Version Awareness: Track skill variants for A/B comparison when version markers present
Statistical Humility: Small sample sizes warrant tentative conclusions

Inputs / Outputs

Input	Required	Description
`session_paths`	No	Specific sessions (defaults to recent project sessions)
`skills`	No	Filter to specific skills (defaults to all)
`compare_versions`	No	If true, group by version markers for A/B analysis

Output	Description
`skill_report`	Per-skill metrics: invocations, completion rate, correction rate, avg tokens
`weak_skills`	Skills ranked by failure indicators
`version_comparison`	A/B results when versions detected

Extraction Protocol

1. Load Sessions

python

from spellbook.sessions.parser import load_jsonl, list_sessions_with_samples
from spellbook.extractors.message_utils import get_tool_calls, get_content, get_role

Sessions at: ~/.claude/projects/<project-encoded>/*.jsonl

2. Detect Skill Invocation Boundaries

Start Event: Tool call where name == "Skill"

python

for msg in messages:
    for call in get_tool_calls(msg):
        if call.get("name") == "Skill":
            skill_name = call["input"]["skill"]
            # Record: skill, timestamp, message index

End Event (first match): another Skill tool call (superseded), session end, or compact boundary (type == "system", subtype == "compact_boundary")

3. Score Each Invocation

Success Signals (+1 each):

No user correction in skill window
Skill ran to natural completion (not superseded)
Artifact produced (Write/Edit tool after skill)
User continued to new topic

Failure Signals (-1 each):

User correction detected
Same skill re-invoked within 5 messages (retry)
Different skill invoked for apparent same task
Skill abandoned mid-workflow (superseded without output)

Correction Detection Patterns:

python

CORRECTION_PATTERNS = [
    r"\bno\b(?!t)",           # "no" but not "not"
    r"\bstop\b",
    r"\bwrong\b",
    r"\bactually\b",
    r"\bdon'?t\b",
    r"\binstead\b",
    r"\bthat'?s not\b",
]

4. Aggregate Metrics

Per skill, produce:

python

{
    "skill": "develop",
    "version": "v1" | None,      # If version marker detected
    "invocations": 15,
    "completions": 12,           # Ran to end without supersede
    "corrections": 3,            # User corrected during
    "retries": 1,                # Same skill re-invoked
    "avg_tokens": 4500,          # Tokens in skill window
    "completion_rate": 0.80,
    "correction_rate": 0.20,
    "score": 0.60,               # Composite score
}

Analysis Modes

Mode 1: Identify Weak Skills

Rank all skills by composite failure score:

failure_score = (corrections + retries + abandonments) / invocations

Output format:

markdown

## Weak Skills Report
| Rank | Skill | Invocations | Failure Rate | Top Failure Mode |
|------|-------|-------------|--------------|------------------|
| 1 | gathering-requirements | 8 | 0.50 | User corrections |

Mode 2: A/B Testing Versions

When version markers detected (e.g., skill:v2 or tagged in args):

markdown

## A/B Comparison: develop
| Metric | v1 (n=10) | v2 (n=8) | Delta | Significant |
|--------|-----------|----------|-------|-------------|
| Completion Rate | 0.70 | 0.88 | +0.18 | Yes (p<0.05) |
| Correction Rate | 0.30 | 0.12 | -0.18 | Yes |
| Avg Tokens | 5200 | 4100 | -1100 | Yes |

**Recommendation**: v2 outperforms v1 across all metrics.

Execution Steps

Enumerate sessions in target scope
Parse each session, extracting skill events
Score each invocation using signal detection
Aggregate by skill (and version if A/B)
Rank and report based on analysis mode
Surface actionable insights for skill improvement

Version Detection

Look for version markers: skill name suffix (develop:v2), args containing version ("--version v2", "[v2]"), or session date ranges.

Self-Check

Sessions loaded and parsed successfully
Skill invocation boundaries correctly identified
Correction patterns detected in user messages
Metrics aggregated per skill (and version if A/B)
Statistical caveats noted for small samples
Actionable recommendations provided

<FINAL_EMPHASIS>Skills improve through measurement. Extract events, score honestly, compare rigorously, recommend confidently.</FINAL_EMPHASIS>

Maintainer

axiomantic Core maintainer

Source details

Full Name: axiomantic/spellbook
Branch: main
Path in repo: skills/analyzing-skill-usage
License: MIT License
Topics: claude cli mcp mcp-server ai-coding developer-tools gemini-cli skills prompt-engineering llm python codex opencode ai-assistant spellbook

Featured Tools

Join Our Newsletter

Use when exploring design approaches, generating ideas, or making architectural decisions. Triggers: 'explore options', 'what are the tradeoffs', 'how should I approach', 'let's think through', 'sketch out an approach', 'I need ideas for', 'how would you structure', 'what are my options'. Also invoked by develop when design decisions are needed.

5 2

Explore

Didn't find tool you were looking for?

Search AI Tools

Install this agent skill to your Project

SKILL.md

Analyzing Skill Usage

Invariant Principles

Inputs / Outputs

Extraction Protocol

1. Load Sessions

2. Detect Skill Invocation Boundaries

3. Score Each Invocation

4. Aggregate Metrics

Analysis Modes

Mode 1: Identify Weak Skills

Mode 2: A/B Testing Versions

Execution Steps

Version Detection

Self-Check

Recommended Agent Skills

spellbook-auditing

documentation-updates

project-encyclopedia

reviewing-impl-plans

session-resume

brainstorming