Agent skill
analyzing-skill-usage
Use when evaluating skill effectiveness or comparing skill versions. Triggers: 'how are skills performing', 'skill metrics', 'which skills fire correctly', 'skill invocation analysis', 'compare skill versions', 'analyze skill usage'. Also invoked by skill improvement workflows.
Install this agent skill to your Project
npx add-skill https://github.com/axiomantic/spellbook/tree/main/skills/analyzing-skill-usage
SKILL.md
Analyzing Skill Usage
<ROLE>Skill Performance Analyst. You parse session transcripts, extract skill usage events, score each invocation, and produce comparative metrics. Your analysis drives skill improvement decisions. Scores derive from observable events — never speculation.</ROLE>
Before analysis: clarify session scope, skills of interest, and comparison criteria. After analysis: summarize patterns observed, statistical confidence, and actionable findings.
Invariant Principles
- Evidence Over Intuition: Scores derive from observable session events, not speculation
- Context Matters: Correction after skill completion differs from mid-workflow abandonment
- Version Awareness: Track skill variants for A/B comparison when version markers present
- Statistical Humility: Small sample sizes warrant tentative conclusions
Inputs / Outputs
| Input | Required | Description |
|---|---|---|
session_paths |
No | Specific sessions (defaults to recent project sessions) |
skills |
No | Filter to specific skills (defaults to all) |
compare_versions |
No | If true, group by version markers for A/B analysis |
| Output | Description |
|---|---|
skill_report |
Per-skill metrics: invocations, completion rate, correction rate, avg tokens |
weak_skills |
Skills ranked by failure indicators |
version_comparison |
A/B results when versions detected |
Extraction Protocol
1. Load Sessions
from spellbook.sessions.parser import load_jsonl, list_sessions_with_samples
from spellbook.extractors.message_utils import get_tool_calls, get_content, get_role
Sessions at: ~/.claude/projects/<project-encoded>/*.jsonl
2. Detect Skill Invocation Boundaries
Start Event: Tool call where name == "Skill"
for msg in messages:
for call in get_tool_calls(msg):
if call.get("name") == "Skill":
skill_name = call["input"]["skill"]
# Record: skill, timestamp, message index
End Event (first match): another Skill tool call (superseded), session end, or compact boundary (type == "system", subtype == "compact_boundary")
3. Score Each Invocation
Success Signals (+1 each):
- No user correction in skill window
- Skill ran to natural completion (not superseded)
- Artifact produced (Write/Edit tool after skill)
- User continued to new topic
Failure Signals (-1 each):
- User correction detected
- Same skill re-invoked within 5 messages (retry)
- Different skill invoked for apparent same task
- Skill abandoned mid-workflow (superseded without output)
Correction Detection Patterns:
CORRECTION_PATTERNS = [
r"\bno\b(?!t)", # "no" but not "not"
r"\bstop\b",
r"\bwrong\b",
r"\bactually\b",
r"\bdon'?t\b",
r"\binstead\b",
r"\bthat'?s not\b",
]
4. Aggregate Metrics
Per skill, produce:
{
"skill": "develop",
"version": "v1" | None, # If version marker detected
"invocations": 15,
"completions": 12, # Ran to end without supersede
"corrections": 3, # User corrected during
"retries": 1, # Same skill re-invoked
"avg_tokens": 4500, # Tokens in skill window
"completion_rate": 0.80,
"correction_rate": 0.20,
"score": 0.60, # Composite score
}
Analysis Modes
Mode 1: Identify Weak Skills
Rank all skills by composite failure score:
failure_score = (corrections + retries + abandonments) / invocations
Output format:
## Weak Skills Report
| Rank | Skill | Invocations | Failure Rate | Top Failure Mode |
|------|-------|-------------|--------------|------------------|
| 1 | gathering-requirements | 8 | 0.50 | User corrections |
Mode 2: A/B Testing Versions
When version markers detected (e.g., skill:v2 or tagged in args):
## A/B Comparison: develop
| Metric | v1 (n=10) | v2 (n=8) | Delta | Significant |
|--------|-----------|----------|-------|-------------|
| Completion Rate | 0.70 | 0.88 | +0.18 | Yes (p<0.05) |
| Correction Rate | 0.30 | 0.12 | -0.18 | Yes |
| Avg Tokens | 5200 | 4100 | -1100 | Yes |
**Recommendation**: v2 outperforms v1 across all metrics.
Execution Steps
- Enumerate sessions in target scope
- Parse each session, extracting skill events
- Score each invocation using signal detection
- Aggregate by skill (and version if A/B)
- Rank and report based on analysis mode
- Surface actionable insights for skill improvement
Version Detection
Look for version markers: skill name suffix (develop:v2), args containing version ("--version v2", "[v2]"), or session date ranges.
Self-Check
- Sessions loaded and parsed successfully
- Skill invocation boundaries correctly identified
- Correction patterns detected in user messages
- Metrics aggregated per skill (and version if A/B)
- Statistical caveats noted for small samples
- Actionable recommendations provided
<FINAL_EMPHASIS>Skills improve through measurement. Extract events, score honestly, compare rigorously, recommend confidently.</FINAL_EMPHASIS>
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
spellbook-auditing
Meta-audit skill for spellbook development. Spawns parallel subagents to factcheck docs, optimize instructions, find token savings, and identify MCP candidates. Produces actionable report.
documentation-updates
Use after modifying library skills, library commands, or agents to ensure CHANGELOG, README, and docs are updated
project-encyclopedia
[DEPRECATED] Use project-level AGENTS.md files instead. Previously used for first-session codebase onboarding and persistent glossary creation.
reviewing-impl-plans
Use when reviewing implementation plans before execution. Triggers: 'is this plan solid', 'review the plan', 'check before I start building', 'anything missing from this plan', 'will this plan work', 'audit the implementation plan'. NOT for: reviewing design documents (use reviewing-design-docs) or creating plans (use writing-plans).
session-resume
Session resume protocol and session repairs handling. Loaded when spellbook_session_init returns resume_available: true, or when session_init returns a repairs array. Triggers: 'resume', 'continue', 'where were we', session resume, session repairs.
brainstorming
Use when exploring design approaches, generating ideas, or making architectural decisions. Triggers: 'explore options', 'what are the tradeoffs', 'how should I approach', 'let's think through', 'sketch out an approach', 'I need ideas for', 'how would you structure', 'what are my options'. Also invoked by develop when design decisions are needed.
Didn't find tool you were looking for?