Agent skill
skill-evolution
Self-evolving skill system. Skills are scored after execution (0-100) on 5 dimensions. Score 90+ over 5 runs = crystallized (locked). Score below 30 = auto-repair attempted. Skills improve themselves through usage feedback.
Install this agent skill to your Project
npx add-skill https://github.com/vibeeval/vibecosystem/tree/main/skills/skill-evolution
SKILL.md
Skill Evolution
Darwinian selection for skills. Skills that produce good outcomes are crystallized and protected. Skills that produce poor outcomes are repaired or archived. Every execution generates a score that drives the next generation of the skill.
The 5 Scoring Dimensions
Each skill execution is scored 0-100 on five dimensions:
| Dimension | Weight | What It Measures |
|---|---|---|
| Accuracy | 25% | Did the skill produce the correct result for the task? |
| Relevance | 20% | Was the skill content applicable to the actual use case? |
| Token Efficiency | 20% | Did the skill guide the agent without bloat or repetition? |
| User Satisfaction | 20% | Did the outcome meet or exceed user expectations? |
| Reusability | 15% | Could another agent use this skill in a similar situation? |
Composite score = weighted average of all five dimensions (0-100).
Scoring Rubric
90-100: Excellent -- candidate for crystallization
70-89: Good -- active skill, no action needed
50-69: Adequate -- flag for review after 3 more runs
30-49: Poor -- schedule auto-repair attempt
0-29: Critical -- immediate auto-repair or archive
Skill Lifecycle
DRAFT ACTIVE CRYSTALLIZED ARCHIVED
| | | |
New skill In regular use Proven stable Deprecated/replaced
| | | |
+-- first run ->+-- score >90 ++-- score <30 |
| for 5+ runs | (3 attempts) |
+-- score <30 -->+ auto-repair |
| auto-repair | fails 3x -->--+
+-- score >90 -->+
Draft
New skills enter as Draft. They receive no special protection and are evaluated critically on first use. A Draft skill that scores below 30 on its very first run is discarded rather than repaired.
Active
Skills in regular use. Scores are tracked in ~/.claude/skill-scores.jsonl. No action unless scores trend below 30 or above 90 over a rolling window of 5 runs.
Crystallized
A skill that maintains an average composite score above 90 over 5 or more consecutive runs is crystallized:
- Git tag applied:
skill/<name>/crystallized-v<N> - Read-only flag added to frontmatter:
locked: true - Skill is excluded from auto-repair
- Changes require explicit human unlock + PR
Archived
A skill that fails auto-repair 3 times is archived:
- Moved to
skills/_archived/<name>/ - Git tag applied:
skill/<name>/archived - Replacement skill drafted by
catalystagent if the capability is still needed
Score Storage Format
Append one record per execution to ~/.claude/skill-scores.jsonl:
{"skill":"experiment-loop","ts":"2026-04-07T10:00:00Z","session":"abc123","scores":{"accuracy":88,"relevance":92,"token_efficiency":75,"user_satisfaction":90,"reusability":85},"composite":86.5,"feedback":"Loop ran 4 iterations successfully, target nearly met"}
{"skill":"experiment-loop","ts":"2026-04-07T14:30:00Z","session":"def456","scores":{"accuracy":95,"relevance":90,"token_efficiency":82,"user_satisfaction":95,"reusability":88},"composite":90.4,"feedback":"Bundle size reduced 28%, target exceeded"}
Score CLI (quick check)
# Average scores for a skill (last 10 runs)
cat ~/.claude/skill-scores.jsonl | python3 -c "
import sys, json, statistics
skill = '$1'
runs = [json.loads(l) for l in sys.stdin if json.loads(l).get('skill') == skill][-10:]
if runs:
avg = statistics.mean(r['composite'] for r in runs)
print(f'{skill}: {avg:.1f} avg over {len(runs)} runs')
"
Crystallization Protocol
When a skill reaches 90+ composite score over 5+ consecutive runs:
- Verify scores in
~/.claude/skill-scores.jsonl-- confirm no outliers inflating the average - Add
locked: trueto the skill's frontmatter - Apply git tag:
bash
git tag skill/<name>/crystallized-v1 -m "Crystallized: avg score 92.3 over 7 runs" git push origin skill/<name>/crystallized-v1 - Log the crystallization in
thoughts/SKILL-EVOLUTION.md - Notify via canavar cross-training so all agents know this skill is stable
Auto-Repair Protocol
When a skill's composite score drops below 30:
Diagnosis
- Identify the lowest-scoring dimension (the primary failure mode)
- Read the last 3 session feedback notes from
~/.claude/skill-scores.jsonl - Summarize what went wrong (specific, not vague)
Repair
The catalyst agent rewrites the failing section(s) of the skill:
- Only the sections relevant to the low-scoring dimension
- Preserve all high-scoring sections unchanged
- Add a concrete example for the repaired section
Validation
After repair, the skill is re-scored on a synthetic test case by the verifier agent:
- Synthetic score must be 50+ to proceed to Active state
- If synthetic score < 50, attempt 2 of 3 repairs begins
Escalation
After 3 failed auto-repairs:
- Archive the skill
- Alert via
thoughts/SKILL-EVOLUTION.md - Spawn
catalystto draft a replacement from scratch
Evolution Log Format
Append events to thoughts/SKILL-EVOLUTION.md:
## 2026-04-07
### skill: experiment-loop
- Status change: Active -> Crystallized
- Trigger: avg composite 91.2 over 6 consecutive runs
- Git tag: skill/experiment-loop/crystallized-v1
- Notable strength: Token Efficiency dimension consistently 85+
### skill: legacy-deploy-helper
- Status change: Active -> Auto-Repair (attempt 1/3)
- Trigger: composite 24 on last run
- Lowest dimension: Relevance (12) -- skill referenced outdated Heroku patterns
- Repair: catalyst rewrote "Deployment Targets" section with Vercel/Railway focus
- Post-repair synthetic score: 71 -- promoted back to Active
Integration with Canavar Cross-Training
Skill evolution data feeds into canavar's cross-training pipeline:
- A crystallized skill is injected into canavar's
skill-matrix.jsonwithtrust: locked - An archived skill is marked
trust: deprecated-- agents stop referencing it - Auto-repair failures are logged to
error-ledger.jsonlwithsource: skill-evolution - The canavar leaderboard tracks which agents most frequently produce high-scoring skill executions
# View crystallized skills
node ~/.claude/hooks/dist/canavar-cli.mjs leaderboard --filter crystallized
# View skills needing repair
cat ~/.claude/skill-scores.jsonl | python3 -c "
import sys, json, collections
runs = [json.loads(l) for l in sys.stdin]
low = {r['skill'] for r in runs if r['composite'] < 30}
print('Skills needing repair:', low)
"
Activation
This skill activates automatically when:
- A skill completes an execution (PostToolUse hook)
- A skill is referenced in a session that ends with user dissatisfaction
- The
verifieragent reports a skill-guided task as failed
Agents involved: catalyst (repair), verifier (validation), self-learner (feedback extraction), canavar (cross-training propagation).
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
python-testing
Python testing strategies using pytest, TDD methodology, fixtures, mocking, parametrization, and coverage requirements.
golang-patterns
Idiomatic Go patterns, best practices, and conventions for building robust, efficient, and maintainable Go applications.
tdd-migration-pipeline
Orchestrator-only workflow for migrating/rewriting codebases with full TDD and agent delegation
hizir
Hızır'ın kullanım kılavuzu. Tüm komutlar, agent'lar, workflow'lar, sistemler burada. /hizir yaz, her şeyi gör.
secret-patterns
30+ service-specific secret detection regex patterns, entropy-based detection, PEM/JWT/Base64 identification, and false positive filtering.
agentica-prompts
Write reliable prompts for Agentica/REPL agents that avoid LLM instruction ambiguity
Didn't find tool you were looking for?