Agent skill
plugin-dev-workflow
Guide plugin development workflow — editing skills, agents, hooks, or eval framework in this repo. Use when modifying files in plugins/elixir-phoenix/, lab/eval/, or lab/autoresearch/. Ensures changes pass eval, lint, and tests before committing.
Install this agent skill to your Project
npx add-skill https://github.com/oliver-kriska/claude-elixir-phoenix/tree/main/.claude/skills/plugin-dev-workflow
Plugin Development Workflow
This repo is the Elixir/Phoenix Claude Code plugin. When editing plugin files, follow this workflow to ensure quality.
Before You Start
Run make help to see all available commands:
make eval # Quick: lint + score changed skills/agents
make eval-all # Full: all 40 skills + 20 agents
make eval-fix # Auto-fix + show failures
make test # 52 pytest tests for eval framework
make ci # Full CI pipeline
Scoring Individual Files (CLI)
IMPORTANT: Always use -m module syntax, never run scorer.py directly.
# Score ONE skill (use -m, NOT direct file path)
python3 -m lab.eval.scorer plugins/elixir-phoenix/skills/verify/SKILL.md
# Score ONE skill with pretty output
python3 -m lab.eval.scorer plugins/elixir-phoenix/skills/verify/SKILL.md --pretty
# Score all skills
python3 -m lab.eval.scorer --all
# Score ONE agent
python3 -m lab.eval.agent_scorer plugins/elixir-phoenix/agents/verification-runner.md
# Score all agents
python3 -m lab.eval.agent_scorer --all
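The "-m module syntax" rule above can be wrapped in a small helper so the direct-file form is never typed by accident. This is a minimal sketch, not part of the repo: `scorer_command` and `run_scorer` are hypothetical names, and restricting `--pretty` to single-skill runs is a conservative assumption, since that is the only combination the commands above show.

```python
import subprocess
import sys
from typing import List, Optional


def scorer_command(path: Optional[str] = None, *, agent: bool = False,
                   pretty: bool = False) -> List[str]:
    """Build the argv for one scorer run, always via -m module syntax."""
    module = "lab.eval.agent_scorer" if agent else "lab.eval.scorer"
    # Assumption: --pretty is only documented for scoring one skill.
    if pretty and (agent or path is None):
        raise ValueError("--pretty is only shown for single-skill runs")
    # No path means "score everything", which maps to --all.
    cmd = [sys.executable, "-m", module, path if path is not None else "--all"]
    if pretty:
        cmd.append("--pretty")
    return cmd


def run_scorer(repo_root: str, path: Optional[str] = None, **kwargs) -> int:
    """Run the scorer from the repo root and return its exit code unchanged."""
    result = subprocess.run(scorer_command(path, **kwargs),
                            cwd=repo_root, check=False)
    return result.returncode
```

Running from the repo root matters because `-m lab.eval.scorer` resolves `lab` as a package relative to the working directory.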
When Editing Skills (plugins/elixir-phoenix/skills/*/SKILL.md)
- Read CLAUDE.md conventions (size limits, frontmatter requirements)
- Make your changes
- Run make eval; it auto-detects changed skills and scores them
- If FAIL: check the dimension that failed, fix it
- Run make lint to verify markdown formatting
- Commit
Skill requirements (eval checks all of these):
- Frontmatter: name, description, effort. Description must start with action verb + include "Use when..."
- Iron Laws section with 1+ numbered items
- Under 185 lines (command skills) or 150 lines (reference skills)
- No section exceeds 45 lines
- All /phx: references point to existing skills
- All references/*.md paths exist
- No dangerous code patterns outside Iron Laws sections
- Code examples present (1+ fenced code blocks)
- "Use when..." in description (for trigger accuracy)
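For a quick local check before running make eval, several of the requirements above can be approximated in a few lines. This is a rough sketch and not the real matchers in lab/eval/: `check_skill` is a hypothetical helper, the frontmatter parser only handles simple one-line `key: value` pairs, and counting frontmatter lines toward the size limit is an assumption. It does not check the action-verb rule, the /phx: and references/ links, or dangerous code patterns.

```python
import re
from typing import List

REQUIRED_KEYS = ("name", "description", "effort")
FENCE = "`" * 3  # built at runtime so this example nests safely in markdown


def check_skill(text: str, max_lines: int = 185,
                max_section: int = 45) -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    lines = text.splitlines()

    # Frontmatter: simple "key: value" lines between the first two '---'.
    front = {}
    if lines and lines[0].strip() == "---":
        for line in lines[1:]:
            if line.strip() == "---":
                break
            key, sep, value = line.partition(":")
            if sep:
                front[key.strip()] = value.strip()
    for key in REQUIRED_KEYS:
        if not front.get(key):
            problems.append(f"frontmatter missing '{key}'")
    if "Use when" not in front.get("description", ""):
        problems.append("description lacks 'Use when...'")

    # "Under N lines" is read as strictly fewer than N.
    if len(lines) >= max_lines:
        problems.append(f"{len(lines)} lines, limit is under {max_lines}")

    in_fence, fences = False, 0
    section, length, iron_items = "(preamble)", 0, 0
    for line in lines:
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            fences += 1
        # Headings inside code fences are comments, not section breaks.
        if not in_fence and re.match(r"#{1,6}\s", line):
            section, length = line.lstrip("#").strip(), 0
            continue
        length += 1
        if length == max_section + 1:
            problems.append(f"section '{section}' exceeds {max_section} lines")
        if "Iron Laws" in section and re.match(r"\d+\.\s", line):
            iron_items += 1

    if iron_items == 0:
        problems.append("no numbered items under an 'Iron Laws' heading")
    if fences < 2:  # one block needs an opening and a closing fence
        problems.append("no fenced code block")
    return problems
```

Pass `max_lines=150` for reference skills. The authoritative result is still whatever make eval reports.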
When Editing Agents (plugins/elixir-phoenix/agents/*.md)
- Make your changes
- Run make eval-agents to score all agents
- Agent requirements:
  - permissionMode: bypassPermissions (always; background agents need it)
  - disallowedTools: Write, Edit, NotebookEdit for review/analysis agents
  - model matches effort: haiku=low, sonnet=medium, opus=high
  - Under 300 lines (specialist) or 535 lines (orchestrator)
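The agent requirements above translate naturally into a lookup table plus a few comparisons. The sketch below is illustrative only: `check_agent` is a hypothetical function, and it assumes the frontmatter has already been parsed into a dict with `model`, `effort`, `permissionMode` and a comma-separated `disallowedTools` string, which may not match how the real agent scorer reads these files.

```python
from typing import Dict, List

MODEL_FOR_EFFORT = {"low": "haiku", "medium": "sonnet", "high": "opus"}
LINE_LIMITS = {"specialist": 300, "orchestrator": 535}
WRITE_TOOLS = {"Write", "Edit", "NotebookEdit"}


def check_agent(front: Dict[str, str], line_count: int,
                kind: str = "specialist",
                read_only: bool = False) -> List[str]:
    """Return a list of problems for one agent file (empty means OK)."""
    problems = []

    if front.get("permissionMode") != "bypassPermissions":
        problems.append("permissionMode must be bypassPermissions")

    effort = front.get("effort", "")
    expected = MODEL_FOR_EFFORT.get(effort)
    if expected is None:
        problems.append("effort must be low, medium or high")
    elif front.get("model") != expected:
        problems.append(
            f"model '{front.get('model')}' does not match effort "
            f"'{effort}' (expected '{expected}')"
        )

    # Review/analysis agents must not be able to modify files.
    if read_only:
        raw = front.get("disallowedTools", "")
        disallowed = {tool.strip() for tool in raw.split(",")}
        missing = WRITE_TOOLS - disallowed
        if missing:
            problems.append(
                "disallowedTools missing: " + ", ".join(sorted(missing))
            )

    # "Under N lines" is read as strictly fewer than N.
    if line_count >= LINE_LIMITS[kind]:
        problems.append(f"{line_count} lines, limit is under {LINE_LIMITS[kind]}")
    return problems
```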
When Editing Eval Framework (lab/eval/*.py)
- Make your changes
- Run make test; all 52 pytest tests must pass
- Run make eval-all; verify no skills/agents regressed
- If adding new matchers: add tests in lab/eval/tests/test_matchers.py
When Editing Hooks (plugins/elixir-phoenix/hooks/scripts/*.sh)
- Make your changes
- Run make lint (markdown in hook comments)
- Test the hook manually (hooks run on Edit/Write/Bash events)
- Check CLAUDE.md hook documentation is still accurate
Autoresearch (Self-Improvement Loop)
If make eval-fix shows failures, it suggests an autoresearch command:
# Copy-paste the suggested command from eval-fix output
claude -p 'Run autoresearch. Score all skills...' --allowedTools 'Edit,Read,Write,Bash,Glob,Grep'
This runs the autoresearch loop: find weakest skill → fix ONE issue → re-score → keep/revert.
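One iteration of that loop can be pictured as follows. This is a toy model under stated assumptions, not the code in lab/autoresearch/: `autoresearch_step` and its three callbacks (`score`, `apply_fix`, `revert`) are hypothetical stand-ins for the real scorer and for however the loop applies and undoes a change.

```python
from typing import Callable, Iterable, Tuple


def autoresearch_step(
    skills: Iterable[str],
    score: Callable[[str], float],
    apply_fix: Callable[[str], None],
    revert: Callable[[str], None],
) -> Tuple[str, bool]:
    """Run one iteration and return (skill_name, kept)."""
    # 1. Find the weakest skill by current score.
    before = {name: score(name) for name in skills}
    weakest = min(before, key=before.get)

    # 2. Fix ONE issue in that skill.
    apply_fix(weakest)

    # 3. Re-score and keep the change only on a strict improvement.
    if score(weakest) > before[weakest]:
        return weakest, True

    # 4. Otherwise undo the change so the tree is back where it started.
    revert(weakest)
    return weakest, False
```

Limiting each iteration to a single fix keeps the before/after comparison attributable to one change, which is what makes the keep/revert decision meaningful.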
Pre-Commit Checklist
Before committing any plugin changes:
- make lint passes
- make eval passes (changed files)
- make test passes (if eval framework changed)
- CHANGELOG.md updated (if user-visible change)
- Version bumped in plugin.json (if releasing)
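The conditional parts of this checklist depend only on which files changed, so the set of make targets to run can be derived mechanically. A small sketch, assuming the changed paths are repo-relative with forward slashes; `make_targets` is a hypothetical helper, and the CHANGELOG.md and version-bump items stay manual because they need human judgment.

```python
from fnmatch import fnmatch
from typing import Iterable, List


def make_targets(changed: Iterable[str]) -> List[str]:
    """Pick the make targets the checklist requires for these changed paths."""
    # lint and eval apply to every plugin change.
    targets = ["lint", "eval"]
    # The pytest suite only needs to run when the eval framework changed.
    # fnmatch's '*' also matches '/', so files under lab/eval/tests/ count.
    if any(fnmatch(path, "lab/eval/*.py") for path in changed):
        targets.append("test")
    return targets
```

For example, a change to lab/eval/scorer.py yields `["lint", "eval", "test"]`, which corresponds to running make lint, make eval and make test in that order.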
References
- CLAUDE.md: full conventions, size limits, checklist
- lab/eval/: scoring framework (24 matchers, 8 dimensions)
- lab/autoresearch/: self-improvement loop
- lab/findings/interesting.jsonl: log interesting discoveries here