Agent skill

verify

Comprehensive verification with parallel test agents. Use when verifying implementations or validating changes.


Install this agent skill to your Project

npx add-skill https://github.com/yonatangross/orchestkit/tree/main/src/skills/verify

Metadata

Additional technical details for this skill

category
workflow-automation
mcp server
memory

SKILL.md

Verify Feature

Comprehensive verification using parallel specialized agents with nuanced grading (0-10 scale) and improvement suggestions.

Quick Start

bash
/ork:verify authentication flow
/ork:verify --model=opus user profile feature
/ork:verify --scope=backend database migrations

Argument Resolution

python
SCOPE = "$ARGUMENTS"       # Full argument string, e.g., "authentication flow"
SCOPE_TOKEN = "$ARGUMENTS[0]"  # First token for flag detection (e.g., "--scope=backend")
# $ARGUMENTS[0], $ARGUMENTS[1] etc. for indexed access (CC 2.1.59)

# Model override detection (CC 2.1.72)
MODEL_OVERRIDE = None
for token in "$ARGUMENTS".split():
    if token.startswith("--model="):
        MODEL_OVERRIDE = token.split("=", 1)[1]  # "opus", "sonnet", "haiku"
        SCOPE = SCOPE.replace(token, "").strip()

Pass MODEL_OVERRIDE to all Agent() calls via model=MODEL_OVERRIDE when set. Accepts symbolic names (opus, sonnet, haiku) or full IDs (claude-opus-4-6) per CC 2.1.74.
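As a self-contained illustration of the resolution logic above, the sketch below wraps it in a plain Python function. The name `resolve_arguments` and the returned tuple are assumptions made for this example; only the `--model=` detection comes from the snippet above.

```python
from typing import Optional, Tuple


def resolve_arguments(arguments: str) -> Tuple[str, Optional[str]]:
    """Split a raw argument string into (scope, model_override).

    Hypothetical helper that mirrors the --model= detection shown above.
    """
    model_override: Optional[str] = None
    scope_tokens = []
    for token in arguments.split():
        if token.startswith("--model="):
            # Keep everything after the first "=" as the model name or full ID.
            model_override = token.split("=", 1)[1]
        else:
            scope_tokens.append(token)
    return " ".join(scope_tokens), model_override


print(resolve_arguments("--model=opus user profile feature"))
```

Rebuilding the scope from the remaining tokens avoids the double space that `str.replace` leaves behind when the flag appears in the middle of the argument string.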

Opus 4.6: Agents use native adaptive thinking (no MCP sequential-thinking needed). Extended 128K output supports comprehensive verification reports.


STEP 0: Effort-Aware Verification Scaling (CC 2.1.76)

Scale verification depth based on /effort level:

| Effort Level | Phases Run | Agents | Output |
|---|---|---|---|
| low | Run tests only → pass/fail | 0 agents | Quick check |
| medium | Tests + code quality + security | 3 agents | Score + top issues |
| high (default) | All 8 phases + visual capture | 6-7 agents | Full report + grades |

Override: Explicit user selection (e.g., "Full verification") overrides /effort downscaling.
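The scaling rule and its override can be sketched as a small lookup. The table values are transcribed from the effort levels above; the function name `select_plan`, the `explicit_full` flag, and the fallback to `high` for unrecognized values are assumptions for illustration.

```python
# Plan table transcribed from the effort-level table above.
EFFORT_PLANS = {
    "low": {"phases": "tests only", "agents": "0", "output": "Quick check"},
    "medium": {"phases": "tests + code quality + security", "agents": "3",
               "output": "Score + top issues"},
    "high": {"phases": "all 8 phases + visual capture", "agents": "6-7",
             "output": "Full report + grades"},
}


def select_plan(effort: str = "high", explicit_full: bool = False) -> dict:
    """Pick a verification plan; an explicit 'Full verification' choice wins."""
    if explicit_full:
        return EFFORT_PLANS["high"]
    # Assumption: unknown effort values fall back to the documented default.
    return EFFORT_PLANS.get(effort, EFFORT_PLANS["high"])


print(select_plan("low"))
print(select_plan("low", explicit_full=True))
```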

STEP 0a: Verify User Intent with AskUserQuestion

BEFORE creating tasks, clarify verification scope:

python
AskUserQuestion(
  questions=[{
    "question": "What scope for this verification?",
    "header": "Scope",
    "options": [
      {"label": "Full verification (Recommended)", "description": "All tests + security + code quality + visual + grades", "markdown": "```\nFull Verification (10 phases)\n─────────────────────────────\n  7 parallel agents:\n  ┌────────────┐ ┌────────────┐\n  │ Code       │ │ Security   │\n  │ Quality    │ │ Auditor    │\n  ├────────────┤ ├────────────┤\n  │ Test       │ │ Backend    │\n  │ Generator  │ │ Architect  │\n  ├────────────┤ ├────────────┤\n  │ Frontend   │ │ Performance│\n  │ Developer  │ │ Engineer   │\n  ├────────────┤ └────────────┘\n  │ Visual     │\n  │ Capture    │ → gallery.html\n  └────────────┘\n         ▼\n    Composite Score (0-10)\n    8 dimensions + Grade\n    + Visual Gallery\n```"},
      {"label": "Tests only", "description": "Run unit + integration + e2e tests", "markdown": "```\nTests Only\n──────────\n  npm test ──▶ Results\n  ┌─────────────────────┐\n  │ Unit tests     ✓/✗  │\n  │ Integration    ✓/✗  │\n  │ E2E            ✓/✗  │\n  │ Coverage       NN%  │\n  └─────────────────────┘\n  Skip: security, quality, UI\n  Output: Pass/fail + coverage\n```"},
      {"label": "Security audit", "description": "Focus on security vulnerabilities", "markdown": "```\nSecurity Audit\n──────────────\n  security-auditor agent:\n  ┌─────────────────────────┐\n  │ OWASP Top 10       ✓/✗ │\n  │ Dependency CVEs    ✓/✗ │\n  │ Secrets scan       ✓/✗ │\n  │ Auth flow review   ✓/✗ │\n  │ Input validation   ✓/✗ │\n  └─────────────────────────┘\n  Output: Security score 0-10\n          + vulnerability list\n```"},
      {"label": "Code quality", "description": "Lint, types, complexity analysis", "markdown": "```\nCode Quality\n────────────\n  code-quality-reviewer agent:\n  ┌─────────────────────────┐\n  │ Lint errors         N   │\n  │ Type coverage       NN% │\n  │ Cyclomatic complex  N.N │\n  │ Dead code           N   │\n  │ Pattern violations  N   │\n  └─────────────────────────┘\n  Output: Quality score 0-10\n          + refactor suggestions\n```"},
      {"label": "Quick check", "description": "Just run tests, skip detailed analysis", "markdown": "```\nQuick Check (~1 min)\n────────────────────\n  Run tests ──▶ Pass/Fail\n\n  Output:\n  ├── Test results\n  ├── Build status\n  └── Lint status\n  No agents, no grading,\n  no report generation\n```"}
    ],
    "multiSelect": true
  }]
)

Based on the answer, adjust the workflow:

  • Full verification: All 10 phases (8 + 2.5 + 8.5), 7 parallel agents including visual capture
  • Tests only: Skip phases 2 (security), 5 (UI/UX analysis)
  • Security audit: Focus on security-auditor agent
  • Code quality: Focus on code-quality-reviewer agent
  • Quick check: Run tests only, skip grading and suggestions

STEP 0b: Select Orchestration Mode

Load details: Read("${CLAUDE_SKILL_DIR}/references/orchestration-mode.md") for env var check logic, Agent Teams vs Task Tool comparison, and mode selection rules.

Choose Agent Teams (mesh -- verifiers share findings) or Task tool (star -- all report to lead) based on the orchestration mode reference.


MCP Probe + Resume

python
ToolSearch(query="select:mcp__memory__search_nodes")
Write(".claude/chain/capabilities.json", { memory, timestamp })

Read(".claude/chain/state.json")  # resume if exists

Handoff File

After verification completes, write results:

python
Write(".claude/chain/verify-results.json", JSON.stringify({
  "phase": "verify", "skill": "verify",
  "timestamp": now(), "status": "completed",
  "outputs": {
    "tests_passed": N, "tests_failed": N,
    "coverage": "87%", "security_scan": "clean"
  }
}))
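The handoff call above is written in tool-call shorthand. For readers who want to produce the same file from an ordinary script, here is a minimal sketch in standard-library Python. The helper name `write_handoff` and the ISO 8601 UTC timestamp format are assumptions; the field names follow the snippet above.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def write_handoff(path: str, tests_passed: int, tests_failed: int,
                  coverage: str, security_scan: str) -> dict:
    """Write the verify handoff file and return the payload that was written."""
    payload = {
        "phase": "verify",
        "skill": "verify",
        # Assumption: the timestamp is stored as an ISO 8601 UTC string.
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "status": "completed",
        "outputs": {
            "tests_passed": tests_passed,
            "tests_failed": tests_failed,
            "coverage": coverage,
            "security_scan": security_scan,
        },
    }
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)  # create the directory if missing
    target.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return payload
```

A call such as `write_handoff(".claude/chain/verify-results.json", 47, 0, "87%", "clean")` would write the file shown above with real counts in place of `N`.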

Regression Monitor (CC 2.1.71)

Optionally schedule post-verification monitoring:

python
# Guard: Skip cron in headless/CI (CLAUDE_CODE_DISABLE_CRON)
# if env CLAUDE_CODE_DISABLE_CRON is set, run a single check instead
CronCreate(
  schedule="0 8 * * *",
  prompt="""Daily regression check: npm test.
    If 7 consecutive passes → CronDelete.
    If failures → alert with details."""
)
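The decision rule embedded in the cron prompt (stop after 7 consecutive passes, alert on failure) can be sketched as a pure function. The function name and the action labels `"alert"`, `"delete"`, and `"continue"` are hypothetical; how run history is stored between daily checks is not specified above and is left to the caller.

```python
def next_monitor_action(history: list, required_passes: int = 7) -> str:
    """Decide what the daily regression check should do next.

    history: chronological list of booleans, True for a passing run.
    Returns "alert", "delete", or "continue" (hypothetical action names).
    """
    if not history:
        return "continue"
    if not history[-1]:
        return "alert"  # latest run failed: report details
    # Count the unbroken run of passes at the end of the history.
    streak = 0
    for passed in reversed(history):
        if not passed:
            break
        streak += 1
    return "delete" if streak >= required_passes else "continue"


print(next_monitor_action([True] * 7))
```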

Task Management (CC 2.1.16)

python
# 1. Create main verification task
TaskCreate(
  subject="Verify [feature-name] implementation",
  description="Comprehensive verification with nuanced grading",
  activeForm="Verifying [feature-name] implementation"
)

# 2. Create subtasks for 8-phase process
TaskCreate(subject="Run code quality checks", activeForm="Running quality checks")    # id=2
TaskCreate(subject="Execute security audit", activeForm="Running security audit")     # id=3
TaskCreate(subject="Verify test coverage", activeForm="Verifying test coverage")      # id=4
TaskCreate(subject="Validate API", activeForm="Validating API")                       # id=5
TaskCreate(subject="Check UI/UX", activeForm="Checking UI/UX")                       # id=6
TaskCreate(subject="Calculate grades", activeForm="Calculating grades")               # id=7
TaskCreate(subject="Generate suggestions", activeForm="Generating suggestions")       # id=8
TaskCreate(subject="Compile report", activeForm="Compiling report")                   # id=9

# 3. Set dependencies: tasks 2-6 run in parallel, tasks 7-9 are sequential
TaskUpdate(taskId="7", addBlockedBy=["2", "3", "4", "5", "6"])  # Grading needs all checks
TaskUpdate(taskId="8", addBlockedBy=["7"])  # Suggestions need grades
TaskUpdate(taskId="9", addBlockedBy=["8"])  # Report needs suggestions

# 4. Before starting each task, verify it's unblocked
task = TaskGet(taskId="2")  # Verify blockedBy is empty

# 5. Update status as you progress
TaskUpdate(taskId="2", status="in_progress")  # When starting
TaskUpdate(taskId="2", status="completed")    # When done — repeat for each subtask
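To make the dependency structure concrete, the sketch below models the same `addBlockedBy` relationships as a dictionary and computes which subtasks may start at a given moment. It is a standalone illustration, not a replacement for `TaskGet`; the names `BLOCKED_BY`, `SUBTASKS`, and `ready_tasks` are assumptions.

```python
# Dependencies copied from the TaskUpdate calls above: task id -> blocking task ids.
BLOCKED_BY = {
    "7": ["2", "3", "4", "5", "6"],
    "8": ["7"],
    "9": ["8"],
}
SUBTASKS = ["2", "3", "4", "5", "6", "7", "8", "9"]


def ready_tasks(completed: set) -> list:
    """Return subtasks that are not completed and whose blockers are all completed."""
    return [
        task_id
        for task_id in SUBTASKS
        if task_id not in completed
        and all(dep in completed for dep in BLOCKED_BY.get(task_id, []))
    ]


print(ready_tasks(set()))
print(ready_tasks({"2", "3", "4", "5", "6"}))
```

With nothing completed, the five check tasks are all ready at once, which is what allows them to run in parallel; grading only becomes ready once all five are done.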

8-Phase Workflow

Load details: Read("${CLAUDE_SKILL_DIR}/references/verification-phases.md") for complete phase details, agent spawn definitions, Agent Teams alternative, and team teardown.

| Phase | Activities | Output |
|---|---|---|
| 1. Context Gathering | Git diff, commit history | Changes summary |
| 2. Parallel Agent Dispatch | 6 agents evaluate | 0-10 scores |
| 2.5 Visual Capture | Screenshot routes, AI vision eval | Gallery + visual score |
| 3. Test Execution | Backend + frontend tests | Coverage data |
| 4. Nuanced Grading | Composite score calculation | Grade (A-F) |
| 5. Improvement Suggestions | Effort vs impact analysis | Prioritized list |
| 6. Alternative Comparison | Compare approaches (optional) | Recommendation |
| 7. Metrics Tracking | Trend analysis | Historical data |
| 8. Report Compilation | Evidence artifacts + gallery.html | Final report |
| 8.5 Agentation Loop | User annotates, ui-feedback fixes | Before/after diffs |

Phase 2 Agents (Quick Reference)

| Agent | Focus | Output |
|---|---|---|
| code-quality-reviewer | Lint, types, patterns | Quality 0-10 |
| security-auditor | OWASP, secrets, CVEs | Security 0-10 |
| test-generator | Coverage, test quality | Coverage 0-10 |
| backend-system-architect | API design, async | API 0-10 |
| frontend-ui-developer | React 19, Zod, a11y | UI 0-10 |
| python-performance-engineer | Latency, resources, scaling | Performance 0-10 |

Launch ALL agents in ONE message with run_in_background=True and max_turns=25.

Progressive Output (CC 2.1.76+)

Output each agent's score as soon as it completes — don't wait for all 6-7 agents.

Focus mode (CC 2.1.101): In focus mode, include the full composite score, all dimension scores, and the verdict in your final message — the user didn't see the incremental outputs.

Security:     8.2/10 — No critical vulnerabilities found
Code Quality: 7.5/10 — 3 complexity hotspots identified
[...remaining agents still running...]

This gives users real-time visibility into multi-agent verification. If a dimension scores below its blocking threshold (for example, security below 5.0), flag it as a blocker immediately; the user can terminate early without waiting for the remaining agents.
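One way to produce the incremental lines shown above is a small formatter that is called as each agent finishes. This is a sketch only: the function name, the `[BLOCKER]` marker, and the plain hyphen separator are assumptions, and the 5.0 threshold is the value mentioned in the paragraph above.

```python
def format_progress_line(dimension: str, score: float, summary: str,
                         blocker_threshold: float = 5.0) -> str:
    """Render one incremental score line, marking scores below the threshold."""
    label = dimension + ":"
    # Pad the label to 14 characters so the scores line up in a column.
    line = f"{label:<14}{score:.1f}/10 - {summary}"
    if score < blocker_threshold:
        line += " [BLOCKER]"  # hypothetical marker for early termination
    return line


print(format_progress_line("Security", 8.2, "No critical vulnerabilities found"))
print(format_progress_line("Code Quality", 7.5, "3 complexity hotspots identified"))
```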

Monitor + Partial Results (CC 2.1.98)

Use Monitor for streaming test execution output from background scripts:

python
# Stream test output in real-time instead of waiting for completion
Bash(command="npm test 2>&1", run_in_background=True)
Monitor(pid=test_task_id)  # Each line → notification

Partial results (CC 2.1.98): If a verification agent fails mid-analysis, synthesize partial scores rather than re-spawning:

python
for agent_result in verification_results:
    if "[PARTIAL RESULT]" in agent_result.output:
        # Extract whatever scores the agent produced before crashing
        partial_score = parse_score(agent_result.output)  # May be incomplete
        scores[agent_result.dimension] = {
            "score": partial_score, "partial": True,
            "note": "Agent crashed — score based on partial analysis"
        }
        # A 4-dimension score is better than no score. Do NOT re-spawn.
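The loop above relies on names that are defined elsewhere (`verification_results`, `parse_score`, `scores`). A runnable sketch of the same idea follows. It assumes each result is a dictionary with `dimension` and `output` keys and that scores appear in the output as `N/10`; both are illustrative assumptions, not the documented result format.

```python
import re
from typing import Optional

# Matches scores written as "8.2/10" or "7 / 10".
SCORE_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*/\s*10\b")


def parse_score(output: str) -> Optional[float]:
    """Return the last 'N/10' score found in the agent output, or None."""
    matches = SCORE_PATTERN.findall(output)
    return float(matches[-1]) if matches else None


def collect_scores(results: list) -> dict:
    """Build a dimension -> score record, keeping partial results instead of re-spawning."""
    scores = {}
    for result in results:
        score = parse_score(result["output"])
        if score is None:
            continue  # nothing usable was produced for this dimension
        record = {"score": score, "partial": False}
        if "[PARTIAL RESULT]" in result["output"]:
            record["partial"] = True
            record["note"] = "Agent crashed; score based on partial analysis"
        scores[result["dimension"]] = record
    return scores


print(collect_scores([
    {"dimension": "security", "output": "Security: 8.2/10"},
    {"dimension": "quality", "output": "[PARTIAL RESULT] Quality so far: 7/10"},
]))
```

A dimension whose agent crashed before emitting any score is simply omitted, so downstream grading can see which dimensions are missing rather than receiving a placeholder value.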

Phase 2.5: Visual Capture (NEW — runs in parallel with Phase 2)

Load details: Read("${CLAUDE_SKILL_DIR}/references/visual-capture.md") for auto-detection, route discovery, screenshot capture, and AI vision evaluation.

Summary: Auto-detects project framework, starts dev server, discovers routes, uses agent-browser to screenshot each route, evaluates with Claude vision, generates self-contained gallery.html with base64-embedded images.

Output: verification-output/{timestamp}/gallery.html — open in browser to see all screenshots with AI evaluations, scores, and annotation diffs.

Graceful degradation: If no frontend detected or server won't start, skips visual capture with a warning — never blocks verification.

Phase 8.5: Agentation Visual Feedback (opt-in)

Load details: Read("${CLAUDE_SKILL_DIR}/references/visual-capture.md") (Phase 8.5 section) for agentation loop workflow.

Trigger: Only when agentation MCP is configured. Offers user the choice to annotate the live UI. ui-feedback agent processes annotations, re-screenshots show before/after.


Grading & Scoring

Load these references on demand:

  • Read("${CLAUDE_PLUGIN_ROOT}/skills/quality-gates/references/unified-scoring-framework.md"): dimensions, weights, grade thresholds, and improvement prioritization
  • Read("${CLAUDE_SKILL_DIR}/references/quality-model.md"): verify-specific extensions (Visual dimension)
  • Read("${CLAUDE_SKILL_DIR}/references/grading-rubric.md"): per-agent scoring criteria
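The actual dimensions and weights live in the referenced scoring framework and are not reproduced here. As a rough sketch of how a composite can be formed from per-dimension scores, the function below takes the weights as a parameter. Renormalizing over the dimensions that actually produced a score is an assumption of this example, chosen so that a missing dimension does not pull the composite toward zero.

```python
def composite_score(scores: dict, weights: dict) -> float:
    """Weighted average over the dimensions that have both a score and a weight.

    Assumption: weights are renormalized across the dimensions that are present.
    """
    usable = [d for d in scores if d in weights]
    total_weight = sum(weights[d] for d in usable)
    if total_weight == 0:
        raise ValueError("no scored dimension has a positive weight")
    weighted = sum(scores[d] * weights[d] for d in usable)
    return round(weighted / total_weight, 1)


# Illustrative weights only; the real values come from the scoring framework.
example_weights = {"security": 0.5, "quality": 0.5, "ui": 1.0}
print(composite_score({"security": 8.0, "quality": 6.0}, example_weights))
```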


Evidence & Test Execution

Load details: Read("${CLAUDE_SKILL_DIR}/rules/evidence-collection.md") for git commands, test execution patterns, metrics tracking, and post-verification feedback.


Policy-as-Code

Load details: Read("${CLAUDE_SKILL_DIR}/references/policy-as-code.md") for configuration.

Define verification rules in .claude/policies/verification-policy.json:

json
{
  "thresholds": {
    "composite_minimum": 6.0,
    "security_minimum": 7.0,
    "coverage_minimum": 70
  },
  "blocking_rules": [
    {"dimension": "security", "below": 5.0, "action": "block"}
  ]
}
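To show how a policy file like this might be applied, the sketch below evaluates scores against it and returns one of the three report verdicts. The mapping from thresholds to verdicts is an assumption made for illustration (blocking rules first, then minimums); the policy reference file is the authority on the real rules, and `coverage_minimum` is not evaluated here.

```python
import json

# Same policy as the JSON above, inlined so the example is self-contained.
POLICY_JSON = """
{
  "thresholds": {"composite_minimum": 6.0, "security_minimum": 7.0, "coverage_minimum": 70},
  "blocking_rules": [{"dimension": "security", "below": 5.0, "action": "block"}]
}
"""


def evaluate_policy(policy: dict, scores: dict, composite: float) -> str:
    """Map scores to a verdict string using the policy fields (hypothetical logic)."""
    for rule in policy.get("blocking_rules", []):
        score = scores.get(rule["dimension"])
        if rule["action"] == "block" and score is not None and score < rule["below"]:
            return "BLOCKED"
    thresholds = policy.get("thresholds", {})
    below_minimum = (
        composite < thresholds.get("composite_minimum", 0.0)
        or scores.get("security", 10.0) < thresholds.get("security_minimum", 0.0)
    )
    return "IMPROVEMENTS RECOMMENDED" if below_minimum else "READY FOR MERGE"


policy = json.loads(POLICY_JSON)
print(evaluate_policy(policy, {"security": 4.5}, 8.0))
print(evaluate_policy(policy, {"security": 8.0}, 7.0))
```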

Report Format

Load details: Read("${CLAUDE_SKILL_DIR}/references/report-template.md") for full format. Summary:

markdown
# Feature Verification Report

**Composite Score: [N.N]/10** (Grade: [LETTER])

## Verdict
**[READY FOR MERGE | IMPROVEMENTS RECOMMENDED | BLOCKED]**

References

Load on demand with Read("${CLAUDE_SKILL_DIR}/references/<file>"):

| File | Content |
|---|---|
| verification-phases.md | 8-phase workflow, agent spawn definitions, Agent Teams mode |
| visual-capture.md | Phase 2.5 + 8.5: screenshot capture, AI vision, gallery generation, agentation loop |
| quality-model.md | Scoring dimensions and weights (8 unified) |
| grading-rubric.md | Per-agent scoring criteria |
| report-template.md | Full report format with visual evidence section |
| alternative-comparison.md | Approach comparison template |
| orchestration-mode.md | Agent Teams vs Task Tool |
| policy-as-code.md | Verification policy configuration |
| verification-checklist.md | Pre-flight checklist |

Rules

Load on demand with Read("${CLAUDE_SKILL_DIR}/rules/<file>"):

| File | Content |
|---|---|
| scoring-rubric.md | Composite scoring, grades, verdicts |
| evidence-collection.md | Evidence gathering and test patterns |

Verification Gate (Cross-Cutting)

Load Read("${CLAUDE_PLUGIN_ROOT}/skills/shared/rules/verification-gate.md") — the minimum 5-step gate that applies to ALL completion claims across all skills. This is non-negotiable: NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE.

Anti-Sycophancy Protocol

Load Read("${CLAUDE_PLUGIN_ROOT}/skills/shared/rules/anti-sycophancy.md") — all verification agents report findings directly without performative agreement. "Should be fine" is not evidence. "Tests pass (exit 0, 47/47)" is.

Agent Status Protocol

All verification agents MUST report using the standardized protocol: Read("${CLAUDE_PLUGIN_ROOT}/agents/shared/status-protocol.md"). Never report DONE if concerns exist. Never silently produce work you're unsure about.


Agent Coordination

SendMessage (Cross-Agent Findings)

When a security agent finds a critical issue, share it with other verification agents:

python
SendMessage(to="test-generator", message="Security: SQL injection in user_service.py:88 — add parameterized query test")
SendMessage(to="code-quality-reviewer", message="Security finding at user_service.py:88 — flag in review")

Skill Chain

After verification, chain to commit if all gates pass:

python
TaskCreate(subject="Commit verified changes", activeForm="Committing", addBlockedBy=[verify_task_id])
# Then: /ork:commit

Related Skills

  • ork:implement - Full implementation with verification
  • ork:review-pr - PR-specific verification
  • testing-unit / testing-integration / testing-e2e - Test execution patterns
  • ork:quality-gates - Quality gate patterns
  • browser-tools - Browser automation for visual capture

Version: 4.2.0 (March 2026) — Added progressive output for incremental agent scores

