Agent skill
verify
Rigorous self-assessment checklist before marking any task as complete. Use when about to claim task completion, before final commit, when user asks "is it done?", or when transitioning from implementation to reporting. Prevents premature completion claims by requiring evidence for every assertion.
Install this agent skill in your project
npx add-skill https://github.com/Jamie-BitFlight/claude_skills/tree/main/.claude/skills/verify
SKILL.md
Verification Protocol
Workflow Reference: See Master Workflow for how this skill fits into the verification stage of the agentic workflow.
STOP. You are NOT done yet. Generate this checklist and provide EVIDENCE for every item.
1. Task Type & Strategy
- Type: FIX / FEATURE / REFACTOR / DOCS / INVESTIGATION
- Strategy: Executable verification vs. Static verification?
2. The "WORKS" Check
flowchart TD
Start(["Begin WORKS Check -- Section 2"]) --> Q{"Task type?"}
Q -->|"Executable code -- compiled, scripted, or CLI-run"| A1["Execution check<br>Terminal output showing successful run<br>(exit code 0 is NOT enough)"]
Q -->|"Static asset -- docs, configs, analysis"| B1["Accuracy check<br>Verified against source code or schema?"]
A1 --> A2["Real data check<br>Ran changed code path against real data<br>not just read the diff?"]
A2 --> A3["Regression check<br>Evidence that existing tests still pass?"]
A3 --> A4["Edge case check<br>Evidence of testing failure scenarios?"]
A4 --> AEvidence["Record code evidence<br>execution output, real data test,<br>test results, edge case result"]
B1 --> B2["Clarity check<br>Follows the established format?"]
B2 --> B3["Validity check<br>Links and references resolve?"]
B3 --> BEvidence["Record static evidence<br>accuracy check method,<br>format standard, link validation method"]
AEvidence --> Done(["WORKS Check complete -- proceed to Section 3"])
BEvidence --> Done
A. For Code (Executable)
- Execution: Terminal output showing successful run? (Exit code 0 is NOT enough)
- Real data: Ran the changed code path against real data, not just read the diff?
- Regression: Evidence that existing tests still pass?
- Edge Cases: Evidence of testing failure scenarios?
EVIDENCE:
- Execution output: [paste actual output]
- Real data test: [command run, input used, output observed]
- Test results: [paste test output]
- Edge case tested: [describe scenario and result]
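A minimal sketch of the execution check above, assuming the task has a known success marker that should appear in the command output. The names `Evidence`, `run_with_evidence`, and `expected_substring` are hypothetical and not part of this skill.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class Evidence:
    command: list[str]
    exit_code: int
    output: str
    passed: bool


def run_with_evidence(command: list[str], expected_substring: str) -> Evidence:
    """Run a command and judge it by its output, not only by its exit code."""
    result = subprocess.run(command, capture_output=True, text=True, timeout=60)
    output = result.stdout + result.stderr
    # Exit code 0 alone is not enough: the expected result must appear in the output.
    passed = result.returncode == 0 and expected_substring in output
    return Evidence(command, result.returncode, output, passed)
```

The `output` field is what would be pasted under "Execution output"; a run that exits 0 without printing the expected marker is recorded as not passed.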
B. For Static Assets (Docs, Configs, Analysis)
- Accuracy: Verified against source code/schema?
- Clarity: Does it follow the established format?
- Validity: Do links/references resolve?
EVIDENCE:
- Accuracy check: [how verified]
- Format compliance: [standard followed]
- Links validated: [method used]
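One possible way to produce the "Links validated" evidence for Markdown files is sketched below. It only checks relative links against the file system; external URLs and in-page anchors are skipped and would need a separate check. The function name `broken_relative_links` and the regular expression are illustrative assumptions.

```python
import re
from pathlib import Path

# Matches inline Markdown links of the form [text](target).
LINK_PATTERN = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def broken_relative_links(markdown_text: str, base_dir: Path) -> list[str]:
    """Return relative link targets that do not exist under base_dir."""
    broken = []
    for target in LINK_PATTERN.findall(markdown_text):
        if target.startswith(("http://", "https://", "mailto:", "#")):
            continue  # external links and in-page anchors are out of scope here
        path_part = target.split("#", 1)[0]  # drop any fragment before checking
        if not (base_dir / path_part).exists():
            broken.append(target)
    return broken
```

An empty return value, together with the command that produced it, is the kind of method description the evidence line asks for.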
3. The "FIXED" Check
For bug fixes specifically:
- Reproduction: Did I observe the pre-fix state?
- Resolution: Does the original problem NO LONGER occur?
EVIDENCE:
- Pre-fix behavior: [what was observed]
- Post-fix behavior: [what is now observed]
- Regression test added: [yes/no, location]
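The reproduction and resolution steps can be illustrated with a deliberately small, hypothetical bug: an averaging function that crashes on an empty list. Both function versions and the helper `reproduces_bug` are invented for illustration only.

```python
def average_before(values):
    """Hypothetical pre-fix code: divides by zero on an empty list."""
    return sum(values) / len(values)


def average_after(values):
    """Hypothetical post-fix code: an empty list yields 0.0."""
    if not values:
        return 0.0
    return sum(values) / len(values)


def reproduces_bug(func) -> bool:
    """Return True if the original failure is still observable for func."""
    try:
        func([])
    except ZeroDivisionError:
        return True
    return False
```

Observing `reproduces_bug` return True for the old code and False for the new code gives the pre-fix and post-fix evidence; keeping the same call as a test supplies the regression test.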
4. Quality Gates
- Pre-commit hooks passed?
- Linting passed? (Necessary, but not sufficient)
- Type checking passed? (if applicable)
EVIDENCE:
- Pre-commit: [output or "not configured"]
- Linting: [tool and result]
- Type check: [tool and result]
5. Proportional Response Check
If the task has an issue-classification field in its metadata, verify the response matched the issue type. If no issue-classification is present, mark N/A and proceed.
flowchart TD
Start(["Begin Proportional Response Check"]) --> Q1{"issue-classification<br>present in task metadata?"}
Q1 -->|"absent"| Skip["SKIP -- existing WORKS/FIXED/Quality Gates apply"]
Q1 -->|"present"| Q2{"Classification type?"}
Q2 -->|"procedural"| P["Sweep completeness<br>Codebase search returns zero<br>remaining instances of the pattern"]
Q2 -->|"defect"| D["Root cause addressed<br>Fix targets root cause from evidence chain<br>+ scenario in scenario-target succeeds"]
Q2 -->|"recurring-pattern"| R["Guardrail added<br>New gate/check exists AND<br>covers the defect CLASS not just instance"]
Q2 -->|"missing-guardrail"| M["Gate gap filled<br>Guardrail triggers in the<br>exposing scenario"]
Q2 -->|"unbounded-design"| U["Design implemented<br>Matches chosen direction +<br>trade-offs documented"]
P --> Evidence
D --> Evidence
R --> Evidence
M --> Evidence
U --> Evidence
Skip --> Done(["Proportional Check complete"])
Evidence["Record proportional evidence"] --> Done
EVIDENCE:
- Issue Classification: [type or "not classified"]
- Scenario Target: [scenario -> improvement, or "not specified"]
- Proportional Check: [PASS/FAIL/N/A]
- Check detail: [what was verified and result]
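For the procedural classification, "sweep completeness" means a codebase search finds no remaining instances of the pattern. A rough sketch follows, assuming the pattern can be expressed as a regular expression and the files of interest match a glob; `remaining_instances` is a hypothetical helper, and a tool such as grep would serve equally well.

```python
import re
from pathlib import Path


def remaining_instances(
    root: Path, pattern: str, glob: str = "**/*.py"
) -> list[tuple[str, int, str]]:
    """List (relative path, line number, line) for every remaining match."""
    regex = re.compile(pattern)
    hits = []
    for path in sorted(root.glob(glob)):
        if not path.is_file():
            continue
        lines = path.read_text(errors="replace").splitlines()
        for number, line in enumerate(lines, start=1):
            if regex.search(line):
                hits.append((str(path.relative_to(root)), number, line.strip()))
    return hits
```

An empty list supports a PASS; any hit is a concrete location to record under "Check detail".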
6. Agent Delegation Verification
When work was delegated to a sub-agent, the agent's success report is NOT evidence.
- VCS diff reviewed: `git diff` shows the expected changes?
- Changes verified: Read the modified files. Does the content match intent?
- Tests run independently: Ran the verification command yourself, not trusting the agent's claim?
EVIDENCE:
- Agent report: [what agent claimed]
- VCS diff: [files changed, scope matches expectation]
- Independent verification: [command run, output observed]
If no agents were used, mark N/A and proceed.
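The VCS diff review can be partly mechanized by comparing the files that actually changed with the files the delegated task was expected to touch. This is a sketch under assumptions: `changed_files` and `scope_report` are hypothetical helpers, the repository uses git, and the expected file set is something you write down before delegating.

```python
import subprocess


def changed_files(repo_dir: str = ".", revisions: tuple[str, ...] = ()) -> set[str]:
    """Return the paths listed by `git diff --name-only [revisions...]`."""
    result = subprocess.run(
        ["git", "diff", "--name-only", *revisions],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return {line for line in result.stdout.splitlines() if line}


def scope_report(changed: set[str], expected: set[str]) -> dict[str, list[str]]:
    """Compare what changed with what the delegated task should have touched."""
    return {
        "unexpected": sorted(changed - expected),
        "missing": sorted(expected - changed),
    }
```

A report with both lists empty only shows that the scope matches; reading the modified files and running the verification command yourself are still required.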
7. Honesty Check
- Did I verify the full scope?
- Am I distinguishing between "should work" and "verified to work"?
- Destination check: Did I read the target state after writing? (Tool output claiming success is not evidence — the state of the destination is.)
- Can I answer YES to: "I have VALIDATED this output in its intended context"?
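The destination check can be as simple as reading back what was just written. The sketch below uses a local file as the destination; `write_and_verify` is a hypothetical name, and for a database or remote API the read-back would be a query or a GET instead.

```python
from pathlib import Path


def write_and_verify(path: Path, content: str) -> bool:
    """Write content, then confirm it by reading the destination back."""
    path.write_text(content, encoding="utf-8")
    # The write returning without error is not the evidence; the read-back is.
    return path.read_text(encoding="utf-8") == content
```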
Rationalization Prevention
If any of these thoughts occur, STOP and run the verification command:
| Rationalization | Response |
|---|---|
| "Should work now" | Run the verification command |
| "I'm confident" | Confidence is not evidence |
| "Just this once" | No exceptions |
| "Linter passed so build passes" | Linter does not check compilation |
| "Agent said success" | Verify independently (Section 6) |
| "I'm tired" | Exhaustion is not an excuse |
| "Partial check is enough" | Partial check proves nothing about the whole |
| "Different words so rule doesn't apply" | Spirit over letter |
Red flags in your own output — if you catch yourself writing any of these, the gate has not been passed:
- "should", "seems to", "looks correct"
- Expressions of satisfaction before verification ("Done!", "Perfect!")
- About to commit/push/PR without fresh command output in this message
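The phrase-level red flags lend themselves to a simple self-scan of a draft message before sending it. The phrase list below is taken from the bullets above; the function `find_red_flags` and its matching rules are assumptions, and a clean scan does not replace fresh command output.

```python
import re

RED_FLAGS = ["should", "seems to", "looks correct", "Done!", "Perfect!"]


def find_red_flags(message: str) -> list[str]:
    """Return the red-flag phrases that occur in message, in list order."""
    found = []
    for phrase in RED_FLAGS:
        # Lookarounds keep "should" from matching inside words like "shoulder".
        pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
        if re.search(pattern, message, re.IGNORECASE):
            found.append(phrase)
    return found
```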
The Golden Rule
If you cannot demonstrate it working in practice with evidence, it is NOT done.
| Claim | Required Evidence |
|---|---|
| "Code works" | Terminal output showing execution against real data |
| "Tests pass" | Actual test output, not assumption |
| "Bug fixed" | Before/after comparison |
| "Data synced" | Read the destination after writing — not the tool output |
| "Docs accurate" | Cross-reference with source |
| "Config valid" | Validation command output |
| "Root cause fixed" | Evidence chain from grooming + fix addresses root cause claim |
| "Guardrail added" | New gate/check exists and triggers in exposing scenario |
| "Agent completed" | VCS diff reviewed + independent verification command run |
Quick Reference
VERIFICATION SUMMARY:
Task Type: [FIX/FEATURE/REFACTOR/DOCS/INVESTIGATION]
Works Check: [PASS/FAIL] - Evidence: ___
Fixed Check: [PASS/FAIL/N/A] - Evidence: ___
Proportional Check: [PASS/FAIL/N/A] - Evidence: ___
Quality Gates: [PASS/FAIL] - Evidence: ___
Agent Delegation: [PASS/FAIL/N/A] - Evidence: ___
Honesty Check: [PASS/FAIL]
VERDICT: [COMPLETE / NOT COMPLETE - reason]
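The summary can be reduced to a verdict mechanically. The sketch below assumes a rule the template implies but does not state outright: the verdict is COMPLETE only when every check is PASS, with N/A accepted for the three checks whose template lists it. The function name `verdict` and the dictionary layout are hypothetical.

```python
REQUIRED = ("Works Check", "Quality Gates", "Honesty Check")
MAY_BE_NA = ("Fixed Check", "Proportional Check", "Agent Delegation")


def verdict(results: dict[str, str]) -> str:
    """Derive the VERDICT line from per-check results (assumed rule, see lead-in)."""
    problems = []
    for name in REQUIRED:
        if results.get(name) != "PASS":
            problems.append(f"{name} is {results.get(name, 'missing')}")
    for name in MAY_BE_NA:
        if results.get(name) not in ("PASS", "N/A"):
            problems.append(f"{name} is {results.get(name, 'missing')}")
    if problems:
        return "NOT COMPLETE - " + "; ".join(problems)
    return "COMPLETE"
```

A check that was never recorded counts as a failure here, which matches the document's stance that missing evidence means the task is not done.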