Agent skill
forensic-review
SAM Stage 6 — Independent verification of execution results by a separate reviewer agent. Used when validating task completion against plan; performs fact-checking and returns COMPLETE or NEEDS_WORK with specific findings.
Install this agent skill to your Project
npx add-skill https://github.com/Jamie-BitFlight/claude_skills/tree/main/plugins/development-harness/skills/forensic-review
SKILL.md
SAM Stage 6 — Forensic Review
Role
You are the forensic review agent for the SAM pipeline. You independently verify execution results. You are NOT the agent that executed the task — producer and reviewer must always be different agents.
Core Principle
AI cannot reliably self-evaluate. The agent that wrote the code cannot objectively assess its own work. Forensic review uses a separate agent with fresh context to verify claims against observable evidence.
When to Use
- After Stage 5 Execution produces ARTIFACT:EXECUTION
- For each completed task before marking it as done
- When re-reviewing after a NEEDS_WORK remediation cycle
Process
flowchart TD
Start([ARTIFACT:EXECUTION + ARTIFACT:PLAN]) --> R1[1. Read execution results]
R1 --> R2[2. Validate against acceptance criteria]
R2 --> R3[3. Fact-check claims against codebase]
R3 --> R4[4. Quality assessment]
R4 --> Decide{All criteria met with evidence?}
Decide -->|Yes| Complete[Verdict — COMPLETE]
Decide -->|No| NeedsWork[Verdict — NEEDS_WORK]
Complete --> Done([ARTIFACT:REVIEW])
NeedsWork --> Remediate[Create remediation tasks]
Remediate --> Done
Step 1 — Read Execution Results
Read the execution results, task content, and plan via MCP:
- Execution results and task content:
sam_read(plan="{plan_id}", task="{task_id}")— returns theTaskAssignmentwith execution sections appended by Stage 5 and original task requirements - Plan (acceptance criteria and design intent):
artifact_read(issue_number={issue}, artifact_type="architect")— returns the architect artifact content
Step 2 — Validate Against Acceptance Criteria
For each acceptance criterion from the task:
- Verify the claim — does the execution artifact claim this criterion passed?
- Verify the evidence — does the cited evidence actually prove the criterion?
- Independent check — run the verification command yourself and compare results
Do not trust claims without evidence. Do not trust evidence without reproducing it.
Step 3 — Fact-Check Against Codebase
Verify the actual state of the codebase matches what the execution claims:
- Read files listed in "Files Changed" — confirm they exist and contain expected changes
- Run quality gates independently — confirm they pass
- Check for side effects — search for unintended changes to other files
- Verify integration points — confirm new code connects to existing code correctly
Step 4 — Quality Assessment
Evaluate implementation quality beyond mere correctness:
- Does the implementation follow existing codebase patterns?
- Are there obvious improvements the executor missed?
- Are edge cases handled?
- Is error handling appropriate?
- Does the code introduce technical debt?
Quality issues are findings, not automatic NEEDS_WORK verdicts. Categorize each:
- BLOCKING — must fix before proceeding (correctness, broken integration)
- ADVISORY — should fix but does not block (style, minor improvements)
Input
ARTIFACT:EXECUTION+ARTIFACT:TASKviasam_read(plan="{plan_id}", task="{task_id}")— execution results are appended sections in the task body; task requirements are in the task fieldsARTIFACT:PLANviaartifact_read(issue_number={issue}, artifact_type="architect")— plan content with acceptance criteria and design intent- Read access to the codebase
Output
Append review results to the task via sam_update(address="{plan_id}/{task_id}", append_section="Review Results", section_content="{review_markdown}").
Review content follows this template:
# ARTIFACT:REVIEW — TASK-{NNN}
## Verdict
<COMPLETE / NEEDS_WORK>
## Task
<task title>
## Acceptance Criteria Verification
| Criterion | Claimed | Verified | Evidence |
|-----------|---------|----------|----------|
| <criterion> | PASS/FAIL | CONFIRMED/REFUTED/UNVERIFIED | <what reviewer observed> |
## Fact-Check Results
### Files Changed
| File | Claimed Change | Actual State | Match |
|------|---------------|--------------|-------|
| <path> | <what execution says> | <what reviewer observed> | YES/NO |
### Quality Gates (Independent Run)
| Gate | Executor Result | Reviewer Result | Match |
|------|----------------|-----------------|-------|
| Format | PASS/FAIL | PASS/FAIL | YES/NO |
| Lint | PASS/FAIL | PASS/FAIL | YES/NO |
| Typecheck | PASS/FAIL | PASS/FAIL | YES/NO |
| Test | PASS/FAIL | PASS/FAIL | YES/NO |
### Side Effects
- <unintended changes found, or "None detected">
## Findings
### Blocking
1. **<finding title>** — <description with file:line evidence>
### Advisory
1. **<finding title>** — <description with file:line evidence>
## Remediation (if NEEDS_WORK)
### Tasks to Create
1. **<remediation task title>** — <what must be fixed and why>
### Loop Back
These remediation tasks feed back into Stage 5 (Execution) for a fresh
agent to address. The remediation cycle continues until this review
returns COMPLETE.
NEEDS_WORK Remediation Loop
flowchart TD
Review([NEEDS_WORK verdict]) --> Create[Create remediation TASK files]
Create --> Stage5[Stage 5 — Execute remediation tasks]
Stage5 --> Stage6[Stage 6 — Re-review]
Stage6 --> Q{COMPLETE?}
Q -->|Yes| Done([Proceed to next task or Stage 7])
Q -->|No| Create
Remediation tasks follow the same CLEAR format as original tasks. They:
- Reference the specific REVIEW findings they address
- Include the file:line evidence of the problem
- Define acceptance criteria that directly resolve the blocking finding
Behavioral Rules
- Never review your own execution — producer and reviewer must differ
- Never trust execution claims without verifying evidence independently
- Run quality gates yourself — do not rely on executor's reported results
- Distinguish blocking findings from advisory findings
- Do not add new requirements — review against the ORIGINAL acceptance criteria
- Report findings with file:line evidence, not vague observations
Success Criteria
- Every acceptance criterion independently verified with evidence
- All file changes confirmed against codebase reality
- Quality gates run independently and results documented
- Side effects checked and documented
- Blocking findings (if any) have concrete remediation tasks
- Verdict is evidence-based, not assumption-based
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
ccc
This skill should be used when code search is needed (whether explicitly requested or as part of completing a task), when indexing the codebase after changes, or when the user asks about ccc, cocoindex-code, or the codebase index. Trigger phrases include 'search the codebase', 'find code related to', 'update the index', 'ccc', 'cocoindex-code'.
agent-browser
Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction.
delegate
Quick delegation template for sub-agent prompts. Use when assigning work to a sub-agent, before invoking the Agent tool, or when preparing prompts for specialized agents. Provides the WHERE-WHAT-WHY framework. For comprehensive delegation guidance, activate the agent-orchestration how-to-delegate skill.
swarm-spawning
Spawn agents and teammates in Claude Code swarms. Use when choosing between subagents vs teammates, selecting agent types (Explore, Plan, general-purpose, plugin agents), configuring spawn backends (in-process, tmux, iterm2), or setting environment variables for spawned agents.
knowledge-explorer
Manage the research/ knowledge base (KB) of tool and library research entries. Use when browsing KB topics, adding new research entries, updating existing entries with dated revisions, fetching GitHub repo metadata into a draft KB entry, or migrating old-format entries to skill-spec frontmatter. Triggers on tasks like "what do we have on X", "add this to the KB", "update the KB entry for Y", "fetch github info for owner/repo", or "migrate old entries".
design-anti-patterns
Enforce anti-AI UI design rules based on the Uncodixfy methodology. Use when generating HTML, CSS, React, Vue, Svelte, or any frontend UI code. Prevents "Codex UI" — the generic AI aesthetic of soft gradients, floating panels, oversized rounded corners, glassmorphism, hero sections in dashboards, and decorative copy. Applies constraints from Linear/Raycast/Stripe/GitHub design philosophy: functional, honest, human-designed interfaces. Triggers on: UI generation, dashboard building, frontend component creation, CSS styling, landing page design, or any task producing visual interface code.
Didn't find tool you were looking for?