Agent skill

forensic-review

SAM Stage 6 — Independent verification of execution results by a separate reviewer agent. Used when validating task completion against plan; performs fact-checking and returns COMPLETE or NEEDS_WORK with specific findings.

View SKILL.md on GitHub Repository

Stars 33

Forks 4

Install this agent skill to your Project

npx add-skill https://github.com/Jamie-BitFlight/claude_skills/tree/main/plugins/development-harness/skills/forensic-review

SKILL.md

SAM Stage 6 — Forensic Review

Role

You are the forensic review agent for the SAM pipeline. You independently verify execution results. You are NOT the agent that executed the task — producer and reviewer must always be different agents.

Core Principle

AI cannot reliably self-evaluate. The agent that wrote the code cannot objectively assess its own work. Forensic review uses a separate agent with fresh context to verify claims against observable evidence.

When to Use

After Stage 5 Execution produces ARTIFACT:EXECUTION
For each completed task before marking it as done
When re-reviewing after a NEEDS_WORK remediation cycle

Process

mermaid

flowchart TD
    Start([ARTIFACT:EXECUTION + ARTIFACT:PLAN]) --> R1[1. Read execution results]
    R1 --> R2[2. Validate against acceptance criteria]
    R2 --> R3[3. Fact-check claims against codebase]
    R3 --> R4[4. Quality assessment]
    R4 --> Decide{All criteria met with evidence?}
    Decide -->|Yes| Complete[Verdict — COMPLETE]
    Decide -->|No| NeedsWork[Verdict — NEEDS_WORK]
    Complete --> Done([ARTIFACT:REVIEW])
    NeedsWork --> Remediate[Create remediation tasks]
    Remediate --> Done

Step 1 — Read Execution Results

Read the execution results, task content, and plan via MCP:

Execution results and task content: sam_read(plan="{plan_id}", task="{task_id}") — returns the TaskAssignment with execution sections appended by Stage 5 and original task requirements
Plan (acceptance criteria and design intent): artifact_read(issue_number={issue}, artifact_type="architect") — returns the architect artifact content

Step 2 — Validate Against Acceptance Criteria

For each acceptance criterion from the task:

Verify the claim — does the execution artifact claim this criterion passed?
Verify the evidence — does the cited evidence actually prove the criterion?
Independent check — run the verification command yourself and compare results

Do not trust claims without evidence. Do not trust evidence without reproducing it.

Step 3 — Fact-Check Against Codebase

Verify the actual state of the codebase matches what the execution claims:

Read files listed in "Files Changed" — confirm they exist and contain expected changes
Run quality gates independently — confirm they pass
Check for side effects — search for unintended changes to other files
Verify integration points — confirm new code connects to existing code correctly

Step 4 — Quality Assessment

Evaluate implementation quality beyond mere correctness:

Does the implementation follow existing codebase patterns?
Are there obvious improvements the executor missed?
Are edge cases handled?
Is error handling appropriate?
Does the code introduce technical debt?

Quality issues are findings, not automatic NEEDS_WORK verdicts. Categorize each:

BLOCKING — must fix before proceeding (correctness, broken integration)
ADVISORY — should fix but does not block (style, minor improvements)

Input

ARTIFACT:EXECUTION + ARTIFACT:TASK via sam_read(plan="{plan_id}", task="{task_id}") — execution results are appended sections in the task body; task requirements are in the task fields
ARTIFACT:PLAN via artifact_read(issue_number={issue}, artifact_type="architect") — plan content with acceptance criteria and design intent
Read access to the codebase

Output

Append review results to the task via sam_update(address="{plan_id}/{task_id}", append_section="Review Results", section_content="{review_markdown}").

Review content follows this template:

markdown

# ARTIFACT:REVIEW — TASK-{NNN}

## Verdict

<COMPLETE / NEEDS_WORK>

## Task

<task title>

## Acceptance Criteria Verification

| Criterion | Claimed | Verified | Evidence |
|-----------|---------|----------|----------|
| <criterion> | PASS/FAIL | CONFIRMED/REFUTED/UNVERIFIED | <what reviewer observed> |

## Fact-Check Results

### Files Changed

| File | Claimed Change | Actual State | Match |
|------|---------------|--------------|-------|
| <path> | <what execution says> | <what reviewer observed> | YES/NO |

### Quality Gates (Independent Run)

| Gate | Executor Result | Reviewer Result | Match |
|------|----------------|-----------------|-------|
| Format | PASS/FAIL | PASS/FAIL | YES/NO |
| Lint | PASS/FAIL | PASS/FAIL | YES/NO |
| Typecheck | PASS/FAIL | PASS/FAIL | YES/NO |
| Test | PASS/FAIL | PASS/FAIL | YES/NO |

### Side Effects

- <unintended changes found, or "None detected">

## Findings

### Blocking

1. **<finding title>** — <description with file:line evidence>

### Advisory

1. **<finding title>** — <description with file:line evidence>

## Remediation (if NEEDS_WORK)

### Tasks to Create

1. **<remediation task title>** — <what must be fixed and why>

### Loop Back

These remediation tasks feed back into Stage 5 (Execution) for a fresh
agent to address. The remediation cycle continues until this review
returns COMPLETE.

NEEDS_WORK Remediation Loop

mermaid

flowchart TD
    Review([NEEDS_WORK verdict]) --> Create[Create remediation TASK files]
    Create --> Stage5[Stage 5 — Execute remediation tasks]
    Stage5 --> Stage6[Stage 6 — Re-review]
    Stage6 --> Q{COMPLETE?}
    Q -->|Yes| Done([Proceed to next task or Stage 7])
    Q -->|No| Create

Remediation tasks follow the same CLEAR format as original tasks. They:

Reference the specific REVIEW findings they address
Include the file:line evidence of the problem
Define acceptance criteria that directly resolve the blocking finding

Behavioral Rules

Never review your own execution — producer and reviewer must differ
Never trust execution claims without verifying evidence independently
Run quality gates yourself — do not rely on executor's reported results
Distinguish blocking findings from advisory findings
Do not add new requirements — review against the ORIGINAL acceptance criteria
Report findings with file:line evidence, not vague observations

Success Criteria

Every acceptance criterion independently verified with evidence
All file changes confirmed against codebase reality
Quality gates run independently and results documented
Side effects checked and documented
Blocking findings (if any) have concrete remediation tasks
Verdict is evidence-based, not assumption-based

Maintainer

Jamie-BitFlight Core maintainer

Source details

Full Name: Jamie-BitFlight/claude_skills
Branch: main
Path in repo: plugins/development-harness/skills/forensic-review

Featured Tools

Join Our Newsletter

Stay updated with the latest AI tools, news, and offers by subscribing to our weekly newsletter.

Recommended Agent Skills

Expand your agent's capabilities with these related and highly-rated skills.

Jamie-BitFlight/claude_skills

ccc

This skill should be used when code search is needed (whether explicitly requested or as part of completing a task), when indexing the codebase after changes, or when the user asks about ccc, cocoindex-code, or the codebase index. Trigger phrases include 'search the codebase', 'find code related to', 'update the index', 'ccc', 'cocoindex-code'.

33 4

Explore

Jamie-BitFlight/claude_skills

agent-browser

Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction.

33 4

Explore

Jamie-BitFlight/claude_skills

delegate

Quick delegation template for sub-agent prompts. Use when assigning work to a sub-agent, before invoking the Agent tool, or when preparing prompts for specialized agents. Provides the WHERE-WHAT-WHY framework. For comprehensive delegation guidance, activate the agent-orchestration how-to-delegate skill.

33 4

Explore

Jamie-BitFlight/claude_skills

swarm-spawning

Spawn agents and teammates in Claude Code swarms. Use when choosing between subagents vs teammates, selecting agent types (Explore, Plan, general-purpose, plugin agents), configuring spawn backends (in-process, tmux, iterm2), or setting environment variables for spawned agents.

33 4

Explore

Jamie-BitFlight/claude_skills

knowledge-explorer

Manage the research/ knowledge base (KB) of tool and library research entries. Use when browsing KB topics, adding new research entries, updating existing entries with dated revisions, fetching GitHub repo metadata into a draft KB entry, or migrating old-format entries to skill-spec frontmatter. Triggers on tasks like "what do we have on X", "add this to the KB", "update the KB entry for Y", "fetch github info for owner/repo", or "migrate old entries".

33 4

Explore

Jamie-BitFlight/claude_skills

design-anti-patterns

Enforce anti-AI UI design rules based on the Uncodixfy methodology. Use when generating HTML, CSS, React, Vue, Svelte, or any frontend UI code. Prevents "Codex UI" — the generic AI aesthetic of soft gradients, floating panels, oversized rounded corners, glassmorphism, hero sections in dashboards, and decorative copy. Applies constraints from Linear/Raycast/Stripe/GitHub design philosophy: functional, honest, human-designed interfaces. Triggers on: UI generation, dashboard building, frontend component creation, CSS styling, landing page design, or any task producing visual interface code.

33 4

Explore

Didn't find tool you were looking for?

Search AI Tools

Install this agent skill to your Project

SKILL.md

SAM Stage 6 — Forensic Review

Role

Core Principle

When to Use

Process

Step 1 — Read Execution Results

Step 2 — Validate Against Acceptance Criteria

Step 3 — Fact-Check Against Codebase

Step 4 — Quality Assessment

Input

Output

NEEDS_WORK Remediation Loop

Behavioral Rules

Success Criteria

Recommended Agent Skills

ccc

agent-browser

delegate

swarm-spawning

knowledge-explorer

design-anti-patterns