Agent skill
deep-review
Multi-reviewer code review. Spawns domain-specific reviewers in parallel, cross-checks findings, posts a single structured GitHub review.
Install this agent skill to your Project
npx add-skill https://github.com/coder/coder/tree/main/.agents/skills/deep-review
SKILL.md
Deep Review
Multi-reviewer code review. Spawns domain-specific reviewers in parallel, cross-checks their findings for contradictions and convergence, then posts a single structured GitHub review with inline comments.
When to use this skill
- PRs touching 3+ subsystems, >500 lines, or requiring domain-specific expertise (security, concurrency, database).
- When you want independent perspectives cross-checked against each other, not just a single-pass review.
Use .claude/skills/code-review/ for focused single-domain changes or quick single-pass reviews.
Prerequisite: This skill requires the ability to spawn parallel subagents. If your agent runtime cannot spawn subagents, use code-review instead.
Severity scales: Deep-review uses P0–P4 (consequence-based). Code-review uses 🔴🟡🔵. Both are valid; they serve different review depths. Approximate mapping: P0–P1 ≈ 🔴, P2 ≈ 🟡, P3–P4 ≈ 🔵.
When NOT to use this skill
- Docs-only or config-only PRs (no code to structurally review). Use
.claude/skills/doc-check/instead. - Single-file changes under ~50 lines.
- The PR author asked for a quick review.
0. Proportionality check
Estimate scope before committing to a deep review. If the PR has fewer than 3 files and fewer than 100 lines changed, suggest code-review instead. If the PR is docs-only, suggest doc-check. Proceed only if the change warrants multi-reviewer analysis.
1. Scope the change
Author independence. Review with the same rigor regardless of who authored the PR. Don't soften findings because the author is the person who invoked this review, a maintainer, or a senior contributor. Don't harden findings because the author is a new contributor. The review's value comes from honest, consistent assessment.
Create the review output directory before anything else:
export REVIEW_DIR="/tmp/deep-review/$(date +%s)"
mkdir -p "$REVIEW_DIR"
Re-review detection. Check if you or a previous agent session already reviewed this PR:
gh pr view {number} --json reviews --jq '.reviews[] | select(.body | test("P[0-4]|\\*\\*Obs\\*\\*|\\*\\*Nit\\*\\*")) | .submittedAt' | head -1
If a prior agent review exists, you must produce a prior-findings classification table before proceeding. This is not optional — the table is an input to step 3 (reviewer prompts). Without it, reviewers will re-discover resolved findings.
- Read every author response since the last review (inline replies, PR comments, commit messages).
- Diff the branch to see what changed since the last review.
- Engage with any author questions before re-raising findings.
- Write
$REVIEW_DIR/prior-findings.mdwith this format:
# Prior findings from round {N}
| Finding | Author response | Status |
|---------|----------------|--------|
| P1 `file.go:42` wire-format break | Acknowledged, pushed fix in abc123 | Resolved |
| P2 `handler.go:15` missing auth check | "Middleware handles this" — see comment | Contested |
| P3 `db.go:88` naming | Agreed, will fix | Acknowledged |
Classify each finding as:
- Resolved: author pushed a code fix. Verify the fix addresses the finding's specific concern — not just that code changed in the relevant area. Check that the fix doesn't introduce new issues.
- Acknowledged: author agreed but deferred.
- Contested: author disagreed or raised a constraint. Write their argument in the table.
- No response: author didn't address it.
Only Contested and No response findings carry forward to the new review. Resolved and Acknowledged findings must not be re-raised.
Scope the diff. Get the file list from the diff, PR, or user. Skim for intent and note which layers are touched (frontend, backend, database, auth, concurrency, tests, docs).
For each changed file, briefly check the surrounding context:
- Config files (package.json, tsconfig, vite.config, etc.): scan the existing entries for naming conventions and structural patterns.
- New files: check if an existing file could have been extended instead.
- Comments in the diff: do they explain why, or just restate what the code does?
2. Pick reviewers
Match reviewer roles to layers touched. The Test Auditor, Edge Case Analyst, and Contract Auditor always run. Conditional reviewers activate when their domain is touched.
Tier 1 — Structural reviewers
| Role | Focus | When |
|---|---|---|
| Test Auditor | Test authenticity, missing cases, readability | Always |
| Edge Case Analyst | Chaos testing, edge cases, hidden connections | Always |
| Contract Auditor | Contract fidelity, lifecycle completeness, semantic honesty | Always |
| Structural Analyst | Implicit assumptions, class-of-bug elimination | API design, type design, test structure, resource lifecycle |
| Performance Analyst | Hot paths, resource exhaustion, allocation patterns | Hot paths, loops, caches, resource lifecycle |
| Database Reviewer | PostgreSQL, data modeling, Go↔SQL boundary | Migrations, queries, schema, indexes |
| Security Reviewer | Auth, attack surfaces, input handling | Auth, new endpoints, input handling, tokens, secrets |
| Product Reviewer | Over-engineering, feature justification | New features, new config surfaces |
| Frontend Reviewer | UI state, render lifecycles, component design | Frontend changes, UI components, API response shape changes |
| Duplication Checker | Existing utilities, code reuse | New files, new helpers/utilities, new types or components |
| Go Architect | Package boundaries, API lifecycle, middleware | Go code, API design, middleware, package boundaries |
| Concurrency Reviewer | Goroutines, channels, locks, shutdown | Goroutines, channels, locks, context cancellation, shutdown |
Tier 2 — Nit reviewers
| Role | Focus | File filter |
|---|---|---|
| Modernization Reviewer | Language-level improvements, stdlib patterns | Per-language (see below) |
| Style Reviewer | Naming, comments, consistency | *.go *.ts *.tsx *.py *.sh |
Tier 2 file filters:
-
Modernization Reviewer: one instance per language present in the diff. Filter by extension:
- Go:
*.go— reference.claude/docs/GO.mdbefore reviewing. - TypeScript:
*.ts*.tsx: reference.agents/skills/deep-review/references/typescript.mdbefore reviewing. - React:
*.tsx*.jsx: reference.agents/skills/deep-review/references/react.mdbefore reviewing.
.tsxfiles match both TypeScript and React filters. Spawn both instances when the diff contains.tsxchanges — TS covers language-level patterns; React covers component and hooks patterns. Before spawning, verify each instance's filter produces a non-empty diff. Skip instances whose filtered diff is empty. - Go:
-
Style Reviewer:
*.go*.ts*.tsx*.py*.sh
3. Spawn reviewers
Each reviewer writes findings to $REVIEW_DIR/{role-name}.md where {role-name} is the kebab-cased role name (e.g. test-auditor, go-architect). For Modernization Reviewer instances, qualify with the language: modernization-reviewer-go.md, modernization-reviewer-ts.md, modernization-reviewer-react.md. The orchestrator does not read reviewer findings from the subagent return text — it reads the files in step 4.
Spawn all Tier 1 and Tier 2 reviewers in parallel. Give each reviewer a reference (PR number, branch name), not the diff content. The reviewer fetches the diff itself. Reviewers are read-only — no worktrees needed.
Tier 1 prompt:
Read `AGENTS.md` in this repository before starting.
You are the {Role Name} reviewer. Read your methodology in
`.agents/skills/deep-review/roles/{role-name}.md`.
Follow the review instructions in
`.agents/skills/deep-review/structural-reviewer-prompt.md`.
Review: {PR number / branch / commit range}.
Output file: {REVIEW_DIR}/{role-name}.md
Tier 2 prompt:
Read `AGENTS.md` in this repository before starting.
You are the {Role Name} reviewer. Read your methodology in
`.agents/skills/deep-review/roles/{role-name}.md`.
Follow the review instructions in
`.agents/skills/deep-review/nit-reviewer-prompt.md`.
Review: {PR number / branch / commit range}.
File scope: {filter from step 2}.
Output file: {REVIEW_DIR}/{role-name}.md
For Modernization Reviewer instances, add the language reference after the methodology line:
- Go:
Read .claude/docs/GO.md as your Go language reference before reviewing. - TypeScript:
Read .agents/skills/deep-review/references/typescript.md as your TypeScript language reference before reviewing. - React:
Read .agents/skills/deep-review/references/react.md as your React language reference before reviewing.
For re-reviews, append to both Tier 1 and Tier 2 prompts:
Prior findings and author responses are in {REVIEW_DIR}/prior-findings.md. Read it before reviewing. Do not re-raise Resolved or Acknowledged findings.
4. Cross-check findings
4a. Read findings from files
Read each reviewer's output file from $REVIEW_DIR/ one at a time. One file per read — do not batch multiple reviewer files in parallel. Batching causes reviewer voices to blend in the context window, leading to misattribution (grabbing phrasing from one reviewer and attributing it to another).
For each file:
- Read the file.
- List each finding with its severity, location, and one-line summary.
- Note the reviewer's exact evidence line for each finding.
If a file says "No findings," record that and move on. If a file is missing (reviewer crashed or timed out), note the gap and proceed — do not stall or silently drop the reviewer's perspective.
After reading all files, you have a finding inventory. Proceed to cross-check.
4b. Cross-check
Handle Tier 1 and Tier 2 findings separately before merging.
Tier 2 nit findings: Apply a lighter filter. Drop nits that are purely subjective, that duplicate what a linter already enforces, or that the author clearly made intentionally. Keep nits that have a practical benefit (clearer name, better error message, obsolete stdlib usage). Surviving nits stay as Nit.
Tier 1 structural findings: Before producing the final review, look across all findings for:
- Contradictions. Two reviewers recommending opposite approaches. Flag both and note the conflict.
- Interactions. One finding that solves or worsens another (e.g. a refactor suggestion that addresses a separate cleanup concern). Link them.
- Convergence. Two or more reviewers flagging the same function or component from different angles. Don't just merge at max(severity) and don't treat convergence as headcount ("more reviewers = higher confidence in the same thing"). After listing the convergent findings, trace the consequence chain across them. One reviewer flags a resource leak, another flags an unbounded hang, a third flags infinite retries on reconnect — the combination means a single failure leaves a permanent resource drain with no recovery. That combined consequence may deserve its own finding at higher severity than any individual one.
- Async findings. When a finding mentions setState after unmount, unused cancellation signals, or missing error handling near an await: (1) find the setState or callback, (2) trace what renders or fires as a result, (3) ask "if this fires after the user navigated away, what do they see?" If the answer is "nothing" (a ref update, a console.log), it's P3. If the answer is "a dialog opens" or "state corrupts," upgrade. The severity depends on what's at the END of the async chain, not the start.
- Mechanism vs. consequence. Reviewers describe findings using mechanism vocabulary ("unused parameter", "duplicated code", "test passes by coincidence"), not consequence vocabulary ("dialog opens in wrong view", "attacker can bypass check", "removing this code has no test to catch it"). The Contract Auditor and Structural Analyst tend to frame findings by consequence already — use their framing directly. For mechanism-framed findings from other reviewers, restate the consequence before accepting the severity. Consequences include UX bugs, security gaps, data corruption, and silent regressions — not just things users see on screen.
- Weak evidence. Findings that assert a problem without demonstrating it. Downgrade or drop.
- Unnecessary novelty. New files, new naming patterns, new abstractions where the existing codebase already has a convention. If no reviewer flagged it but you see it, add it. If a reviewer flagged it as an observation, evaluate whether it should be a finding.
- Scope creep. Suggestions that go beyond reviewing what changed into redesigning what exists. Downgrade to P4.
- Structural alternatives. One reviewer proposes a design that eliminates a documented tradeoff, while others have zero findings because the current approach "works." Don't discount this as an outlier or scope creep. A structural alternative that removes the need for a tradeoff can be the highest-value output of the review. Preserve it at its original severity — the author decides whether to adopt it, but they need enough signal to evaluate it.
- Pre-existing behavior. "Pre-existing" doesn't erase severity. Check whether the PR introduced new code (comments, branches, error messages) that describes or depends on the pre-existing behavior incorrectly. The new code is in scope even when the underlying behavior isn't.
For each finding and observation, apply the severity test in both directions. Observations are not exempt — a reviewer may underrate a convention violation or a missing guarantee as Obs when the consequence warrants P3+:
- Downgrade: "Is this actually less severe than stated?"
- Upgrade: "Could this be worse than stated?"
When the severity spread among reviewers exceeds one level, note it explicitly. Only credit reviewers at or above the posted severity. A finding that survived 2+ independent reviewers needs an explicit counter-argument to drop. "Low risk" is not a counter when the reviewers already addressed it in their evidence.
Before forwarding a nit, form an independent opinion on whether it improves the code. Before rejecting a nit, verify you can prove it wrong, not just argue it's debatable.
Drop findings that don't survive this check. Adjust severity where the cross-check changes the picture.
After filtering both tiers, check for overlap: a nit that points at the same line as a Tier 1 finding can be folded into that comment rather than posted separately.
4c. Quoting discipline
When a finding survives cross-check, the reviewer's technical evidence is the source of record. Do not paraphrase it.
Convergent findings — sharpest first. When multiple reviewers flag the same issue:
- Rank the converging findings by evidence quality.
- Start from the sharpest individual finding as the base text.
- Layer in only what other reviewers contributed that the base didn't cover (a concrete detail, a preemptive counter, a stronger framing).
- Attribute to the 2–3 reviewers with the strongest evidence, not all N who noticed the same thing.
Single-reviewer findings. Go back to the reviewer's file and copy the evidence verbatim. The orchestrator owns framing, severity assessment, and practical judgment — those are your words. The technical claim and code-level evidence are the reviewer's words.
A posted finding has two voices:
- Reviewer voice (quoted): the specific technical observation and code evidence exactly as the reviewer wrote it.
- Orchestrator voice (original): severity framing, practical judgment ("worth fixing now because..."), scenario building, and conversational tone.
If you need to adjust a finding's scope (e.g. the reviewer said "file.go:42" but the real issue is broader), say so explicitly rather than silently rewriting the evidence.
Attribution must show severity spread. When reviewers disagree on severity, the attribution should reflect that — not flatten everyone to the posted severity. Show each reviewer's individual severity: *(Security Reviewer P1, Concurrency Reviewer P1, Test Auditor P2)* not *(Security Reviewer, Concurrency Reviewer, Test Auditor)*.
Integrity check. Before posting, verify that quoted evidence in findings actually corresponds to content in the diff. This guards against garbled cross-references from the file-reading step.
5. Post the review
When reviewing a GitHub PR, post findings as a proper GitHub review with inline comments, not a single comment dump.
Review body. Open with a short, friendly summary: what the change does well, what the overall impression is, and how many findings follow. Call out good work when you see it. A review that only lists problems teaches authors to dread your comments.
Clean approach to X. The Y handling is particularly well done.
A couple things to look at: 1 P2, 1 P3, 3 nits across 5 inline
comments.
For re-reviews (round 2+), open with what was addressed:
Thanks for fixing the wire-format break and the naming issue.
Fresh review found one new issue: 1 P2 across 1 inline comment.
Keep the review body to 2–4 sentences. Don't use markdown headers in the body — they render oversized in GitHub's review UI.
Inline comments. Every finding is an inline comment, pinned to the most relevant file and line. For findings that span multiple files, pin to the primary file (GitHub supports file-level comments when position is omitted or set to 1).
Inline comment format:
**P{n}** One-sentence finding *(Reviewer Role)*
> Reviewer's evidence quoted verbatim from their file
Orchestrator's practical judgment: is this worth fixing now, or
is the current tradeoff acceptable? Scenario building, severity
reasoning, fix suggestions — these are your words.
For convergent findings (multiple reviewers, same issue):
**P{n}** One-sentence finding *(Performance Analyst P1,
Contract Auditor P1, Test Auditor P2)*
> Sharpest reviewer's evidence as base text
> *Contract Auditor adds:* Additional detail from their file
Orchestrator's practical judgment.
For observations: **Obs** One-sentence observation *(Role)* ... For nits: **Nit** One-sentence finding *(Role)* ...
P3 findings and observations can be one-liners. Group multiple nits on the same file into one comment when they're co-located.
Review event. Always use COMMENT. Never use REQUEST_CHANGES — this isn't the norm in this repository. Never use APPROVE — approval is a human responsibility.
For P0 or P1 findings, add a note in the review body: "This review contains findings that may need attention before merge."
Posting via GitHub API.
The gh api endpoint for posting reviews routes through GraphQL by default. Field names differ from the REST API docs:
- Use
position(diff-relative line number), notline+side.sideis not a valid field in the GraphQL schema. subject_type: "file"is not recognized. Pin file-level comments toposition: 1instead.- Use
-X POSTwith--inputto force REST API routing.
To compute positions: save the PR diff to a file, then count lines from the first @@ hunk header of each file's diff section. For new files, position = line number + 1 (the hunk header is position 1, first content line is position 2).
gh pr diff {number} > /tmp/pr.diff
Submit:
gh api -X POST \
repos/{owner}/{repo}/pulls/{number}/reviews \
--input review.json
Where review.json:
{
"event": "COMMENT",
"body": "Summary of what's good and what to look at.\n1 P2, 1 P3 across 2 inline comments.",
"comments": [
{
"path": "file.go",
"position": 42,
"body": "**P1** Finding... *(Reviewer Role)*\n\n> Evidence..."
},
{
"path": "other.go",
"position": 1,
"body": "**P2** Cross-file finding... *(Reviewer Role)*\n\n> Evidence..."
}
]
}
Tone guidance. Frame design concerns as questions: "Could we use X instead?" — be direct only for correctness issues. Hedge design, not bugs. Build concrete scenarios to make concerns tangible. When uncertain, say so. See .claude/docs/PR_STYLE_GUIDE.md for PR conventions.
Follow-up
After posting the review, monitor the PR for author responses. If the author pushes fixes or responds to findings, consider running a re-review (this skill, starting from step 1 with the re-review detection path). Allow time for the author to address multiple findings before re-reviewing — don't trigger on each individual response.
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
refine-plan
Iteratively refine development plans using TDD methodology. Ensures plans are clear, actionable, and include red-green-refactor cycles with proper test coverage.
pull-requests
Guide for creating, updating, and following up on pull requests in the Coder repository. Use when asked to open a PR, update a PR, rewrite a PR description, or follow up on CI/check failures.
doc-check
Checks if code changes require documentation updates
code-review
Reviews code changes for bugs, security issues, and quality problems
mobile-dev-server-sandbox
Connects Mux mobile (Expo web/native) to an isolated dev-server sandbox with deterministic port setup, backend pairing, and Chrome MCP interaction. Use when implementing or validating mobile features against a sandboxed Mux backend.
dev-desktop-sandbox
Run isolated mux desktop (Electron) instances (temp MUX_ROOT + free ports)
Didn't find tool you were looking for?