Agent skill

devils-advocate

Use when challenging assumptions, surfacing risks, or stress-testing designs and decisions. Triggers: 'challenge this', 'play devil's advocate', 'what could go wrong', 'poke holes', 'find the flaws', 'what am I missing', 'is this solid', 'red team this', 'what are the weaknesses', 'risk assessment', 'sanity check'. Works on design docs, architecture decisions, or any artifact needing adversarial review.

Install this agent skill to your Project

npx add-skill https://github.com/axiomantic/spellbook/tree/main/skills/devils-advocate

SKILL.md

Evidence Hierarchy Reference

This skill follows the shared evidence hierarchy defined in skills/shared-references/evidence-hierarchy.md. Every challenge must cite an evidence tier. Before an assumption is flagged as UNVALIDATED, verification must have been attempted at Medium depth or deeper, per the Depth Escalation Protocol.

<RULE>If a finding is UNVALIDATED or IMPLICIT at shallow depth, it MUST be escalated to Medium depth before inclusion in the report.</RULE>
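The rule above can be read as a simple gate on what goes into the report. The following is a minimal sketch, not the skill's implementation: the `Finding` type, the `Depth` enum, and both function names are hypothetical, and only the "shallow" and "Medium" depth names come from the text (the full Depth Escalation Protocol lives in the shared reference file and may define more levels).

```python
from dataclasses import dataclass
from enum import IntEnum


class Depth(IntEnum):
    """Hypothetical depth scale; only 'shallow' and 'Medium' are named in the skill."""
    SHALLOW = 1
    MEDIUM = 2


# Classifications the rule says must not be reported at shallow depth.
NEEDS_ESCALATION = {"UNVALIDATED", "IMPLICIT"}


@dataclass
class Finding:
    title: str
    classification: str  # e.g. VALIDATED, UNVALIDATED, IMPLICIT, CONTRADICTORY
    depth: Depth         # deepest verification attempted so far


def must_escalate(finding: Finding) -> bool:
    """True when the finding needs Medium-depth verification before it may be reported."""
    return finding.classification in NEEDS_ESCALATION and finding.depth < Depth.MEDIUM


def reportable(findings: list[Finding]) -> list[Finding]:
    """Keep only the findings that already satisfy the escalation rule."""
    return [f for f in findings if not must_escalate(f)]
```

In this sketch a shallow UNVALIDATED finding is held back rather than dropped: the reviewer is expected to re-verify it at Medium depth and then resubmit it.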

Invariant Principles

  1. Untested assumptions become production bugs. Every claim needs evidence or an explicit "unvalidated" flag.
  2. Vague scope enables scope creep. Boundaries must be testable, not interpretive.
  3. Optimistic architecture fails at scale. Every design decision needs 10x/failure/deprecation analysis.
  4. Undocumented failure modes become incidents. Every integration needs explicit failure handling.
  5. Unmeasured success is unfalsifiable. Metrics require numbers, baselines, percentiles.

Applicability

| Use | Skip (Why) |
|------|------|
| Understanding/design doc complete | Active user discovery (no stable artifact to challenge) |
| "Challenge this" request | Code review (use code-reviewer - different scope) |
| Before architectural decision | Implementation validation (use fact-checking) |

Inputs

| Input | Required | Description |
|------|------|------|
| document_path | Yes | Path to understanding or design document to review |
| focus_areas | No | Specific areas to prioritize (e.g., "security", "scalability") |
| known_constraints | No | Constraints already accepted (skip challenging these) |

Outputs

| Output | Type | Description |
|------|------|------|
| review_document | Inline | Structured review following Output Format template |
| issue_count | Inline | Summary counts: critical, major, minor |
| readiness_verdict | Inline | Verdict per table below |

Verdicts

| Verdict | Meaning |
|------|------|
| READY | Minor or no issues found after thorough review |
| NEEDS WORK | Major issues but fixable |
| NOT READY | Blocking issues |
| INCONCLUSIVE | Insufficient detail in document to assess |

A verdict of READY after thorough investigation is valid. Fabricating marginal issues to meet a quota degrades trust.
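One way to derive the verdict from the issue counts is sketched below. It is an illustration under stated assumptions rather than the skill's own logic: `IssueCount` and `readiness_verdict` are hypothetical names, critical issues are treated as the "blocking" ones, and the `enough_detail` flag stands in for the reviewer's judgment that the document can be assessed at all.

```python
from dataclasses import dataclass


@dataclass
class IssueCount:
    critical: int = 0
    major: int = 0
    minor: int = 0


def readiness_verdict(counts: IssueCount, enough_detail: bool = True) -> str:
    """Map issue counts onto the four verdicts, most severe condition first."""
    if not enough_detail:
        return "INCONCLUSIVE"   # document too thin to assess
    if counts.critical > 0:
        return "NOT READY"      # assumption: critical issues are the blocking ones
    if counts.major > 0:
        return "NEEDS WORK"     # major but fixable
    return "READY"              # minor or no issues; no quota of findings is required
```

Note that `readiness_verdict(IssueCount())` returns `"READY"`: a clean result after a thorough review needs no padding with marginal issues.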


Review Protocol

Challenge Categories

| Category | Classification | Challenges |
|------|------|------|
| Assumptions | VALIDATED/UNVALIDATED/IMPLICIT/CONTRADICTORY | Evidence sufficient? Current? What if wrong? What disproves? |
| Scope | - | Vague language? Creep vectors? MVP ship without excluded? Users expect? Similar code supports? |
| Architecture | - | Rationale specific or generic? 10x scale? System fails? Dep deprecated? Matches codebase? |
| Integration | - | Interface documented? Stable? System down? Unexpected data? Slow? Auth fails? Circular deps? |
| Success Criteria | - | Has number? Measurable? Baseline? p50/p95/p99? Monitored how? |
| Edge Cases | Boundary, failure, security | Empty/max/invalid? Network/partial/cascade? Auth bypass? Injection? |
| Vocabulary | - | Overloaded? Matches code? Context-dependent meanings? Synonyms to unify? Two devs interpret same? |

Fractal exploration: When a finding is classified as CRITICAL, invoke fractal-thinking with intensity pulse and the seed "What are the second-order consequences if [critical issue] is not addressed?". Use the resulting synthesis to add impact chains to each CRITICAL finding.

Challenge Template

[ITEM]: "[quoted from doc]"
- Classification: [type]
- Evidence: [provided or NONE]
- What if wrong: [failure impact]
- Similar code: [reference or N/A]
- VERDICT: [finding + recommendation]
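If challenges are collected programmatically, the template maps naturally onto a small record type. This is a sketch only: the `Challenge` class and its `render` method are hypothetical, and the defaults simply mirror the template's "NONE" and "N/A" placeholders.

```python
from dataclasses import dataclass


@dataclass
class Challenge:
    item: str             # label for the challenged item
    quote: str            # text quoted from the document under review
    classification: str   # e.g. UNVALIDATED
    what_if_wrong: str    # failure impact
    verdict: str          # finding + recommendation
    evidence: str = "NONE"
    similar_code: str = "N/A"

    def render(self) -> str:
        """Render the challenge in the same field order as the template."""
        return "\n".join([
            f'{self.item}: "{self.quote}"',
            f"- Classification: {self.classification}",
            f"- Evidence: {self.evidence}",
            f"- What if wrong: {self.what_if_wrong}",
            f"- Similar code: {self.similar_code}",
            f"- VERDICT: {self.verdict}",
        ])
```

Making `evidence` default to "NONE" keeps a missing citation visible in the output instead of silently omitting the line.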

Output Format

markdown
# Devil's Advocate Review: [Feature]

## Executive Summary
[2-3 sentences: critical count, major risks, overall assessment]

## Critical Issues (Block Design Phase)

### Issue N: [Title]
- **Category:** [from challenge categories]
- **Finding:** [what is wrong]
- **Evidence:** [doc sections, codebase refs]
- **Impact:** [what breaks]
- **Recommendation:** [specific action]

## Major Risks (Proceed with Caution)

### Risk N: [Title]
[Same format + Mitigation]

## Minor Issues
- [Issue]: [Finding] -> [Recommendation]

## Validation Summary

| Area | Total | Strong | Weak | Flagged |
|------|-------|--------|------|---------|
| Assumptions | N | X | Y | Z |
| Scope | N | justified | - | questionable |
| Architecture | N | well-justified | - | needs rationale |
| Integrations | N | failure documented | - | missing |
| Edge cases | N | covered | - | recommended |

## Overall Assessment
**Readiness:** READY | NEEDS WORK | NOT READY | INCONCLUSIVE
**Confidence:** HIGH | MEDIUM | LOW
**Blocking Issues:** [N]

Recommendation Validation

For each recommendation:

  1. Verify that the recommendation itself is sound (apply it mentally and check for new issues)
  2. Cite the evidence tier supporting the recommendation
  3. If the recommendation would create new assumptions, flag them

<FORBIDDEN>Proposing a "correction" that has not itself been validated. A wrong recommendation is worse than leaving the original assumption.</FORBIDDEN>
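The three checks can be expressed as a gate that a recommendation must pass before it enters the review. The sketch below is illustrative: `Recommendation`, `validation_gaps`, and `new_assumption_flags` are hypothetical names, and `evidence_tier` is a free-form string because the tier names are defined in the shared evidence hierarchy, not here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Recommendation:
    action: str
    evidence_tier: Optional[str] = None      # tier name from the shared evidence hierarchy
    checked_for_new_issues: bool = False      # step 1: applied mentally and re-examined
    new_assumptions: list[str] = field(default_factory=list)  # step 3


def validation_gaps(rec: Recommendation) -> list[str]:
    """Return the reasons a recommendation is not yet validated (empty list = OK)."""
    gaps = []
    if not rec.checked_for_new_issues:
        gaps.append("not checked for new issues")
    if not rec.evidence_tier:
        gaps.append("no evidence tier cited")
    return gaps


def new_assumption_flags(rec: Recommendation) -> list[str]:
    """Surface every assumption the recommendation itself introduces."""
    return [f"NEW ASSUMPTION: {a}" for a in rec.new_assumptions]
```

A recommendation with a non-empty `validation_gaps` result would be held back, which is one way to enforce the FORBIDDEN rule mechanically.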

Cross-Category Contradiction Detection

After all categories are challenged, check for contradictions between findings (e.g., Architecture says "fail-safe" but Edge Cases says "data loss"). Report contradictions explicitly in the review output. Contradictions between categories often reveal the deepest design flaws.
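A rough sketch of this pass follows, assuming each finding has already been reduced to a subject and a yes/no stance on it. That normalization is the hard, judgment-heavy part and is not shown; `Claim`, its fields, and `cross_category_contradictions` are all hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Claim:
    category: str   # challenge category the finding came from
    subject: str    # hypothetical normalized topic shared across categories
    holds: bool     # True if the finding says the property holds
    note: str       # the finding's own wording


def cross_category_contradictions(claims: list[Claim]) -> list[tuple[Claim, Claim]]:
    """Pair up claims from different categories that disagree about the same subject."""
    return [
        (a, b)
        for a, b in combinations(claims, 2)
        if a.category != b.category and a.subject == b.subject and a.holds != b.holds
    ]


claims = [
    Claim("Architecture", "writes survive failure", True, "design is fail-safe"),
    Claim("Edge Cases", "writes survive failure", False, "partial write causes data loss"),
    Claim("Scope", "offline mode excluded", True, "explicitly out of scope"),
]
for a, b in cross_category_contradictions(claims):
    print(f'CONTRADICTION: {a.category} says "{a.note}" but {b.category} says "{b.note}"')
```

The example data reuses the fail-safe versus data-loss case from the paragraph above; the third claim is there to show that unrelated findings are not paired.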


Self-Check


<FINAL_EMPHASIS> Every passed assumption = production bug. Every vague requirement = scope creep. Every unexamined edge case = 3am incident. Thorough. Skeptical. Relentless. </FINAL_EMPHASIS>
