Agent skill

devils-advocate

Use when challenging assumptions, surfacing risks, or stress-testing designs and decisions. Triggers: 'challenge this', 'play devil's advocate', 'what could go wrong', 'poke holes', 'find the flaws', 'what am I missing', 'is this solid', 'red team this', 'what are the weaknesses', 'risk assessment', 'sanity check'. Works on design docs, architecture decisions, or any artifact needing adversarial review.

Install this agent skill to your Project

npx add-skill https://github.com/axiomantic/spellbook/tree/main/skills/devils-advocate

SKILL.md

Evidence Hierarchy Reference

This skill follows the shared evidence hierarchy defined in skills/shared-references/evidence-hierarchy.md. Every challenge must cite an evidence tier. Before an assumption is flagged as UNVALIDATED, verification must have been attempted at Medium depth or deeper, per the Depth Escalation Protocol.

<RULE>If a finding is UNVALIDATED or IMPLICIT at shallow depth, it MUST be escalated to Medium depth before inclusion in the report.</RULE>
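
The escalation rule can be read as a gate that runs before findings reach the report. The sketch below is one possible reading, not the skill's implementation: the `Finding` type, the field names, and the "deep" level are assumptions made for illustration, and the real depth levels live in the Depth Escalation Protocol, which is not reproduced here.

```python
from dataclasses import dataclass

# Depth levels ordered shallow -> deep. "shallow" and "medium" come from the
# rule above; "deep" is an assumed third level.
DEPTH_ORDER = {"shallow": 0, "medium": 1, "deep": 2}
NEEDS_ESCALATION = {"UNVALIDATED", "IMPLICIT"}


@dataclass
class Finding:
    claim: str
    classification: str  # VALIDATED / UNVALIDATED / IMPLICIT / CONTRADICTORY
    depth: str           # depth at which verification was attempted


def must_escalate(finding: Finding) -> bool:
    """True when the finding may not be reported yet under the rule."""
    below_medium = DEPTH_ORDER[finding.depth] < DEPTH_ORDER["medium"]
    return finding.classification in NEEDS_ESCALATION and below_medium


def reportable(findings: list[Finding]) -> list[Finding]:
    """Keep only findings that already satisfy the escalation rule."""
    return [f for f in findings if not must_escalate(f)]
```

Findings rejected by `reportable` are not discarded; they are the ones that still need a Medium depth pass.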

Invariant Principles

- Untested assumptions become production bugs. Every claim needs evidence or an explicit "unvalidated" flag.
- Vague scope enables scope creep. Boundaries must be testable, not interpretive.
- Optimistic architecture fails at scale. Every design decision needs 10x/failure/deprecation analysis.
- Undocumented failure modes become incidents. Every integration needs explicit failure handling.
- Unmeasured success is unfalsifiable. Metrics require numbers, baselines, percentiles.

Applicability

| Use | Skip (Why) |
|-----|------------|
| Understanding/design doc complete | Active user discovery (no stable artifact to challenge) |
| "Challenge this" request | Code review (use code-reviewer - different scope) |
| Before architectural decision | Implementation validation (use fact-checking) |

Inputs

| Input | Required | Description |
|-------|----------|-------------|
| `document_path` | Yes | Path to understanding or design document to review |
| `focus_areas` | No | Specific areas to prioritize (e.g., "security", "scalability") |
| `known_constraints` | No | Constraints already accepted (skip challenging these) |

Outputs

| Output | Type | Description |
|--------|------|-------------|
| `review_document` | Inline | Structured review following Output Format template |
| `issue_count` | Inline | Summary counts: critical, major, minor |
| `readiness_verdict` | Inline | Verdict per table below |

Verdicts

| Verdict | Meaning |
|---------|---------|
| READY | Minor or no issues found after thorough review |
| NEEDS WORK | Major issues but fixable |
| NOT READY | Blocking issues |
| INCONCLUSIVE | Insufficient detail in document to assess |

A verdict of READY after thorough investigation is valid. Fabricating marginal issues to meet a quota degrades trust.
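
The mapping from issue counts to a verdict can be sketched as a small function. This is an illustration under stated assumptions, not the skill's own logic: the function name and the `enough_detail` flag are hypothetical, and treating critical issues as the "blocking issues" of NOT READY follows the "Critical Issues (Block Design Phase)" heading in the output template.

```python
def readiness_verdict(critical: int, major: int, minor: int,
                      enough_detail: bool = True) -> str:
    """Map issue counts to one of the four verdicts in the table above."""
    if min(critical, major, minor) < 0:
        raise ValueError("issue counts cannot be negative")
    if not enough_detail:
        # The document is too thin to assess, whatever was counted so far.
        return "INCONCLUSIVE"
    if critical > 0:
        return "NOT READY"
    if major > 0:
        return "NEEDS WORK"
    # Minor issues alone do not block: READY covers "minor or no issues".
    return "READY"


if __name__ == "__main__":
    print(readiness_verdict(critical=0, major=2, minor=5))
```

Note that a zero count in every bucket returns READY without any special casing, which matches the point above that a clean review is a valid result.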

Review Protocol

Challenge Categories

| Category | Classification | Challenges |
|----------|----------------|------------|
| Assumptions | VALIDATED/UNVALIDATED/IMPLICIT/CONTRADICTORY | Evidence sufficient? Current? What if wrong? What disproves? |
| Scope | Vague language? Creep vectors? | MVP ship without excluded? Users expect? Similar code supports? |
| Architecture | Rationale specific or generic? | 10x scale? System fails? Dep deprecated? Matches codebase? |
| Integration | Interface documented? Stable? | System down? Unexpected data? Slow? Auth fails? Circular deps? |
| Success Criteria | Has number? Measurable? | Baseline? p50/p95/p99? Monitored how? |
| Edge Cases | Boundary, failure, security | Empty/max/invalid? Network/partial/cascade? Auth bypass? Injection? |
| Vocabulary | Overloaded? Matches code? | Context-dependent meanings? Synonyms to unify? Two devs interpret same? |
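
As an illustration of the Success Criteria row, the three questions (has a number, has a baseline, has a percentile) can be approximated with a crude text check. This is only a heuristic sketch: the patterns and the function name are assumptions, and a real review still needs a human or agent to read the criterion in context.

```python
import re

# Crude, assumed patterns; tune them for your own documents.
PERCENTILE = re.compile(r"\bp(?:50|90|95|99)\b", re.IGNORECASE)
BASELINE = re.compile(r"\b(?:baseline|currently|current)\b", re.IGNORECASE)
DIGIT = re.compile(r"\d")


def success_criterion_gaps(text: str) -> list[str]:
    """Return which of the three success-criteria checks the text fails."""
    gaps = []
    # Drop percentile tokens first so the "95" in "p95" is not
    # mistaken for a target number.
    without_percentiles = PERCENTILE.sub("", text)
    if not DIGIT.search(without_percentiles):
        gaps.append("no number")
    if not BASELINE.search(text):
        gaps.append("no baseline")
    if not PERCENTILE.search(text):
        gaps.append("no percentile")
    return gaps


if __name__ == "__main__":
    for criterion in ("The API should be fast",
                      "p95 latency under 200 ms, baseline 450 ms"):
        print(criterion, "->", success_criterion_gaps(criterion))
```

An empty list means only that the wording mentions all three elements, not that the numbers are sensible.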

Fractal exploration: When a finding is classified as CRITICAL, invoke fractal-thinking with intensity `pulse` and the seed "What are the second-order consequences if [critical issue] is not addressed?" Use the resulting synthesis to add impact chains to CRITICAL findings.

Challenge Template

[ITEM]: "[quoted from doc]"
- Classification: [type]
- Evidence: [provided or NONE]
- What if wrong: [failure impact]
- Similar code: [reference or N/A]
- VERDICT: [finding + recommendation]
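
If challenges are collected programmatically, the template above could be filled in from a small record type. The class and field names below are hypothetical and exist only to show the shape of one rendered challenge; the example content is invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Challenge:
    item: str
    quote: str            # text quoted from the reviewed document
    classification: str
    what_if_wrong: str
    verdict: str
    evidence: str = "NONE"     # the template's default when nothing is provided
    similar_code: str = "N/A"  # the template's default when no reference exists

    def render(self) -> str:
        """Render the challenge in the layout of the template above."""
        return "\n".join([
            f'{self.item}: "{self.quote}"',
            f"- Classification: {self.classification}",
            f"- Evidence: {self.evidence}",
            f"- What if wrong: {self.what_if_wrong}",
            f"- Similar code: {self.similar_code}",
            f"- VERDICT: {self.verdict}",
        ])


if __name__ == "__main__":
    print(Challenge(
        item="Assumption 1",
        quote="The upstream service responds within 100 ms",
        classification="UNVALIDATED",
        what_if_wrong="Request timeouts cascade to callers",
        verdict="No measurement cited; measure latency before relying on it",
    ).render())
```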

Output Format

# Devil's Advocate Review: [Feature]

## Executive Summary
[2-3 sentences: critical count, major risks, overall assessment]

## Critical Issues (Block Design Phase)

### Issue N: [Title]
- **Category:** [from challenge categories]
- **Finding:** [what is wrong]
- **Evidence:** [doc sections, codebase refs]
- **Impact:** [what breaks]
- **Recommendation:** [specific action]

## Major Risks (Proceed with Caution)

### Risk N: [Title]
[Same format + Mitigation]

## Minor Issues
- [Issue]: [Finding] -> [Recommendation]

## Validation Summary

| Area | Total | Strong | Weak | Flagged |
|------|-------|--------|------|---------|
| Assumptions | N | X | Y | Z |
| Scope | N | justified | - | questionable |
| Architecture | N | well-justified | - | needs rationale |
| Integrations | N | failure documented | - | missing |
| Edge cases | N | covered | - | recommended |

## Overall Assessment
**Readiness:** READY | NEEDS WORK | NOT READY | INCONCLUSIVE
**Confidence:** HIGH | MEDIUM | LOW
**Blocking Issues:** [N]

Recommendation Validation

For each recommendation:

- Verify that the recommendation itself is sound (apply it mentally and check for new issues).
- Cite the evidence tier supporting the recommendation.
- If the recommendation would create new assumptions, flag them.

<FORBIDDEN>Proposing a "correction" that has not itself been validated. A wrong recommendation is worse than leaving the original assumption.</FORBIDDEN>

Cross-Category Contradiction Detection

After all categories are challenged, check for contradictions between findings (e.g., Architecture says "fail-safe" but Edge Cases says "data loss"). Report contradictions explicitly in the review output. Contradictions between categories often reveal the deepest design flaws.
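
One way to make this check mechanical is to tag each finding with the property it speaks about and whether it says the property holds, then look for opposite answers from different categories. The sketch below assumes such tagging exists; the `Claim` type, the `subject` tags, and the example data are all hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Claim:
    category: str  # challenge category that produced the finding
    subject: str   # assumed normalized tag for the property discussed
    holds: bool    # True if the finding says the property holds
    note: str


def find_contradictions(claims: list[Claim]) -> list[tuple[Claim, Claim]]:
    """Pair up claims from different categories that disagree on a subject."""
    return [
        (a, b)
        for a, b in combinations(claims, 2)
        if a.subject == b.subject
        and a.category != b.category
        and a.holds != b.holds
    ]


if __name__ == "__main__":
    claims = [
        Claim("Architecture", "data-safety", True, "design is fail-safe"),
        Claim("Edge Cases", "data-safety", False, "partial write loses data"),
        Claim("Integration", "auth", True, "token refresh documented"),
    ]
    for a, b in find_contradictions(claims):
        print(f"{a.category} ({a.note}) vs {b.category} ({b.note})")
```

The hard part in practice is assigning consistent `subject` tags, which this sketch takes as given.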

Self-Check

<FINAL_EMPHASIS> Every passed assumption = production bug. Every vague requirement = scope creep. Every unexamined edge case = 3am incident. Thorough. Skeptical. Relentless. </FINAL_EMPHASIS>

Maintainer

axiomantic Core maintainer

Source details

Full Name: axiomantic/spellbook
Branch: main
Path in repo: skills/devils-advocate
License: MIT License
Topics: claude cli mcp mcp-server ai-coding developer-tools gemini-cli skills prompt-engineering llm python codex opencode ai-assistant spellbook
