Agent skill

isolated-testing

Use when testing theories during debugging, or when chaos is detected. Triggers: "let me try", "maybe if I", "what about", "quick test", "see if", rapid context switching, multiple changes without isolation. Enforces one-theory-one-test discipline. Invoked automatically by debugging, scientific-debugging, systematic-debugging before any experiment execution.

Stars 5
Forks 2

Install this agent skill to your Project

npx add-skill https://github.com/axiomantic/spellbook/tree/main/skills/isolated-testing

SKILL.md

Isolated Testing

Uncertainty is not uncomfortable. Uncertainty is the natural state before knowledge. You do not rush to escape it. You sit with it, design a proper test, and let evidence speak.

This discipline is critical to my career. </ROLE>

You are here because you have theories to test. Not to thrash. Not to "try things." To TEST, methodically.

Invariant Principles

  1. One Theory, One Test, Full Stop. Test a single theory completely before considering another. No mixing. No "while I'm here."
  2. Design Before Execute. Write the repro test encompassing every step. Get approval (unless autonomous). THEN run.
  3. Stop on Reproduction. Bug repros = STOP investigating. Announce. Wait (unless autonomous, then proceed to fix phase).
  4. Uncertainty is Not Urgency. The pressure to "do something" is the enemy. Deliberation resolves uncertainty.
  5. Evidence is Binary. Repro or no-repro. Proved or disproved. No "partially confirmed."
  6. Know Your Code State. Before EVERY test: verify baseline, what modifications exist, what state you intend to test.
  7. Queue Discipline. Test theories in order. No skipping. No adding new theories mid-test.

The Protocol

Step 0: Verify Code State

CODE STATE CHECK:
- Baseline: [commit SHA / version / description]
- Current state: [clean / modified]
- Modifications: [none / list what's changed]
- Intended test state: [clean baseline / with modification X]

If you don't know your code state, STOP. Return to clean baseline before proceeding. </CRITICAL>

Step 1: Select ONE Theory

Select the FIRST untested theory. Not "the one I feel good about." The FIRST one.

THEORY QUEUE:
1. [Theory 1] - Status: [UNTESTED/TESTING/DISPROVED/CONFIRMED]
2. [Theory 2] - Status: UNTESTED
3. [Theory 3] - Status: UNTESTED

Currently testing: Theory [N]: [description]
Status: UNTESTED -> TESTING

Step 2: Design the Repro Test

The test must:

  • Encompass EVERY step needed to reproduce (not "run tests" but the specific command)
  • Have a CLEAR expected outcome if theory is correct
  • Have a CLEAR expected outcome if theory is wrong
  • Be RUNNABLE as written (no placeholders)

If you cannot design a runnable test for this theory, mark it UNTESTABLE and advance to the next theory. </CRITICAL>

Template:

markdown
## Repro Test for Theory [N]

**Theory:** [exact claim being tested]

**Test procedure:**
1. [Exact step 1]
2. [Exact step 2]
3. [Exact step N]

**If theory is CORRECT, I will see:**
[Specific observable outcome]

**If theory is WRONG, I will see:**
[Specific observable outcome]

**Command to run:**
```bash
[exact command]

### Step 3: Approval Gate

<RULE>
**Non-autonomous mode:** Present the repro test design. Ask user: "May I execute?" with options: run as designed, adjust first, or skip theory.

**Autonomous mode (YOLO):** Announce intent, proceed without waiting.
</RULE>

### Step 4: Execute ONCE

Run the test EXACTLY as designed. Once. Not twice "to be sure." Once. Capture output and compare to expected outcomes.

### Step 5: Verdict

| Outcome | Verdict | Next Action |
|---------|---------|-------------|
| Matches "correct" prediction | **REPRODUCED** | STOP investigating. Announce. Proceed to fix (if autonomous) or wait. |
| Matches "wrong" prediction | **DISPROVED** | Mark theory DISPROVED. Return to Step 1 with next theory. |
| Neither matches | **INCONCLUSIVE** | If one more test iteration can resolve it, design refined test. Otherwise mark INCONCLUSIVE and advance. |

### Step 6: On Reproduction

<CRITICAL>
When a test reproduces the bug:

**FULL STOP.**

BUG REPRODUCED under Theory [N].

Theory: [description] Evidence: [what the test showed]

Investigation complete. Ready for fix phase.


Before proceeding to fix phase, invoke `verifying-hunches` to confirm the theory holds before committing to a fix.

**Non-autonomous:** Wait for user before proceeding to fix.
**Autonomous:** Invoke `verifying-hunches`, then proceed to fix phase (invoke `test-driven-development` skill).

DO NOT:
- "Confirm" with another test
- Investigate "why" further
- Check other theories "just to be thorough"
- Make any changes without explicit fix-phase transition
</CRITICAL>

---

## Chaos Detection

<FORBIDDEN>
If you catch yourself doing ANY of these, STOP IMMEDIATELY and return to Step 0:

**Code state violations:**
- Testing without knowing your code state
- Forgetting what modifications you've made
- Assuming clean baseline without verifying
- Making changes without recording them

**Action without design:**
- "Let me try..." (try WHAT? designed HOW?)
- "Maybe if I..." (hypothesis, not test)
- "What about..." (brainstorming, not testing)
- "Quick test..." (no such thing)
- "See if..." (prediction unclear)

**Queue violations:**
- Skipping theories to test "the likely one"
- Adding new theories mid-test without completing current
- Jumping between theories without marking status
- Testing theory 3 before theories 1 and 2 are resolved

**Mixing theories:**
- Changing multiple things between tests
- "While I'm here, also..."
- Testing theory A but making change related to theory B
- Running multiple experiments without isolation

**Premature action:**
- Running before design is written
- Running before approval (non-autonomous)
- Making changes instead of observing
- "Fixing" before reproduction confirmed
- Elaborate fix attempts before proving bug exists

**Continuation after reproduction:**
- "Let me verify that's really it"
- "I'll also check theory B just in case"
- Any action after bug repros except announcing and waiting
</FORBIDDEN>

---

## Theory Tracker

Maintain explicit state:

Theory Status

# Theory Status Test Result
1 [desc] DISPROVED [what test showed]
2 [desc] TESTING -
3 [desc] UNTESTED -

Update after each test. Survives context compaction.

---

## Integration Points

**Invoked by:** `debugging` (Phase 3), `scientific-debugging` (experiment execution), `systematic-debugging` (Phase 3)

**Invokes:** `verifying-hunches` (Step 6, before fix phase), `test-driven-development` (fix phase entry)

---

## Self-Check

Before EACH test execution:
- [ ] Single theory selected and stated
- [ ] Repro test fully designed (procedure, predictions, command)
- [ ] Approval obtained (or autonomous mode)
- [ ] Previous theories properly marked

After EACH test:
- [ ] Theory status updated (DISPROVED/REPRODUCED/INCONCLUSIVE)
- [ ] If REPRODUCED: stopped investigating, announced, waiting
- [ ] If DISPROVED: moved to next theory, not re-testing same one
- [ ] No mixing, no "trying," no chaos

---

<FINAL_EMPHASIS>
Patience is not passivity. Deliberation is not delay. The disciplined tester finds truth faster than the frantic one.

One theory. One test. Full stop.

This is very important to my career.
</FINAL_EMPHASIS>

Expand your agent's capabilities with these related and highly-rated skills.

axiomantic/spellbook

spellbook-auditing

Meta-audit skill for spellbook development. Spawns parallel subagents to factcheck docs, optimize instructions, find token savings, and identify MCP candidates. Produces actionable report.

5 2
Explore
axiomantic/spellbook

documentation-updates

Use after modifying library skills, library commands, or agents to ensure CHANGELOG, README, and docs are updated

5 2
Explore
axiomantic/spellbook

project-encyclopedia

[DEPRECATED] Use project-level AGENTS.md files instead. Previously used for first-session codebase onboarding and persistent glossary creation.

5 2
Explore
axiomantic/spellbook

reviewing-impl-plans

Use when reviewing implementation plans before execution. Triggers: 'is this plan solid', 'review the plan', 'check before I start building', 'anything missing from this plan', 'will this plan work', 'audit the implementation plan'. NOT for: reviewing design documents (use reviewing-design-docs) or creating plans (use writing-plans).

5 2
Explore
axiomantic/spellbook

session-resume

Session resume protocol and session repairs handling. Loaded when spellbook_session_init returns resume_available: true, or when session_init returns a repairs array. Triggers: 'resume', 'continue', 'where were we', session resume, session repairs.

5 2
Explore
axiomantic/spellbook

brainstorming

Use when exploring design approaches, generating ideas, or making architectural decisions. Triggers: 'explore options', 'what are the tradeoffs', 'how should I approach', 'let's think through', 'sketch out an approach', 'I need ideas for', 'how would you structure', 'what are my options'. Also invoked by develop when design decisions are needed.

5 2
Explore

Didn't find tool you were looking for?

Be as detailed as possible for better results