Agent skill
test-driven
Test-Driven Development (TDD) - design tests from requirements, then execute RED -> GREEN -> REFACTOR cycle. Use when implementing features or fixes with TDD methodology, writing tests before code, or following XP-style development across any supported language.
Install this agent skill in your project
npx add-skill https://github.com/OutlineDriven/odin-claude-plugin/tree/main/skills/test-driven
SKILL.md
Test-driven development (XP-style)
Tests define the specification. Design them from requirements before any implementation. The RED-GREEN-REFACTOR cycle is the heartbeat: write a failing test, make it pass with minimal code, then clean up while green.
Modern insight (2025): pairing TDD with property-based testing is the standard -- example tests prevent regressions, property tests discover edge cases. TDD also serves AI-assisted development: structural integrity keeps code understandable for both human and AI collaborators (Kent Beck, "Augmented Coding"). Mutation testing validates test quality beyond coverage metrics (reported mutation coverage: 63.3% for TDD with mutation testing vs 39.4% for TDD alone).
See frameworks for language-specific test runners, property testing, coverage, and mutation tools. See examples for brief TDD cycle patterns per language.
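The pairing of example tests and property tests can be sketched as follows. This is a minimal illustration using only the standard library: `dedupe_sorted` is a hypothetical function, and the hand-rolled random loop stands in for a real property-testing library from the frameworks reference.

```python
import random


def dedupe_sorted(values):
    """Return the distinct values in ascending order (hypothetical example function)."""
    return sorted(set(values))


def test_example_removes_duplicates():
    # Example test: pins one known input/output pair against regressions.
    assert dedupe_sorted([3, 1, 3, 2]) == [1, 2, 3]


def test_property_sorted_and_unique(trials=200, seed=0):
    # Property test: checks invariants over many generated inputs.
    # A fixed seed keeps the run reproducible.
    rng = random.Random(seed)
    for _ in range(trials):
        values = [rng.randint(-50, 50) for _ in range(rng.randint(0, 20))]
        result = dedupe_sorted(values)
        # Invariant 1: strictly ascending, so no duplicates survive.
        assert all(a < b for a, b in zip(result, result[1:]))
        # Invariant 2: no value is lost or invented.
        assert set(result) == set(values)
```

The example test documents one concrete case; the property test states what must hold for every input, which is where unexpected edge cases (empty lists, all-equal lists) tend to surface.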
When to Apply
- New features with clear requirements (both inside-out and outside-in approaches valid)
- Bug fixes -- write a failing test that proves the bug before fixing
- Refactoring -- ensure coverage exists before restructuring
- API contract enforcement -- test the interface, not internals
- Property-based invariants -- complement example tests with PBT
- Legacy code -- add characterization tests before modifying (Michael Feathers pattern)
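A characterization test for the legacy-code case might look like the sketch below. `legacy_format_price` is a hypothetical stand-in for undocumented legacy code; the point is that the test records what the code does today, including behavior that looks wrong, before anything is modified.

```python
def legacy_format_price(cents):
    # Hypothetical legacy function whose exact rules are undocumented.
    dollars = cents // 100
    rest = cents % 100
    return "$" + str(dollars) + "." + str(rest)


def test_characterize_legacy_format_price():
    # Pin current behavior as observed, not as we wish it were.
    assert legacy_format_price(1250) == "$12.50"
    # Surprising: no zero padding. Pinned as-is so a later fix is a
    # deliberate, visible change rather than an accident.
    assert legacy_format_price(1205) == "$12.5"
    assert legacy_format_price(0) == "$0.0"
```

Once the current behavior is pinned, fixing the padding becomes an ordinary TDD step: change the expectation first, watch it fail, then change the code.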
When NOT to Apply
- Exploratory prototyping or spike research
- One-off scripts, data migrations, generated code
- Purely visual UI layout work (prefer visual regression testing)
- Highly experimental algorithmic research (but PBT still helps)
- Throwaway code with a lifespan under one week
Anti-patterns
- Test-last: Writing tests after implementation defeats the design benefit
- Testing implementation details: Tests should verify behavior, not internal structure -- coupling tests to internals breaks refactoring confidence
- Over-mocking: Testing the mocks instead of the code; mock external I/O, not core logic
- Skipping RED: Tests that never fail aren't tests -- they verify nothing
- 100% coverage obsession: Coverage does not equal quality. Mutation testing exposes gaps coverage cannot
- Refactoring on RED: Never restructure with failing tests
- Test-induced architectural damage: Letting mock boundaries dictate design
- Snapshot bloat: Approval-style tests without curation become a maintenance burden
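To illustrate the "testing implementation details" anti-pattern, here is a small sketch with a hypothetical `Cart` class. The behavior-focused test only touches the public API; the brittle alternative is shown as a comment for contrast.

```python
class Cart:
    """Hypothetical class used only for illustration."""

    def __init__(self):
        # Internal detail: a list today, possibly a dict tomorrow.
        self._items = []

    def add(self, name, price_cents, quantity=1):
        self._items.append((name, price_cents, quantity))

    def total_cents(self):
        return sum(price * qty for _, price, qty in self._items)


def test_total_reflects_added_items():
    # Behavior-focused: survives any change to how items are stored.
    cart = Cart()
    cart.add("pen", 150, quantity=2)
    cart.add("pad", 400)
    assert cart.total_cents() == 700

# Brittle alternative (avoid): asserts on private storage, so it breaks
# on a refactor even when behavior is unchanged.
#   assert cart._items == [("pen", 150, 2), ("pad", 400, 1)]
```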
Two Schools (decision guidance, not prescription)
- Inside-Out (Classic/Detroit): Start with unit tests for the smallest pieces, build upward. Minimizes mocks. Best for well-understood domains, algorithms, utility functions.
- Outside-In (London/Mockist): Start with an acceptance test for user-facing behavior, use mocks to discover interfaces. Best for layered systems, APIs, microservices.
- Pragmatic teams use both depending on context. Neither is superior.
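The two styles can be contrasted on one small, hypothetical feature. In this sketch `normalize_email`, `register_user`, and the `repo`/`mailer` collaborators are all assumed names: the inside-out test exercises a pure function directly, while the outside-in test starts at the use case and lets mocks define the collaborator interface it needs.

```python
from unittest.mock import Mock


def normalize_email(raw):
    return raw.strip().lower()


def test_normalize_email_inside_out():
    # Inside-out: smallest piece first, no mocks needed.
    assert normalize_email("  Ada@Example.COM ") == "ada@example.com"


def register_user(raw_email, repo, mailer):
    email = normalize_email(raw_email)
    repo.save(email)
    mailer.send_welcome(email)
    return email


def test_register_user_outside_in():
    # Outside-in: the test is written against the use case; the calls
    # asserted here are how the repo and mailer interfaces get discovered.
    repo, mailer = Mock(), Mock()
    assert register_user(" Ada@Example.COM", repo, mailer) == "ada@example.com"
    repo.save.assert_called_once_with("ada@example.com")
    mailer.send_welcome.assert_called_once_with("ada@example.com")
```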
Test Doubles Hierarchy
- Stubs: Return predefined data; verify outcomes (state-based)
- Mocks: Verify interactions/calls were made (behavior-based)
- Fakes: Working implementations (e.g., in-memory database)
- Spies: Record calls while using real behavior
- Rule: Mock external dependencies. Never mock core domain logic.
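The four kinds of double can be sketched with Python's standard `unittest.mock`. The `greet` function, the user store, and the clock are hypothetical; the store and clock stand in for external dependencies, while `greet` itself (the core logic) is never mocked.

```python
from unittest.mock import Mock


class InMemoryUserStore:
    """Fake: a working but simplified implementation of a user store."""

    def __init__(self):
        self._users = {}

    def save(self, user_id, name):
        self._users[user_id] = name

    def get(self, user_id):
        return self._users.get(user_id)


def greet(user_id, store, clock):
    # Core logic under test.
    name = store.get(user_id) or "guest"
    period = "morning" if clock.hour() < 12 else "evening"
    return f"Good {period}, {name}"


def make_clock(hour):
    # Stub: returns predefined data; nothing is verified about its calls.
    clock = Mock()
    clock.hour.return_value = hour
    return clock


def test_with_stub_and_fake():
    store = InMemoryUserStore()
    store.save(1, "Ada")
    assert greet(1, store, make_clock(9)) == "Good morning, Ada"


def test_with_mock():
    # Mock: the interaction itself is asserted.
    store = Mock()
    store.get.return_value = None
    assert greet(7, store, make_clock(20)) == "Good evening, guest"
    store.get.assert_called_once_with(7)


def test_with_spy():
    # Spy: wraps the real object, so real behavior runs and calls are recorded.
    real = InMemoryUserStore()
    real.save(1, "Ada")
    spy = Mock(wraps=real)
    assert greet(1, spy, make_clock(9)) == "Good morning, Ada"
    spy.get.assert_called_once_with(1)
```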
Workflow (language-neutral)
- CREATE -- Write failing tests: error cases -> edge cases -> happy paths -> property tests
- RED -- Run tests, verify all fail. If any pass, the test is wrong or behavior already exists.
- GREEN -- Minimal code to pass. No extras, no optimization, no cleanup.
- REFACTOR -- Clean up while green. Separate structural changes from behavioral (Tidy First). Re-run tests after every change.
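One pass through the cycle can be sketched as below. The `slugify_*` functions are hypothetical, and the suite is parameterized over the implementation only so that all three states fit in one runnable file; in practice there is a single function that evolves over time.

```python
def check_slugify(slugify):
    """The test suite, parameterized over the implementation under test.

    Returns True when every assertion passes, False otherwise.
    """
    try:
        assert slugify("Hello World") == "hello-world"
        assert slugify("  Trim  me ") == "trim-me"
    except (AssertionError, NotImplementedError):
        return False
    return True


def slugify_red(text):
    # RED: the test exists, the behavior does not yet.
    raise NotImplementedError


def slugify_green(text):
    # GREEN: the least code that passes. No extras.
    return "-".join(text.lower().split())


def slugify_refactored(text):
    # REFACTOR: structural change only (named intermediate step);
    # behavior is identical, and the suite is re-run to prove it.
    words = text.lower().split()
    return "-".join(words)
```

Running `check_slugify` against each version gives `False`, `True`, `True`: a confirmed failure, a minimal pass, and a refactor that stays green.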
Constitutional Rules (Non-Negotiable)
- Design Tests First: Plan all test cases from requirements before implementation; write each test iteratively in the RED-GREEN-REFACTOR loop
- RED Before GREEN: Each new test MUST fail before you write implementation for it
- Error Cases First: Implement error handling before success paths
- One Test at a Time: Write one failing test, make it pass, refactor, then add the next test
- Refactor Only on GREEN: Never refactor with failing tests
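The "Error Cases First" and "One Test at a Time" rules might play out as in this sketch. `parse_age`, its 150 upper bound, and the `raises_value_error` helper are hypothetical; the tests are numbered in the order they would be written, each one driven to green before the next is added.

```python
def parse_age(text):
    # Error handling was implemented first, driven by tests 1 and 2.
    if not text.strip().isdigit():
        raise ValueError(f"not a non-negative integer: {text!r}")
    age = int(text)
    if age > 150:  # assumed upper bound, for illustration only
        raise ValueError(f"age out of range: {age}")
    return age


def raises_value_error(func, *args):
    """Small helper so the sketch needs no test framework."""
    try:
        func(*args)
    except ValueError:
        return True
    return False


def test_1_rejects_non_numeric():
    assert raises_value_error(parse_age, "abc")
    assert raises_value_error(parse_age, "-5")


def test_2_rejects_out_of_range():
    assert raises_value_error(parse_age, "200")


def test_3_edge_zero():
    assert parse_age("0") == 0


def test_4_happy_path():
    assert parse_age(" 42 ") == 42
```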
Validation Gates
| Gate | Pass Criteria | Blocking |
|---|---|---|
| Tests Created | Test files exist for target module | Yes |
| RED State | All new tests fail before implementation | Yes |
| GREEN State | All tests pass after implementation | Yes |
| Coverage | >= 80% line coverage | No |
| Mutation | Mutation score reviewed (no threshold enforced) | No |
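How blocking and non-blocking gates combine can be sketched like this. The `Gate` type and `evaluate` function are hypothetical and not part of the skill itself; they only model the table above, where a failed blocking gate stops the cycle and a failed non-blocking gate is reported as a warning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    name: str
    passed: bool
    blocking: bool


def evaluate(gates):
    """Return (ok, blockers, warnings) for a list of gates."""
    blockers = [g.name for g in gates if g.blocking and not g.passed]
    warnings = [g.name for g in gates if not g.blocking and not g.passed]
    return (not blockers), blockers, warnings


# Example run: everything passes except coverage, which is below 80%.
line_coverage = 0.72
gates = [
    Gate("Tests Created", passed=True, blocking=True),
    Gate("RED State", passed=True, blocking=True),
    Gate("GREEN State", passed=True, blocking=True),
    Gate("Coverage", passed=line_coverage >= 0.80, blocking=False),
    Gate("Mutation", passed=True, blocking=False),  # reviewed, no threshold
]
ok, blockers, warnings = evaluate(gates)
```

In this run the cycle is allowed to proceed, with `Coverage` surfaced as a warning rather than a failure.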
Exit Codes
| Code | Meaning |
|---|---|
| 0 | TDD cycle complete, all tests pass |
| 11 | No test framework detected |
| 12 | Test compilation failed |
| 13 | Tests not failing (RED state invalid) |
| 14 | Tests fail after implementation (GREEN not achieved) |
| 15 | Tests fail after refactor (regression) |
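A runner could model these codes as an enumeration. The names and the `check_phase` helper below are hypothetical; only the numeric values and their meanings come from the table.

```python
from enum import IntEnum
from typing import Optional


class TddExit(IntEnum):
    OK = 0                    # TDD cycle complete, all tests pass
    NO_FRAMEWORK = 11         # No test framework detected
    COMPILE_FAILED = 12       # Test compilation failed
    RED_INVALID = 13          # Tests not failing (RED state invalid)
    GREEN_NOT_ACHIEVED = 14   # Tests fail after implementation
    REFACTOR_REGRESSION = 15  # Tests fail after refactor


def check_phase(phase: str, tests_failed: bool) -> Optional[TddExit]:
    """Return an exit code if the cycle should stop, or None to continue."""
    if phase == "red":
        # RED is only valid when the new tests actually fail.
        return None if tests_failed else TddExit.RED_INVALID
    if phase == "green":
        return TddExit.GREEN_NOT_ACHIEVED if tests_failed else None
    if phase == "refactor":
        return TddExit.REFACTOR_REGRESSION if tests_failed else TddExit.OK
    raise ValueError(f"unknown phase: {phase!r}")
```

Because `IntEnum` members are integers, a result such as `TddExit.RED_INVALID` can be passed straight to `sys.exit`.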