Agent skill

hypothesis-debugging

Structured code debugging through hypothesis formation and falsification planning. Use when diagnosing bugs, unexpected behaviour, or system failures where the root cause is unclear. Produces a hypothesis document for execution by another agent rather than performing the investigation directly. Triggers on requests to debug issues, diagnose problems, investigate failures, or create debugging plans.

Install this agent skill in your project:

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/development/hypothesis-debugging

SKILL.md

Hypothesis-Driven Debugging

Generate a structured debugging document that identifies candidate root causes and provides falsification plans for each. The output document instructs a separate execution agent; do not perform the investigation yourself.

Philosophical Foundation

Apply Popperian falsificationism: hypotheses cannot be proven true, only disproven. Design tests that could definitively rule out each hypothesis rather than confirm it. A good falsification test produces a clear negative result if the hypothesis is wrong.
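The asymmetry described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hypothesis ("stale cache entries are the sole cause of the HTTP 500s"), the function name, and the error counts are hypothetical and do not come from the skill itself. Note that the verdict type deliberately has no "confirmed" value.

```python
from enum import Enum


class Verdict(Enum):
    FALSIFIED = "falsified"          # the observation contradicts the prediction
    NOT_FALSIFIED = "not falsified"  # the hypothesis survives; it is not proven


def evaluate_cache_hypothesis(errors_with_cache_disabled: int) -> Verdict:
    """Hypothetical hypothesis: stale cache entries are the sole cause of the 500s.

    Prediction: with the cache disabled, replaying the failing requests
    produces no errors. Any error observed in that run contradicts it.
    """
    if errors_with_cache_disabled > 0:
        return Verdict.FALSIFIED
    return Verdict.NOT_FALSIFIED


print(evaluate_cache_hypothesis(3).value)
print(evaluate_cache_hypothesis(0).value)
```

A clean run leaves the hypothesis standing, nothing more; other explanations may predict the same clean run.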

Process

1. Gather Context

Before forming hypotheses, collect:

  • Symptom description: What behaviour is observed vs expected?
  • Reproduction conditions: When does it occur? Intermittent or consistent?
  • Recent changes: Deployments, configuration changes, dependency updates
  • Error artefacts: Stack traces, logs, error messages, screenshots
  • Environmental factors: OS, runtime versions, network conditions

If information is missing, note gaps in the output document.
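One possible way to track the five context items and surface gaps is a small record type. The class and field names below are assumptions made for illustration; the skill does not prescribe a data structure.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DebugContext:
    """Hypothetical container for the five context items listed above."""
    symptom: Optional[str] = None
    reproduction: Optional[str] = None
    recent_changes: Optional[str] = None
    error_artefacts: Optional[str] = None
    environment: Optional[str] = None

    def gaps(self) -> list[str]:
        """Return the names of items that are missing or blank."""
        return [
            f.name for f in fields(self)
            if not (getattr(self, f.name) or "").strip()
        ]


ctx = DebugContext(
    symptom="POST /orders returns 500; expected 201",
    reproduction="Roughly one request in twenty, only under load",
)
print(ctx.gaps())  # ['recent_changes', 'error_artefacts', 'environment']
```

The list returned by `gaps()` is what would be recorded as missing information in the output document.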

2. Form Hypotheses

Generate 1–5 hypotheses ranked by plausibility. Each hypothesis must be:

  • Specific: Name the component, function, or interaction suspected
  • Falsifiable: A concrete test could disprove it
  • Independent: Falsifying one should not automatically falsify others

Common hypothesis categories:

| Category | Examples |
| --- | --- |
| State | Race condition, stale cache, corrupted data |
| Input | Malformed payload, encoding issue, boundary case |
| Environment | Missing dependency, version mismatch, resource exhaustion |
| Logic | Off-by-one, incorrect predicate, missing null check |
| Integration | API contract violation, timeout, auth failure |

Avoid vague hypotheses ("something wrong with the database"). Pin down the specific failure mode.
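As a rough sketch, a hypothesis can be modelled so that the specificity requirement is hard to skip: the record refuses to exist without a named component and a failure mode. The type names, the numeric `plausibility` score, and the example service names are all assumptions of this sketch, not part of the skill.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    STATE = "State"
    INPUT = "Input"
    ENVIRONMENT = "Environment"
    LOGIC = "Logic"
    INTEGRATION = "Integration"


@dataclass(frozen=True)
class Hypothesis:
    component: str       # the function, module, or interaction suspected
    failure_mode: str    # what specifically goes wrong there
    category: Category
    plausibility: float  # subjective estimate in [0, 1]; an assumption of this sketch

    def __post_init__(self) -> None:
        if not self.component.strip() or not self.failure_mode.strip():
            raise ValueError("a hypothesis must name a component and a failure mode")
        if not 0.0 <= self.plausibility <= 1.0:
            raise ValueError("plausibility must be between 0 and 1")


def rank(hypotheses: list[Hypothesis]) -> list[Hypothesis]:
    """Order hypotheses from most to least plausible, enforcing the 1-5 limit."""
    if not 1 <= len(hypotheses) <= 5:
        raise ValueError("expected between 1 and 5 hypotheses")
    return sorted(hypotheses, key=lambda h: h.plausibility, reverse=True)


ranked = rank([
    Hypothesis("session cache", "serves entries past their TTL", Category.STATE, 0.3),
    Hypothesis("order total calculation", "off-by-one on the last line item", Category.LOGIC, 0.6),
])
print([h.component for h in ranked])
```

A structural check like this cannot judge whether a hypothesis is actually falsifiable or independent of the others; those criteria still need human or agent judgement.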

3. Design Falsification Plans

For each hypothesis, specify:

  1. Prediction: If this hypothesis is correct, what observable outcome follows?
  2. Falsification test: What action would produce a contradicting observation?
  3. Expected negative result: What outcome would disprove the hypothesis?
  4. Tooling required: Commands, scripts, or instrumentation needed
  5. Confidence impact: How decisively would a negative result rule this out?

Prefer tests that are:

  • Quick to execute
  • Minimally invasive
  • Deterministic rather than probabilistic
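The five fields of a falsification plan map naturally onto a record. The following is a minimal sketch; the class name, the helper, and the example scenario (a connection pool in an unnamed service) are hypothetical and serve only to show what a filled-in plan might look like.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FalsificationPlan:
    hypothesis: str
    prediction: str
    falsification_test: str
    expected_negative_result: str
    tooling: tuple[str, ...]
    confidence_impact: str


def as_numbered_list(plan: FalsificationPlan) -> str:
    """Render the five plan fields in the order given above."""
    items = [
        ("Prediction", plan.prediction),
        ("Falsification test", plan.falsification_test),
        ("Expected negative result", plan.expected_negative_result),
        ("Tooling required", ", ".join(plan.tooling)),
        ("Confidence impact", plan.confidence_impact),
    ]
    return "\n".join(
        f"{i}. {label}: {text}" for i, (label, text) in enumerate(items, start=1)
    )


plan = FalsificationPlan(
    hypothesis="Connection pool exhaustion causes the request timeouts",
    prediction="Pool utilisation is at its maximum whenever a timeout is logged",
    falsification_test="Record pool utilisation next to each timeout during a load replay",
    expected_negative_result="Timeouts occur while the pool still has idle connections",
    tooling=("load replay script", "pool metrics endpoint"),
    confidence_impact="High: idle connections during a timeout rule out exhaustion",
)
print(as_numbered_list(plan))
```

The example test is quick and non-invasive because it only reads metrics during a replay; it changes nothing in the system under investigation.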

4. Output Document

Generate a Markdown document following the template in assets/debugging-plan.md. Save to the working directory as debugging-plan-{timestamp}.md.
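The file name can be built as sketched below. The source does not say what format `{timestamp}` takes, so the compact UTC form used here is an assumption, as are the function names. The contents of assets/debugging-plan.md are not reproduced; `markdown` stands for the already-rendered document.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def plan_path(directory: Path, now: Optional[datetime] = None) -> Path:
    """Build debugging-plan-{timestamp}.md inside the given directory."""
    now = now or datetime.now(timezone.utc)
    # Assumed timestamp format: compact UTC, e.g. 20240102T030405Z.
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return directory / f"debugging-plan-{stamp}.md"


def save_plan(markdown: str, directory: Optional[Path] = None) -> Path:
    """Write the rendered plan to the working directory and return its path."""
    path = plan_path(directory or Path.cwd())
    path.write_text(markdown, encoding="utf-8")
    return path


fixed = datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
print(plan_path(Path("."), fixed).name)  # debugging-plan-20240102T030405Z.md
```

A timestamp without colons keeps the name valid on file systems that reject them, and the UTC suffix makes plans from different machines sort consistently.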

Quality Criteria

A well-formed debugging plan exhibits:

  • Mutual exclusivity: Hypotheses do not overlap, so falsifying one leaves the others standing
  • Collective exhaustiveness: Hypotheses cover the likely failure space
  • Ordered efficiency: Cheapest decisive tests appear first
  • Clear success criteria: The executing agent knows when to stop
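The "ordered efficiency" criterion can be approximated with a two-part sort key: decisive tests first, cheapest first within each group. The cost estimate in minutes and the boolean `decisive` flag are simplifying assumptions of this sketch, and the test names are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlannedTest:
    name: str
    cost_minutes: float  # estimated effort; an assumption of this sketch
    decisive: bool       # would a negative result rule the hypothesis out on its own?


def order_tests(tests: list[PlannedTest]) -> list[PlannedTest]:
    """Decisive tests first, cheapest first within each group."""
    return sorted(tests, key=lambda t: (not t.decisive, t.cost_minutes))


ordered = order_tests([
    PlannedTest("bisect recent deployments", 45, True),
    PlannedTest("skim dashboards for anomalies", 2, False),
    PlannedTest("diff config against last good release", 5, True),
])
print([t.name for t in ordered])
```

Here the two-minute dashboard check runs last despite being cheapest, because its result alone would not rule anything out.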

Anti-Patterns

  • Confirmation bias: Designing tests that can only succeed, not fail
  • Hypothesis creep: Adding new hypotheses ad hoc during execution instead of revising the plan
  • Coupling: Tests that cannot isolate individual hypotheses
  • Vagueness: "Check the logs" without specifying what pattern would falsify
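To contrast with "check the logs", here is a sketch of a log check that states exactly which pattern would falsify a hypothesis. The hypothesis ("every HTTP 500 is caused by an expired auth token"), the log line format, and the function name are all hypothetical.

```python
import re

# Assumed log format: "<timestamp> request_id=<id> <message>"
FAILURE = re.compile(r"request_id=(\S+) status=500")
TOKEN_EXPIRED = re.compile(r"request_id=(\S+) .*token expired")


def unexplained_failures(log_lines: list[str]) -> set[str]:
    """Return ids of failed requests that have no 'token expired' line.

    A non-empty result falsifies the hypothesis that every failure
    is caused by token expiry.
    """
    failed = {m.group(1) for line in log_lines if (m := FAILURE.search(line))}
    expired = {m.group(1) for line in log_lines if (m := TOKEN_EXPIRED.search(line))}
    return failed - expired


logs = [
    "2024-05-01T10:00:00Z request_id=a1 auth: token expired",
    "2024-05-01T10:00:01Z request_id=a1 status=500",
    "2024-05-01T10:00:02Z request_id=b2 status=500",
]
print(unexplained_failures(logs))  # {'b2'}
```

The plan entry for this test would name the falsifying pattern explicitly: a `status=500` line whose request id never appears with `token expired`.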

References

  • references/examples.md: Worked examples of hypothesis-falsification pairs across common debugging scenarios (API timeouts, flaky tests, memory leaks)
