Hypothesis-Driven Debugging

Generate a structured debugging document that identifies candidate root causes and provides falsification plans for each. The output document instructs a separate execution agent; do not perform the investigation yourself.

Philosophical Foundation

Apply Popperian falsificationism: hypotheses cannot be proven true, only disproven. Design tests that could definitively rule out each hypothesis rather than confirm it. A good falsification test produces a clear negative result if the hypothesis is wrong.

Process

1. Gather Context

Before forming hypotheses, collect:

Symptom description: What behaviour is observed vs expected?
Reproduction conditions: When does it occur? Intermittent or consistent?
Recent changes: Deployments, configuration changes, dependency updates
Error artefacts: Stack traces, logs, error messages, screenshots
Environmental factors: OS, runtime versions, network conditions

If information is missing, note gaps in the output document.

2. Form Hypotheses

Generate 1–5 hypotheses ranked by plausibility. Each hypothesis must be:

Specific: Name the component, function, or interaction suspected
Falsifiable: A concrete test could disprove it
Independent: Falsifying one should not automatically falsify others

Common hypothesis categories:

Category	Examples
State	Race condition, stale cache, corrupted data
Input	Malformed payload, encoding issue, boundary case
Environment	Missing dependency, version mismatch, resource exhaustion
Logic	Off-by-one, incorrect predicate, missing null check
Integration	API contract violation, timeout, auth failure

Avoid vague hypotheses ("something wrong with the database"). Pin down the specific failure mode.

3. Design Falsification Plans

For each hypothesis, specify:

Prediction: If this hypothesis is correct, what observable outcome follows?
Falsification test: What action would produce a contradicting observation?
Expected negative result: What outcome would disprove the hypothesis?
Tooling required: Commands, scripts, or instrumentation needed
Confidence impact: How decisively would a negative result rule this out?

Prefer tests that are:

Quick to execute
Minimally invasive
Deterministic rather than probabilistic

4. Output Document

Generate a Markdown document following the template in assets/debugging-plan.md. Save to the working directory as debugging-plan-{timestamp}.md.

Quality Criteria

A well-formed debugging plan exhibits:

Mutual exclusivity: At least one hypothesis should survive if others fail
Collective exhaustiveness: Hypotheses cover the likely failure space
Ordered efficiency: Cheapest decisive tests appear first
Clear success criteria: The executing agent knows when to stop

Anti-Patterns

Confirmation bias: Designing tests that can only succeed, not fail
Hypothesis creep: Adding new hypotheses during execution rather than revision
Coupling: Tests that cannot isolate individual hypotheses
Vagueness: "Check the logs" without specifying what pattern would falsify

References

references/examples.md: Worked examples of hypothesis-falsification pairs across common debugging scenarios (API timeouts, flaky tests, memory leaks)

Search AI Tools

hypothesis-debugging

Install this agent skill to your Project

SKILL.md