Agent skill

experiment-craft

Use this skill when the user wants to debug, diagnose, or systematically iterate on an experiment that already exists, or when they need a structured experiment log for tracking runs, hypotheses, failures, results, and next steps during active research. Apply it to underperforming methods, training that will not converge, regressions after a change, inconsistent results across datasets, aimless experimentation without progress, and questions like 'why doesn't this work?', 'no progress after many attempts', or 'how should I investigate this failure?'. Also use it for setting up practical experiment logging/record-keeping that supports debugging and iteration. Do not use it for designing a brand-new experiment pipeline or full experiment program (use experiment-pipeline), generating research ideas, fixing isolated coding/syntax errors, or writing retrospective summaries into research memory/notes/knowledge bases.

Install this agent skill in your project:

npx add-skill https://github.com/EvoScientist/EvoSkills/tree/main/skills/experiment-craft

Metadata

Additional technical details for this skill

tags: core experimentation experiment-design
author: EvoScientist
version: 1.0.0

SKILL.md

Experiment Craft

A systematic approach to running, debugging, and iterating on research experiments. The critical skill is not running more experiments — it's understanding WHY experiments fail.

When to Use This Skill

User's experiment is not working or producing unexpected results
User needs help diagnosing why a method fails on certain data
User wants to organize their experiment process with structured logging
User asks about debugging research code or iterating on approaches
User mentions "experiment debugging", "why doesn't this work", "experiment log", "results are wrong"

This skill is typically loaded from within experiment-pipeline when a stage attempt fails. After debugging, return to the pipeline's stage-gate structure to continue. It can also be used standalone for any experiment debugging.

The Debugging Mindset

Finding WHY experiments fail is the most critical research skill. Not analyzing results leads to two failure modes:

Slow progress: Running random experiments without understanding failure causes
Wasted time: Abandoning good approaches because the activation tricks they depend on (for example, a learning rate schedule or an initialization scheme) were missed

The goal is not to run more experiments. The goal is to run the RIGHT experiments — ones that isolate causes and test specific hypotheses.

5-Step Diagnostic Flow

When an experiment fails or produces unexpected results, follow these five steps:

Step 1: Collect Failure Cases

Gather concrete examples of bad results. Look at the actual outputs, not just aggregate metrics. What specifically went wrong? Are the failures systematic or random?
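One way to get past aggregate metrics is to pull out the worst individual examples and see whether failures cluster. The sketch below assumes each result is a dict with hypothetical `id`, `score` (higher is better), and `tag` fields; none of these names come from the skill itself.

```python
from collections import Counter

def collect_failures(records, threshold, k=5):
    """Return the k lowest-scoring failures and a count of failures per tag."""
    failures = [r for r in records if r["score"] < threshold]
    worst = sorted(failures, key=lambda r: r["score"])[:k]
    by_tag = Counter(r["tag"] for r in failures)
    return worst, by_tag

# Hypothetical per-example results, for illustration only.
records = [
    {"id": "a", "score": 0.91, "tag": "short"},
    {"id": "b", "score": 0.12, "tag": "long"},
    {"id": "c", "score": 0.88, "tag": "short"},
    {"id": "d", "score": 0.20, "tag": "long"},
    {"id": "e", "score": 0.35, "tag": "short"},
]

worst, by_tag = collect_failures(records, threshold=0.5, k=2)
print([r["id"] for r in worst])   # ids of the two worst failures
print(by_tag.most_common())       # which tag fails most often
```

If one tag dominates the count, the failures are probably systematic; an even spread hints at noise or a more global problem.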

Step 2: Find a Working Version

You need a baseline that works. Two ways to find one:

Simplify the task: Reduce data complexity, relax the task setting, add more supervision, use easier inputs
Remove your changes: Start from the baseline method and remove your algorithmic improvements one by one

If you can't find any working version, simplify further until something works. There is always a simple enough version that works.

Step 3: Bridge the Gap

Starting from the working version, incrementally add complexity until it breaks:

Add ONE factor at a time (more complex data, one algorithmic change, one constraint)
Find the single factor that causes failure
The more atomic the identified cause, the more useful the diagnosis

This step isolates the cause. Without it, you're guessing.
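The incremental procedure above can be sketched as a loop that layers one factor at a time onto a working configuration and stops at the first failure. This is a minimal illustration, not the skill's own tooling: `bridge_the_gap`, `run_experiment`, and the config keys are hypothetical names, and a real `run_experiment` would launch a training or evaluation run.

```python
def bridge_the_gap(base_config, factors, run_experiment):
    """Add factors one at a time; return (breaking_factor, config_at_that_point).

    factors: ordered list of (name, overrides) pairs.
    run_experiment: callable(config) -> bool, True when the run succeeds.
    Returns (None, final_config) if every factor can be added without failure.
    """
    if not run_experiment(base_config):
        raise ValueError("base config must work before adding complexity")
    config = dict(base_config)
    for name, overrides in factors:
        candidate = {**config, **overrides}
        if not run_experiment(candidate):
            return name, candidate
        config = candidate  # keep the working change and move on
    return None, config

# Toy stand-in for a real run: it fails once sequences get long.
def toy_run(config):
    return config["seq_len"] <= 512

base = {"seq_len": 128, "augment": False, "loss": "mse"}
factors = [
    ("augmentation", {"augment": True}),
    ("long sequences", {"seq_len": 2048}),
    ("new loss", {"loss": "contrastive"}),
]

culprit, failing_config = bridge_the_gap(base, factors, toy_run)
print(culprit)
```

Ordering the factors from least to most suspicious keeps the working configuration growing for as long as possible before the break.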

Step 4: Hypothesize and Verify

Based on the isolated cause from Step 3:

List possible explanations for why this factor causes failure
Rank by likelihood (based on your understanding and literature)
Design targeted experiments to verify or eliminate each hypothesis
Confirm the actual cause experimentally — don't rely on intuition alone
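A lightweight way to keep this list honest is to store each hypothesis with a subjective likelihood and the experiment that would test it, then always work on the most likely open one. The structure below is an assumed sketch; the field names, the example hypotheses, and the likelihood values are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    explanation: str
    likelihood: float      # subjective estimate in [0, 1]
    test: str              # targeted experiment that would confirm or eliminate it
    status: str = "open"   # "open", "confirmed", or "eliminated"

def next_to_test(hypotheses):
    """Return the most likely hypothesis that is still open, or None."""
    open_items = [h for h in hypotheses if h.status == "open"]
    return max(open_items, key=lambda h: h.likelihood) if open_items else None

# Illustrative entries only.
hypotheses = [
    Hypothesis("gradients explode on long inputs", 0.5, "log gradient norms per step"),
    Hypothesis("positional encoding is truncated", 0.3, "inspect encoding shape at seq_len=2048"),
    Hypothesis("label noise in the long-input split", 0.2, "manually audit 50 samples"),
]

first = next_to_test(hypotheses)
print(first.explanation)

first.status = "eliminated"   # the targeted experiment ruled it out
second = next_to_test(hypotheses)
print(second.explanation)
```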

Step 5: Propose and Implement a Fix

Based on the confirmed cause:

Search for techniques that address this specific cause (use your literature tree from the research-ideation skill)
Design a fix that targets the confirmed cause, not the surface symptom
Verify the fix works on the original failure cases
Check that the fix doesn't break previously working cases
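The last two checks can be run together: re-evaluate the original failure cases and the previously working cases with the fix applied. The sketch below assumes a hypothetical `evaluate(case)` callable that returns True when a case passes; the case names are placeholders.

```python
def check_fix(evaluate, failure_cases, working_cases):
    """Report which original failures remain and which working cases regressed."""
    still_failing = [c for c in failure_cases if not evaluate(c)]
    regressions = [c for c in working_cases if not evaluate(c)]
    return {
        "fixed": len(failure_cases) - len(still_failing),
        "still_failing": still_failing,
        "regressions": regressions,
        "accept": not still_failing and not regressions,
    }

# Toy evaluator standing in for a real run with the fix applied:
# everything passes except one previously working case.
def toy_evaluate(case):
    return case != "easy_03"

report = check_fix(
    toy_evaluate,
    failure_cases=["long_01", "long_02"],
    working_cases=["easy_01", "easy_02", "easy_03"],
)
print(report)
```

Here the fix resolves both original failures but breaks a case that used to work, so it is not accepted as is.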

See references/debugging-methodology.md for detailed branching logic and a cause taxonomy.

Counterintuitive Experiment Rules

Prioritize these rules during experimental work:

Change only one variable at a time: If you change two things and it works, you don't know which one fixed it. If you change two things and it doesn't work, you don't know which one is wrong. Single-variable changes are slower per experiment but faster overall.
Fast iteration requires effective experiments, not more experiments: Blind experimentation makes things worse. One well-designed diagnostic experiment is worth ten random trials.
Some great techniques don't work alone: They need specific activation tricks, such as learning rate schedules, initialization schemes, or data preprocessing steps. Don't discard a technique after one failed attempt.
Check related papers for their tricks: Papers solving similar technical challenges often bury critical implementation details in supplementary material or released code rather than the main text. These details can decide whether a technique works or fails.
"When you have eliminated the impossible, whatever remains, however improbable, must be the truth": Systematic elimination beats intuition. When debugging, explicitly list every cause you can think of, then eliminate them one by one with targeted experiments.

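The one-variable-at-a-time rule can be checked mechanically before a run is launched by diffing the new configuration against the previous one. This is a minimal sketch assuming configs are flat dicts; `changed_keys`, `require_single_change`, and the hyperparameter names are hypothetical.

```python
_MISSING = object()  # marks a key that is absent from one of the configs

def changed_keys(old, new):
    """Return the sorted names of keys whose values differ between two configs."""
    return sorted(
        k for k in old.keys() | new.keys()
        if old.get(k, _MISSING) != new.get(k, _MISSING)
    )

def require_single_change(old, new):
    """Return the one changed key, or raise ValueError if the count is not 1."""
    diff = changed_keys(old, new)
    if len(diff) != 1:
        raise ValueError(f"expected exactly 1 changed variable, got {len(diff)}: {diff}")
    return diff[0]

previous = {"lr": 1e-3, "batch_size": 32, "warmup_steps": 0}
ok_run = {"lr": 3e-4, "batch_size": 32, "warmup_steps": 0}
bad_run = {"lr": 3e-4, "batch_size": 64, "warmup_steps": 0}

print(require_single_change(previous, ok_run))
print(changed_keys(previous, bad_run))
```

Nested configs would need flattening first; that step is left out to keep the sketch short.
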
Experiment Logging

Every experiment should be logged with five sections. Use the template at assets/experiment-log-template.md.

Section	What to Record
Purpose	Why you're running this experiment; what you expect to learn
Setting	Data, algorithm changes, hyperparameters — everything needed to reproduce
Results	Quantitative metrics + qualitative observations + specific good/failure cases
Analysis	Do results match expectations? If not, hypothesized causes ranked by likelihood
Next Steps	What to do based on the analysis — YOU are the project leader

The "Next Steps" section is the most important. Don't wait for someone to tell you what to do next. Analyze your results and propose the next experiment yourself. This is what distinguishes a researcher from a technician.
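The five sections can be captured in a small record type so that no entry is saved without a next step. This is an assumed sketch, not the layout of assets/experiment-log-template.md; the class name, field types, rendered format, and example values are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ExperimentLog:
    purpose: str
    setting: dict       # data, algorithm changes, hyperparameters
    results: str
    analysis: str
    next_steps: list    # at least one concrete follow-up

    def __post_init__(self):
        # Design choice for this sketch: refuse entries with no next step.
        if not self.next_steps:
            raise ValueError("next_steps must contain at least one item")

    def render(self):
        """Render the entry as plain text, one section after another."""
        lines = ["Purpose", self.purpose, "", "Setting"]
        lines += [f"{key}: {value}" for key, value in self.setting.items()]
        lines += ["", "Results", self.results, "", "Analysis", self.analysis]
        lines += ["", "Next Steps"]
        lines += [f"- {step}" for step in self.next_steps]
        return "\n".join(lines)

# Illustrative entry; the values are made up.
entry = ExperimentLog(
    purpose="Check whether failures are tied to input length",
    setting={"dataset": "toy-long", "seq_len": 2048, "lr": 3e-4},
    results="Loss diverges after warmup; short inputs are unaffected",
    analysis="Matches the expectation that long inputs trigger the failure",
    next_steps=["Log gradient norms per step", "Retry with gradient clipping"],
)
print(entry.render())
```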

Cross-cycle learning: If you are using experiment-pipeline, your experiment logs feed into evo-memory's ESE (Experiment Strategy Evolution) mechanism. Tag reusable strategies with [Reusable] so ESE can extract them for future cycles.

Return to experiment-pipeline

After completing the 5-step diagnostic flow, return to experiment-pipeline with:

Confirmed cause of failure (from Step 4)
Proposed fix and its verification status (from Step 5)
Updated experiment log entry

Handoff to Paper Writing

When experiments succeed and you have a complete set of results, pass these artifacts to paper-writing:

Artifact	Source	Used By
Final experiment results (tables and figures)	Experiment logs	Experiments section
Ablation study results	Diagnostic experiments	Ablation tables
Failure case analysis	Step 1 + Step 3	Limitations discussion
Key implementation details and tricks	Steps 3-5	Method section / Supplementary
Baseline comparison results	Step 2	Comparison tables

Reference Navigation

Topic	Reference File	When to Use
Debugging methodology	references/debugging-methodology.md	Diagnosing why experiments fail
Experiment log template	assets/experiment-log-template.md	Recording experiment details

Maintainer

EvoScientist Core maintainer

Source details

Full Name: EvoScientist/EvoSkills
Branch: main
Path in repo: skills/experiment-craft
License: Apache License 2.0
Topics: skills ai-agent ai4science vibe-research
