Agent skill

analyze-results

Analyze ML experiment results, compute statistics, generate comparison tables and insights. Use when user says "analyze results", "compare", or needs to interpret experimental data.

Stars 6,306
Forks 582

Install this agent skill to your Project

npx add-skill https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep/tree/main/skills/analyze-results

SKILL.md

Analyze Experiment Results

Analyze: $ARGUMENTS

Workflow

Step 1: Locate Results

Find all relevant JSON/CSV result files:

  • Check figures/, results/, or project-specific output directories
  • Parse JSON results into structured data

Step 2: Build Comparison Table

Organize results by:

  • Independent variables: model type, hyperparameters, data config
  • Dependent variables: primary metric (e.g., perplexity, accuracy, loss), secondary metrics
  • Delta vs baseline: always compute relative improvement

Step 3: Statistical Analysis

  • If multiple seeds: report mean +/- std, check reproducibility
  • If sweeping a parameter: identify trends (monotonic, U-shaped, plateau)
  • Flag outliers or suspicious results

Step 4: Generate Insights

For each finding, structure as:

  1. Observation: what the data shows (with numbers)
  2. Interpretation: why this might be happening
  3. Implication: what this means for the research question
  4. Next step: what experiment would test the interpretation

Step 5: Update Documentation

If findings are significant:

  • Propose updates to project notes or experiment reports
  • Draft a concise finding statement (1-2 sentences)

Output Format

Always include:

  1. Raw data table
  2. Key findings (numbered, concise)
  3. Suggested next experiments (if any)

Expand your agent's capabilities with these related and highly-rated skills.

wanshuiyin/Auto-claude-code-research-in-sleep

ablation-planner

Use when main results pass result-to-claim (claim_supported=yes or partial) and ablation studies are needed for paper submission. Codex designs ablations from a reviewer's perspective, CC reviews feasibility and implements.

6,306 582
Explore
wanshuiyin/Auto-claude-code-research-in-sleep

paper-plan

Generate a structured paper outline from review conclusions and experiment results. Use when user says "写大纲", "paper outline", "plan the paper", "论文规划", or wants to create a paper plan before writing.

6,306 582
Explore
wanshuiyin/Auto-claude-code-research-in-sleep

idea-discovery-robot

Workflow 1 adaptation for robotics and embodied AI. Orchestrates robotics-aware literature survey, idea generation, novelty check, and critical review to go from a broad robotics direction to benchmark-grounded, simulation-first ideas. Use when user says "robotics idea discovery", "机器人找idea", "embodied AI idea", "机器人方向探索", "sim2real 选题", or wants ideas for manipulation, locomotion, navigation, drones, humanoids, or general robot learning.

6,306 582
Explore
wanshuiyin/Auto-claude-code-research-in-sleep

training-check

Periodically check WandB metrics during training to catch problems early (NaN, loss divergence, idle GPUs). Avoids wasting GPU hours on broken runs. Use when training is running and you want automated health checks.

6,306 582
Explore
wanshuiyin/Auto-claude-code-research-in-sleep

paper-plan

Generate a structured paper outline from review conclusions and experiment results. Use when user says "写大纲", "paper outline", "plan the paper", "论文规划", or wants to create a paper plan before writing.

6,306 582
Explore
wanshuiyin/Auto-claude-code-research-in-sleep

idea-discovery-robot

Workflow 1 adaptation for robotics and embodied AI. Orchestrates robotics-aware literature survey, idea generation, novelty check, and critical review to go from a broad robotics direction to benchmark-grounded, simulation-first ideas. Use when user says \"robotics idea discovery\", \"机器人找idea\", \"embodied AI idea\", \"机器人方向探索\", \"sim2real 选题\", or wants ideas for manipulation, locomotion, navigation, drones, humanoids, or general robot learning.

6,306 582
Explore

Didn't find tool you were looking for?

Be as detailed as possible for better results