ab-test-analysis

Analyze A/B test results with statistical significance, sample size validation, confidence intervals, and ship/extend/stop recommendations. Use when evaluating experiment results, checking if a test reached significance, interpreting split test data, or deciding whether to ship a variant.

Install this agent skill into your project:

npx add-skill https://github.com/phuryn/pm-skills/tree/main/pm-data-analytics/skills/ab-test-analysis

SKILL.md

A/B Test Analysis

Evaluate A/B test results with statistical rigor and translate findings into clear product decisions.

Context

You are analyzing A/B test results for $ARGUMENTS.

If the user provides data files (CSV, Excel, or analytics exports), read and analyze them directly. Generate Python scripts for statistical calculations when needed.
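When the export has one row per user, the first step is usually to collapse it into conversions and users per arm. The sketch below does that with the standard library only. It assumes a 0/1 conversion column; the helper `load_counts`, the column names `variant` and `converted`, and the sample rows are hypothetical and should be adapted to the actual file.

```python
import csv
import io
from collections import defaultdict


def load_counts(file_obj, group_col="variant", outcome_col="converted"):
    """Aggregate a per-user export into {group: (conversions, users)}.

    Hypothetical helper: assumes one row per user, a group label column,
    and a 0/1 outcome column. Adjust the column names to the real export.
    """
    conversions = defaultdict(int)
    users = defaultdict(int)
    for row in csv.DictReader(file_obj):
        group = row[group_col].strip()
        users[group] += 1
        conversions[group] += int(row[outcome_col])
    return {group: (conversions[group], users[group]) for group in users}


# Made-up sample rows; with a real file use:
#   with open("export.csv", newline="") as f: counts = load_counts(f)
sample = """user_id,variant,converted
1,control,0
2,control,1
3,treatment,1
4,treatment,1
5,control,0
"""
print(load_counts(io.StringIO(sample)))  # {'control': (1, 3), 'treatment': (2, 2)}
```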

Instructions

Understand the experiment:
- What was the hypothesis?
- What was changed (the variant)?
- What is the primary metric? Any guardrail metrics?
- How long did the test run?
- What is the traffic split?
Validate the test setup:
- Sample size: Is the sample large enough for the expected effect size?
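  - Use the per-group formula: n = (Z_α/2 + Z_β)² × 2 × p × (1 − p) / MDE², where p is the baseline conversion rate, MDE is the minimum detectable effect as an absolute difference, Z_α/2 = 1.96 (α = 0.05, two-sided), and Z_β = 0.84 (80% power)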
  - Flag if the test is underpowered (<80% power)
- Duration: Did the test run for at least 1-2 full business cycles?
- Randomization: Any evidence of sample ratio mismatch (SRM)? Compare the observed split with the intended split using a chi-squared goodness-of-fit test
- Novelty/primacy effects: Was there enough time to wash out initial behavior changes?
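The setup checks above can be scripted. The sketch below computes the required users per arm for a two-sided test on conversion rates, including the power term Z_β (which is what ties the sample size to the 80% power target), the approximate power actually achieved, and an SRM p-value from a 1-degree-of-freedom chi-squared goodness-of-fit test. The function names are hypothetical, and the p < 0.001 SRM threshold in the example is a common convention rather than part of this skill.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def sample_size_per_group(baseline, mde_abs, alpha=0.05, power=0.80):
    """Users needed in EACH arm to detect an absolute difference of mde_abs."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = STD_NORMAL.inv_cdf(power)           # about 0.84 for 80% power
    n = (z_alpha + z_beta) ** 2 * 2 * baseline * (1 - baseline) / mde_abs ** 2
    return ceil(n)


def achieved_power(baseline, mde_abs, n_per_group, alpha=0.05):
    """Approximate power for the sample actually collected (ignores the far tail)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    se = sqrt(2 * baseline * (1 - baseline) / n_per_group)
    return STD_NORMAL.cdf(mde_abs / se - z_alpha)


def srm_p_value(n_control, n_variant, expected_control_share=0.5):
    """Chi-squared goodness-of-fit test (1 df) of the observed vs. planned split."""
    total = n_control + n_variant
    expected = (total * expected_control_share, total * (1 - expected_control_share))
    observed = (n_control, n_variant)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 degree of freedom, chi2 is distributed as the square of a standard normal.
    return 2 * (1 - STD_NORMAL.cdf(sqrt(chi2)))


print(sample_size_per_group(baseline=0.10, mde_abs=0.01))  # 14128 per arm
print(achieved_power(0.10, 0.01, 8_000) < 0.80)             # True: underpowered
print(srm_p_value(50_000, 48_500) < 0.001)                  # True: likely SRM
```

Dividing the per-arm sample by the expected daily users per arm gives a rough minimum duration, which can then be rounded up to whole business cycles.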
Calculate statistical significance:
- Conversion rate for control and variant
- Relative lift: (variant - control) / control × 100
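- p-value: Using a two-tailed two-proportion z-test or a chi-squared test (without continuity correction, the two give the same p-value on a 2×2 table)
- Confidence interval: 95% CI for the difference (variant − control)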
- Statistical significance: Is p < 0.05?
- Practical significance: Is the lift meaningful for the business?
If the user provides raw data, generate and run a Python script to calculate these.
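A minimal version of such a script, assuming the data has already been reduced to conversions and users per arm, is sketched below. It uses a pooled standard error for the two-sided z-test and an unpooled (Wald) standard error for the 95% CI on the difference; `two_proportion_test` and the counts in the example are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_test(conv_c, n_c, conv_v, n_v, alpha=0.05):
    """Two-sided two-proportion z-test plus a Wald CI for (variant - control)."""
    std_normal = NormalDist()
    p_c, p_v = conv_c / n_c, conv_v / n_v
    diff = p_v - p_c

    # Pooled standard error under H0 (no difference) for the test statistic.
    p_pool = (conv_c + conv_v) / (n_c + n_v)
    se_pooled = sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_v))
    z = diff / se_pooled
    p_value = 2 * (1 - std_normal.cdf(abs(z)))

    # Unpooled standard error for the confidence interval on the difference.
    se = sqrt(p_c * (1 - p_c) / n_c + p_v * (1 - p_v) / n_v)
    z_crit = std_normal.inv_cdf(1 - alpha / 2)

    return {
        "control_rate": p_c,
        "variant_rate": p_v,
        "relative_lift_pct": diff / p_c * 100,  # (variant - control) / control x 100
        "z": z,
        "p_value": p_value,
        "ci_diff": (diff - z_crit * se, diff + z_crit * se),
        "significant": p_value < alpha,
    }


result = two_proportion_test(conv_c=1_000, n_c=10_000, conv_v=1_100, n_v=10_000)
print(f"lift {result['relative_lift_pct']:+.1f}%, p = {result['p_value']:.4f}")
print(f"95% CI for the difference: {result['ci_diff'][0]:+.4f} to {result['ci_diff'][1]:+.4f}")
```

With these made-up counts (10% vs. 11%), the lift is +10%, the p-value is roughly 0.02, and the CI excludes zero. If `scipy.stats.chi2_contingency` is used instead, pass `correction=False` to match this z-test, since its default applies Yates' continuity correction to 2×2 tables.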
Check guardrail metrics:
- Did any guardrail metrics (revenue, engagement, page load time) degrade?
- A winning primary metric with degraded guardrails may not be a true win
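Guardrails such as page load time or revenue per user are usually means rather than rates. One way to check them, sketched below under a large-sample assumption, is a z-test on the difference in means with unequal variances (a normal approximation to Welch's t-test), flagging only significant moves in the harmful direction. The helper `guardrail_check` and the page-load summary numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def guardrail_check(mean_c, sd_c, n_c, mean_v, sd_v, n_v,
                    higher_is_worse=True, alpha=0.05):
    """Flag a statistically significant move of a guardrail in the bad direction.

    Large-sample z-test on the difference in means with unequal variances
    (normal approximation to Welch's t-test). Hypothetical helper.
    """
    diff = mean_v - mean_c
    se = sqrt(sd_c ** 2 / n_c + sd_v ** 2 / n_v)
    z = diff / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Direction that counts as "worse" depends on the metric.
    worse = diff > 0 if higher_is_worse else diff < 0
    return {
        "relative_change_pct": diff / mean_c * 100,
        "p_value": p_value,
        "degraded": worse and p_value < alpha,
    }


# Made-up page-load-time summary (milliseconds): mean, sd, users per arm.
print(guardrail_check(1200, 400, 20_000, 1215, 410, 20_000))
# For revenue-style guardrails, pass higher_is_worse=False.
```

Whether a small but significant degradation (here about +1.25% load time) is acceptable remains a product judgment, not a statistical one.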

Interpret results:

| Outcome | Recommendation |
|---|---|
| Significant positive lift, no guardrail issues | Ship it: roll out to 100% |
| Significant positive lift, guardrail concerns | Investigate: understand the trade-offs before shipping |
| Not significant, positive trend | Extend the test: more data or a larger effect is needed |
| Not significant, flat | Stop the test: no meaningful difference detected |
| Significant negative lift | Don't ship: revert to control and analyze why |
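The outcome table can be encoded as a small decision function so the recommendation follows mechanically from the computed results. This is a hypothetical sketch: it assumes that any non-significant positive point estimate counts as a "positive trend" and that a non-significant zero or negative estimate counts as "flat", since the table does not define those terms numerically.

```python
def recommend(p_value, diff, guardrails_ok, alpha=0.05):
    """Map a primary-metric result onto the outcome table (hypothetical helper).

    diff is variant minus control. Assumption: a non-significant positive diff is
    a "positive trend"; a non-significant zero or negative diff is "flat".
    """
    significant = p_value < alpha
    if significant and diff < 0:
        return "Don't ship"
    if significant and diff > 0:
        return "Ship it" if guardrails_ok else "Investigate"
    if diff > 0:
        return "Extend the test"
    return "Stop the test"


print(recommend(p_value=0.021, diff=0.01, guardrails_ok=True))  # Ship it
```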

Provide the analysis summary:

## A/B Test Results: [Test Name]

**Hypothesis**: [What we expected]
**Duration**: [X days] | **Sample**: [N control / M variant]

| Metric | Control | Variant | Lift | p-value | Significant? |
|---|---|---|---|---|---|
| [Primary] | X% | Y% | +Z% | 0.0X | Yes/No |
| [Guardrail] | ... | ... | ... | ... | ... |

**Recommendation**: [Ship / Extend / Stop / Investigate]
**Reasoning**: [Why]
**Next steps**: [What to do]

Think step by step. Save as markdown. Generate Python scripts for calculations if raw data is provided.
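Because the summary is saved as markdown, a short helper can fill in the results table for rate metrics such as conversion. The `results_table` helper below is hypothetical; it expects (metric, control rate, variant rate, p-value) tuples, and guardrails measured in other units (milliseconds, revenue) would need their own formatting.

```python
def results_table(rows, alpha=0.05):
    """Render rate metrics as the markdown results table used in the summary."""
    lines = [
        "| Metric | Control | Variant | Lift | p-value | Significant? |",
        "|---|---|---|---|---|---|",
    ]
    for metric, control, variant, p_value in rows:
        lift = (variant - control) / control * 100  # relative lift in percent
        significant = "Yes" if p_value < alpha else "No"
        lines.append(
            f"| {metric} | {control:.2%} | {variant:.2%} | {lift:+.1f}% "
            f"| {p_value:.3f} | {significant} |"
        )
    return "\n".join(lines)


print(results_table([("Signup rate", 0.10, 0.11, 0.021)]))
```

Writing the result to disk is then a one-liner such as `pathlib.Path("ab_test_results.md").write_text(...)`, with the file name chosen per test.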

Maintainer

phuryn (core maintainer)

Source details

Full Name: phuryn/pm-skills
Branch: main
Path in repo: pm-data-analytics/skills/ab-test-analysis
License: MIT License
Topics: agent-skills agentic-skills claude-code-marketplace claude-code-plugins product-management agent-skill-repository claude-cowork-plugin


