product-analytics

A/B test evaluation, cohort retention analysis, funnel metrics, and experiment-driven product decisions. Use when analyzing experiments, measuring feature adoption, diagnosing conversion drop-offs, or evaluating statistical significance of product changes.

Install this agent skill to your Project

npx add-skill https://github.com/yonatangross/orchestkit/tree/main/plugins/ork/skills/product-analytics

Metadata

Additional technical details for this skill

category: document-asset-creation

SKILL.md

Product Analytics

Frameworks for turning raw product data into ship/extend/kill decisions. Covers A/B testing, cohort retention, funnel analysis, and the statistical foundations needed to make those decisions with confidence.

Quick Reference

Category	Rules	Impact	When to Use
A/B Test Evaluation	1	HIGH	Comparing variants, measuring significance, shipping decisions
Cohort Retention	1	HIGH	Feature adoption curves, day-N retention, engagement scoring
Funnel Analysis	1	HIGH	Drop-off diagnosis, conversion optimization, stage mapping
Statistical Foundations	1	HIGH	p-value interpretation, sample sizing, confidence intervals

Total: 4 rules across 4 categories

A/B Test Evaluation

Load rules/ab-test-evaluation.md for the full framework. Quick pattern:

```markdown
## Experiment: [Name]

Hypothesis: If we [change], then [primary metric] will [direction] by [amount]
  because [evidence or reasoning].

Sample size: [N per variant], calculated for MDE=[X%], power=80%, alpha=0.05
Duration: [Minimum weeks]; never stop early (peeking bias)

Results:
  Control:   [metric value]  n=[count]
  Treatment: [metric value]  n=[count]
  Lift:      [+/- X%]        p=[value]  95% CI: [lower, upper]

Decision: SHIP / EXTEND / KILL
  Rationale: [One sentence grounded in numbers, not gut feel]
```

Decision rules:

SHIP: p < 0.05, 95% CI excludes zero in the hypothesized direction, no guardrail regressions
EXTEND: trending positive but underpowered (add runtime, not reanalysis)
KILL: null result, a significant move in the wrong direction, or guardrail degradation

See rules/ab-test-evaluation.md for sample size formulas, SRM checks, and pitfall list.
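The Results line of the template (lift, p-value, 95% CI) can be filled in with a two-proportion z-test. The sketch below is one common way to do that for a conversion-rate metric, not necessarily the method in rules/ab-test-evaluation.md. The function name `evaluate_ab`, the `ABResult` fields, and the example counts are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ABResult:
    lift_abs: float   # treatment rate minus control rate
    lift_rel: float   # absolute lift divided by the control rate
    p_value: float    # two-sided p-value from a pooled z-test
    ci_low: float     # lower bound of the 95% CI on the absolute lift
    ci_high: float    # upper bound of the 95% CI on the absolute lift


def evaluate_ab(conv_c, n_c, conv_t, n_t, z_crit=1.959964):
    """Two-proportion z-test for conversion counts (normal approximation)."""
    p_c = conv_c / n_c
    p_t = conv_t / n_t
    diff = p_t - p_c

    # Pooled standard error for the test: under H0 both arms share one rate.
    p_pool = (conv_c + conv_t) / (n_c + n_t)
    se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_t))
    z = diff / se_pool
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area

    # Unpooled standard error for the confidence interval on the difference.
    se = math.sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    return ABResult(diff, diff / p_c, p_value, diff - z_crit * se, diff + z_crit * se)


# Hypothetical counts: 10.0% vs 11.0% conversion with 10,000 users per arm.
result = evaluate_ab(conv_c=1000, n_c=10000, conv_t=1100, n_t=10000)
print(f"lift={result.lift_rel:+.1%}  p={result.p_value:.3f}  "
      f"95% CI=[{result.ci_low:+.4f}, {result.ci_high:+.4f}]")
```

The normal approximation is reasonable when each arm has many conversions and non-conversions. For small counts an exact test is the safer choice.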

Cohort Retention

Load rules/cohort-retention.md for full methodology. Quick pattern:

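```sql
-- Day-N retention cohort query (PostgreSQL-style syntax).
-- Assumes activity_date and first_seen are DATE columns and that first_seen
-- (the user's first active day) is present on every user_activity row.
SELECT
  DATE_TRUNC('week', first_seen)  AS cohort_week,
  COUNT(DISTINCT user_id)         AS cohort_size,
  -- Percent of the cohort active exactly 7 days after first_seen
  COUNT(DISTINCT CASE
    WHEN activity_date = first_seen + INTERVAL '7 days'
    THEN user_id END) * 100.0
    / COUNT(DISTINCT user_id)     AS day_7_retention
FROM user_activity
GROUP BY 1
ORDER BY 1;
```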

Retention benchmarks (SaaS):

Day 1: 40–60% is healthy
Day 7: 20–35% is healthy
Day 30: 10–20% is healthy
Flat curve after day 30 = product-market fit signal

See rules/cohort-retention.md for behavior-based cohorts, feature adoption curves, and engagement scoring.
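When the activity log lives in application code rather than a warehouse, the same day-N calculation can be sketched in plain Python. This mirrors the SQL pattern above (weekly cohorts starting on Monday, retained means active exactly N days after first activity). The function name `day_n_retention` and the sample events are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def day_n_retention(events, n):
    """events: iterable of (user_id, activity_date).

    Returns {cohort_week_start: (cohort_size, retention_percent)}.
    """
    first_seen = {}
    active_days = defaultdict(set)
    for user_id, day in events:
        active_days[user_id].add(day)
        if user_id not in first_seen or day < first_seen[user_id]:
            first_seen[user_id] = day

    cohorts = defaultdict(lambda: [0, 0])  # week start -> [size, retained]
    for user_id, start in first_seen.items():
        week = start - timedelta(days=start.weekday())  # Monday of that week
        cohorts[week][0] += 1
        if start + timedelta(days=n) in active_days[user_id]:
            cohorts[week][1] += 1

    return {week: (size, 100.0 * kept / size)
            for week, (size, kept) in sorted(cohorts.items())}


# Hypothetical activity log; 2024-01-01 is a Monday.
events = [
    ("a", date(2024, 1, 1)), ("a", date(2024, 1, 8)),   # back on day 7
    ("b", date(2024, 1, 2)), ("b", date(2024, 1, 5)),   # back on day 3 only
    ("c", date(2024, 1, 3)), ("c", date(2024, 1, 10)),  # back on day 7
    ("d", date(2024, 1, 9)),                            # never returns
]
retention = day_n_retention(events, 7)
for week, (size, pct) in retention.items():
    print(week, size, f"{pct:.1f}%")
```

Exact-day matching is strict. Many teams instead count a user as retained if they return on day N or later, which produces higher numbers, so state the definition next to any benchmark comparison.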

Funnel Analysis

Load rules/funnel-analysis.md for full methodology. Quick pattern:

```markdown
## Funnel: [Name] — [Date Range]

Stage 1: [Aware / Land]     → [N] users    (entry)
Stage 2: [Activate / Sign]  → [N] users    ([X]% from stage 1)
Stage 3: [Engage / Use]     → [N] users    ([X]% from stage 2)  ← biggest drop
Stage 4: [Convert / Pay]    → [N] users    ([X]% from stage 3)

Overall conversion: [X]%
Biggest drop-off:  Stage 2→3 ([X]% loss), investigate first
```

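Optimization order: Fix the largest drop-off first. Percentage points alone can mislead: a 5-point improvement at a high-volume step can outweigh a 20-point improvement at a low-volume step, so compare candidate fixes by the extra end-of-funnel conversions each one would produce.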

See rules/funnel-analysis.md for segmented funnels, micro-conversion tracking, and prioritization patterns.
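The funnel template and the optimization-order advice can be worked through with a few lines of arithmetic. The sketch below computes step conversions, finds the worst step, and projects final-stage users for a candidate fix, assuming a strictly sequential funnel where the other step rates stay unchanged. The stage names, counts, and function names are hypothetical.

```python
def funnel_report(stages):
    """stages: ordered list of (name, user_count).

    Returns (per-step details, overall conversion, worst step).
    """
    steps = []
    for (prev_name, prev_n), (name, n) in zip(stages, stages[1:]):
        steps.append({
            "step": f"{prev_name} -> {name}",
            "conversion": n / prev_n,
            "users_lost": prev_n - n,
        })
    overall = stages[-1][1] / stages[0][1]
    worst = min(steps, key=lambda s: s["conversion"])
    return steps, overall, worst


def projected_final(stages, step_index, point_gain):
    """Final-stage users if one step's rate rises by point_gain (0.05 = 5 points)."""
    users = stages[0][1]
    for i, ((_, prev_n), (_, n)) in enumerate(zip(stages, stages[1:])):
        rate = n / prev_n
        if i == step_index:
            rate = min(1.0, rate + point_gain)
        users *= rate
    return users


# Hypothetical funnel counts.
stages = [("Land", 10000), ("Sign up", 4000), ("Use", 1000), ("Pay", 400)]
steps, overall, worst = funnel_report(stages)
print(f"overall={overall:.1%}, biggest drop-off: {worst['step']}")

# Compare two candidate fixes by projected paying users, not by points alone.
fix_top = projected_final(stages, step_index=0, point_gain=0.05)
fix_bottom = projected_final(stages, step_index=2, point_gain=0.20)
print(f"+5 pts at Land -> Sign up: {fix_top:.0f} payers")
print(f"+20 pts at Use -> Pay:     {fix_bottom:.0f} payers")
```

Which fix wins depends on the baseline rates at each step, which is why the projection is worth running before committing engineering time.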

Statistical Foundations

Plain-English explanations of the stats every PM needs. Load references/stats-cheat-sheet.md for formulas and quick lookups.

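p-value in plain English: The probability that you would see a result this extreme (or more extreme) if the change had zero effect. p=0.03 means that, if the change truly did nothing, random noise alone would produce a result at least this extreme about 3% of the time. It does NOT mean "a 3% chance the result is noise," and it does NOT mean "97% probability the change works."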
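The definition can be checked by hand on a tiny exact example. Suppose, hypothetically, 10 users try both designs and 9 prefer the new one. Under the "zero effect" assumption each user is a 50/50 coin flip, and the p-value is the total probability of every outcome at least as unlikely as the one observed. The function name below is illustrative.

```python
import math


def binomial_two_sided_p(successes, trials, null_rate=0.5):
    """Exact two-sided p-value for a binomial outcome.

    Sums the probability of every outcome that is no more likely, under the
    null, than the observed one.
    """
    def pmf(k):
        return math.comb(trials, k) * null_rate**k * (1 - null_rate)**(trials - k)

    observed = pmf(successes)
    # Small tolerance so ties are not dropped by floating-point rounding.
    return sum(pmf(k) for k in range(trials + 1) if pmf(k) <= observed * (1 + 1e-9))


p = binomial_two_sided_p(successes=9, trials=10)
print(f"p = {p:.4f}")  # 22/1024, about 0.0215
```

The outcomes counted are 0, 1, 9, and 10 preferences out of 10, which together have probability 22/1024 when the true preference is 50/50. That is a statement about how often noise produces data like this, not about the probability that the new design is better.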

Confidence interval in plain English: The range of effect sizes that are compatible with the data. "Lift = +8%, 95% CI [+2%, +14%]" means the data are consistent with a real lift anywhere between 2% and 14%; intervals built this way capture the true effect in about 95% of experiments. If the CI includes zero, you cannot claim a win.

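Minimum Detectable Effect (MDE): The smallest lift you care about detecting. Setting MDE too small forces impractically large sample sizes, because the required sample grows roughly with 1/MDE². Anchor MDE to business value: set it at the smallest lift that would justify shipping. If a 2% lift is not worth shipping but a 5% lift is, set MDE = 5%.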
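To see how quickly sample size grows as MDE shrinks, the standard normal-approximation formula for comparing two proportions can be sketched as below. It may differ in detail from the formulas in references/stats-cheat-sheet.md. The function name and the 10% baseline conversion rate are assumptions for illustration.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline, mde_rel, alpha=0.05, power=0.80):
    """Users needed per variant to detect a relative lift of mde_rel.

    Two-sided test on two proportions, normal approximation.
    """
    p1 = baseline
    p2 = baseline * (1 + mde_rel)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha=0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for power=0.80
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


# Hypothetical 10% baseline conversion rate.
n_mde_5 = sample_size_per_variant(baseline=0.10, mde_rel=0.05)
n_mde_2 = sample_size_per_variant(baseline=0.10, mde_rel=0.02)
print(f"MDE 5%: {n_mde_5:,} per variant")
print(f"MDE 2%: {n_mde_2:,} per variant")
```

With these inputs, dropping the MDE from 5% to 2% multiplies the required sample by roughly six, which is the practical cost of chasing lifts too small to matter.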

Statistical vs practical significance: A result can be statistically significant (p < 0.05) but practically meaningless (lift = 0.01%). Always check both. A 0.01% lift that costs 6 weeks of eng time is not a win.

Common Pitfalls

Peeking: stopping an experiment early because results look good inflates the false-positive rate. Commit to a runtime before launch.
Multiple comparisons: testing 10 independent metrics at p < 0.05 yields 0.5 false positives on average, and about a 40% chance of at least one. Apply Bonferroni correction or pre-register your primary metric.
Sample Ratio Mismatch (SRM): if variant group sizes differ from the expected split by > 1%, treat the experiment as broken. Confirm with a chi-square goodness-of-fit test, since at high traffic even a smaller gap can signal an assignment bug. Fix before analyzing results.
Novelty effect: new features get inflated engagement in week 1. Run experiments long enough to see settled behavior (minimum 2 full business cycles).
Simpson's paradox: aggregate results can reverse when segmented. Always check results by key segments (device, plan tier, geography).
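Two of these pitfalls reduce to short calculations. The sketch below shows the family-wise error rate for several metrics (assuming the metrics are independent), the Bonferroni-adjusted threshold, and a chi-square goodness-of-fit check for SRM in a two-variant test. Function names and example counts are hypothetical, and the SRM alarm threshold is left to the reader because conventions vary.

```python
import math


def family_wise_error_rate(num_metrics, alpha=0.05):
    """Chance of at least one false positive across independent metrics."""
    return 1 - (1 - alpha) ** num_metrics


def bonferroni_alpha(num_metrics, alpha=0.05):
    """Per-metric threshold that keeps the family-wise rate at or below alpha."""
    return alpha / num_metrics


def srm_p_value(n_control, n_treatment, expected_control_share=0.5):
    """Chi-square goodness-of-fit p-value for a two-variant split."""
    total = n_control + n_treatment
    exp_c = total * expected_control_share
    exp_t = total - exp_c
    chi2 = (n_control - exp_c) ** 2 / exp_c + (n_treatment - exp_t) ** 2 / exp_t
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


fwer = family_wise_error_rate(10)
per_metric_alpha = bonferroni_alpha(10)
print(f"10 metrics: FWER={fwer:.1%}, Bonferroni threshold={per_metric_alpha}")

# Hypothetical 50/50 experiments: a small imbalance vs a suspicious one.
p_ok = srm_p_value(5000, 5050)
p_bad = srm_p_value(5000, 5300)
print(f"5000 vs 5050: p={p_ok:.3f}   5000 vs 5300: p={p_bad:.4f}")
```

A very small SRM p-value means the observed split is unlikely under correct assignment, so the experiment data should not be trusted until the cause is found.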

Ship / Extend / Kill Framework

Signal	Decision	Action
p < 0.05, CI excludes zero in the hypothesized direction, guardrails green	SHIP	Full rollout, update success metrics
Positive trend, underpowered (p = 0.05–0.15)	EXTEND	Add runtime, do not peek again
p > 0.15, flat or negative	KILL	Revert, document learnings, re-hypothesize
Guardrail regression, any p-value	KILL	Immediate revert regardless of primary metric
SRM detected	INVALID	Fix assignment bug, restart experiment
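The table can be read as a priority-ordered rule list: validity checks first, then guardrails, then the primary metric. A minimal sketch of that ordering follows. The function name and arguments are hypothetical, the lift is oriented so that positive means improvement, and treating every non-significant p-value up to 0.15 with a positive point estimate as EXTEND is an assumption about how the thresholds are meant to join up.

```python
def decide(p_value, ci_low, ci_high, guardrails_ok=True, srm_detected=False):
    """Map experiment results to SHIP / EXTEND / KILL / INVALID.

    ci_low and ci_high bound the lift, with positive meaning improvement.
    """
    if srm_detected:
        return "INVALID"  # assignment is broken, so no result can be trusted
    if not guardrails_ok:
        return "KILL"     # guardrail regression overrides the primary metric
    if p_value < 0.05 and ci_low > 0:
        return "SHIP"
    point_estimate = (ci_low + ci_high) / 2  # midpoint of a symmetric interval
    # Assumption: the whole 0.05-0.15 band with a positive estimate is "underpowered".
    if 0.05 <= p_value <= 0.15 and point_estimate > 0:
        return "EXTEND"
    return "KILL"         # flat, negative, or significantly worse


print(decide(0.02, 0.002, 0.018))                       # clear win
print(decide(0.12, -0.003, 0.015))                      # positive but underpowered
print(decide(0.40, -0.010, 0.012))                      # flat
print(decide(0.01, 0.010, 0.030, guardrails_ok=False))  # guardrail regression
print(decide(0.01, 0.010, 0.030, srm_detected=True))    # broken assignment
```

Encoding the rules this way makes the precedence explicit: a strong primary-metric result never rescues an experiment with a guardrail regression or a sample ratio mismatch.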

Related Skills

ork:product-frameworks — OKRs, KPI trees, RICE prioritization, PRD templates
ork:metrics-instrumentation — Event naming, metric definition, alerting setup
ork:brainstorm — Generate hypotheses and experiment ideas
ork:assess — Evaluate product quality and risks

References

rules/ab-test-evaluation.md — Hypothesis, sample size, significance, decision matrix
rules/cohort-retention.md — Cohort types, retention curves, SQL patterns
rules/funnel-analysis.md — Stage mapping, drop-off identification, optimization
references/stats-cheat-sheet.md — Formulas, test selection, power analysis

Version: 1.0.0 (March 2026)

Maintainer

yonatangross Core maintainer

Source details

Full Name: yonatangross/orchestkit
Branch: main
Path in repo: plugins/ork/skills/product-analytics
License: MIT License
Topics: claude-code mcp typescript agents llm react ai-development security rag langgraph testing claude-plugin fastapi
