Agent skill

saas-launch-checklist

Pre-launch verification across infrastructure, security, legal, payment, email, analytics, and performance. Day-1 monitoring, rollback plan, incident response skeleton, and post-launch week-1 checklist.

Stars 458
Forks 38

Install this agent skill to your Project

npx add-skill https://github.com/vibeeval/vibecosystem/tree/main/skills/saas-launch-checklist

SKILL.md

SaaS Launch Checklist

A structured checklist for shipping a SaaS product to production with confidence. Every item exists because someone skipped it and regretted it.

Pre-Launch Checklist (28 Items)

Infrastructure (5 items)

  • SSL/TLS -- Certificate installed, auto-renewal configured (Let's Encrypt / ACM), HSTS header present
  • CDN -- Static assets served through CDN, cache-control headers set, origin shielding enabled
  • Backup -- Database backup scheduled (daily minimum), tested restore procedure, off-site copy
  • Monitoring -- Health check endpoint returns 200 when healthy and 503 when degraded, uptime monitor pings every 60s
  • Alerting -- PagerDuty/Opsgenie configured, on-call rotation set, escalation policy defined (5 min ack SLA)

Security (5 items)

  • Penetration test -- Run OWASP ZAP or equivalent against staging, fix all HIGH/CRITICAL findings
  • Dependency audit -- npm audit / pip-audit / trivy shows zero critical CVEs
  • Secret rotation -- All production secrets differ from staging, rotation schedule documented
  • Rate limiting -- API endpoints rate-limited (e.g., 100 req/min per user), auth endpoints stricter (10 req/min)
  • Headers -- CSP, X-Frame-Options, X-Content-Type-Options, Referrer-Policy all configured

Legal (4 items)

  • Terms of Service -- Reviewed by legal counsel, published at /terms, acceptance recorded at signup
  • Privacy Policy -- GDPR/CCPA compliant, published at /privacy, covers data collection, retention, and third parties
  • Cookie consent -- Banner shown to EU visitors, non-essential cookies blocked until consent, preferences stored
  • DPA (Data Processing Agreement) -- Template available for enterprise customers, sub-processor list published

Payment (4 items)

  • Test transactions -- Full purchase flow tested with test cards (success, decline, 3DS), subscription upgrade/downgrade verified
  • Webhook verification -- Stripe/Paddle webhook signature verified, replay attacks prevented with idempotency keys
  • Invoice generation -- Invoices auto-generated with correct tax info, downloadable as PDF, emailed on payment
  • Dunning flow -- Failed payment retry schedule configured (day 1, 3, 5, 7), grace period before cancellation

Email (4 items)

  • SPF/DKIM/DMARC -- DNS records configured, DMARC policy set to quarantine or reject, verified with mail-tester.com
  • Transactional templates -- Welcome, password reset, invoice, and trial expiry emails tested across Gmail/Outlook/Apple Mail
  • Unsubscribe -- One-click unsubscribe header present (RFC 8058), /unsubscribe endpoint works, preference center available
  • Sender reputation -- Dedicated IP warmed up (if >10k emails/day), bounce handling configured, complaint feedback loop active

Analytics (3 items)

  • Core events -- Signup, activation, purchase, and churn events tracked with user ID and properties
  • Funnels -- Signup-to-activation and trial-to-paid funnels configured in analytics dashboard
  • Dashboards -- Real-time dashboard shows active users, revenue (MRR), error rate, and response time

Performance (3 items)

  • Load test -- k6/Artillery run at 2x expected peak traffic, p95 latency under 500ms, zero errors
  • CDN cache -- Cache hit ratio above 90% for static assets, proper Cache-Control and ETag headers
  • Image optimization -- All images served as WebP/AVIF with responsive srcset, lazy loading below the fold

Day-1 Monitoring Dashboard

Build this dashboard before launch. Every panel answers a specific question.

+-----------------------------------------------------+
|                  SaaS Launch Dashboard               |
+---------------------------+-------------------------+
| Request Rate (req/s)      | Error Rate (%)          |
| Normal: 10-50/s           | Target: < 1%            |
| Alert: > 200/s            | Alert: > 2%             |
+---------------------------+-------------------------+
| p95 Latency (ms)          | Active Users (real-time) |
| Target: < 200ms           | Shows WebSocket/polling  |
| Alert: > 500ms            | count                    |
+---------------------------+-------------------------+
| Signup Rate (/hr)         | Payment Success Rate (%) |
| Compare to projection     | Target: > 95%            |
| Alert: 0 for 30min        | Alert: < 90%             |
+---------------------------+-------------------------+
| CPU / Memory              | Database Connections      |
| Alert: > 80%              | Alert: > 80% pool        |
+---------------------------+-------------------------+

Dashboard as Code (Prometheus Queries)

yaml
# monitoring/launch-dashboard.yml
panels:
  - title: Request Rate
    query: sum(rate(http_requests_total[5m]))
    alert_threshold: 200

  - title: Error Rate
    query: |
      sum(rate(http_requests_total{status_code=~"5.."}[5m]))
      / sum(rate(http_requests_total[5m])) * 100
    alert_threshold: 2

  - title: p95 Latency
    query: |
      histogram_quantile(0.95,
        sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
    alert_threshold: 0.5

  - title: Signup Rate
    query: sum(increase(user_signups_total[1h]))
    alert_threshold: 0  # alert if zero for 30 min

  - title: Payment Success Rate
    query: |
      sum(rate(payment_completed_total[5m]))
      / sum(rate(payment_attempted_total[5m])) * 100
    alert_threshold: 90  # alert if below

Rollback Plan Template

Complete this before you deploy. If you cannot fill every field, you are not ready to launch.

markdown
# Rollback Plan: [Product Name] v[X.Y.Z]

## Decision Criteria
Trigger rollback if ANY of these occur within 60 minutes of deploy:
- [ ] Error rate exceeds 5%
- [ ] p95 latency exceeds 2 seconds for 5 consecutive minutes
- [ ] Payment processing fails for 3+ consecutive attempts
- [ ] Data integrity issue detected (mismatched records)

## Rollback Steps
1. Set feature flag `launch_v1` to OFF (immediate, <30 seconds)
2. Revert DNS/load balancer to previous deployment
3. Run: `kubectl rollout undo deployment/api --to-revision=PREV`
   OR: `docker compose -f docker-compose.prod.yml up -d --force-recreate`
4. Verify health check returns 200 on previous version
5. Notify #incidents channel: "Rollback executed, investigating"

## Data Migration Rollback
- [ ] Database migration has a DOWN migration
- [ ] Tested: `npm run migrate:down` or `python manage.py migrate APP PREVIOUS`
- [ ] New columns are NULLABLE (old code ignores them)
- [ ] No destructive changes (column drops, renames) in this release

## Communication
- Engineering: #incidents Slack channel
- Support: Pre-drafted message in support tool
- Customers: Status page update (only if downtime > 5 min)

## Owner
- Rollback decision: [On-call engineer name]
- Execution: [DevOps engineer name]
- Communication: [Support lead name]

Incident Response Playbook Skeleton

markdown
# Launch Day Incident Playbook

## Severity Levels
| Level | Definition                  | Response  | Example                    |
|-------|-----------------------------|-----------|----------------------------|
| SEV-1 | Service down, data loss     | 5 min     | Database unreachable       |
| SEV-2 | Major feature broken        | 15 min    | Payments failing           |
| SEV-3 | Minor feature broken        | 1 hour    | Email delivery delayed     |
| SEV-4 | Cosmetic issue              | Next day  | Alignment bug on Safari    |

## On-Call Roster (Launch Day)
| Role               | Primary        | Secondary      | Contact        |
|--------------------|----------------|----------------|----------------|
| Incident Commander | [Name]         | [Name]         | [Phone/Slack]  |
| Backend Engineer   | [Name]         | [Name]         | [Phone/Slack]  |
| Frontend Engineer  | [Name]         | [Name]         | [Phone/Slack]  |
| DevOps/SRE         | [Name]         | [Name]         | [Phone/Slack]  |

## Response Flow
1. DETECT  - Alert fires or user reports issue
2. TRIAGE  - Assign severity, open incident channel (#inc-YYYYMMDD-NN)
3. CONTAIN - Feature flag OFF, rollback, or scale up
4. FIX     - Root cause fix, deploy to staging, verify
5. DEPLOY  - Push fix to production with monitoring
6. REVIEW  - Post-incident review within 48 hours

## Pre-Written Status Page Messages
- Investigating: "We are aware of an issue affecting [X] and are investigating."
- Identified: "The issue has been identified. A fix is being deployed."
- Resolved: "The issue has been resolved. All systems are operational."

Post-Launch: First 7 Days Checklist

Day 1 (Launch Day)

  • All-hands monitoring: dashboard on a shared screen or TV
  • Respond to every support ticket within 2 hours
  • Track signup funnel: landing page visits -> signups -> activation
  • Check error logs every hour for new error patterns
  • Verify email deliverability (check spam folders with seed accounts)

Day 2-3

  • Review Day 1 analytics: signup count, activation rate, bounce rate
  • Fix any P1/P2 bugs discovered by real users
  • Check payment reconciliation: Stripe dashboard matches your records
  • Send first-day cohort a "how's it going?" email (personal, from founder)
  • Monitor server costs: actual vs projected infrastructure spend

Day 4-5

  • Analyze user behavior: where do users drop off? Heatmaps, session recordings
  • Run second load test if traffic patterns differ from projections
  • Review and tune alerting thresholds (reduce noise, catch real issues)
  • Check SEO: is Google indexing your pages? robots.txt correct?
  • Backlog grooming: prioritize bugs and feature requests from real users

Day 6-7

  • Week 1 retrospective: what worked, what did not, what to change
  • Publish week 1 metrics: signups, activation, retention, revenue
  • Plan week 2 priorities based on actual user data (not assumptions)
  • Thank early adopters (personal email, discount, or swag)
  • Remove launch-specific feature flags and temporary monitoring

Contrast: Phased Rollout vs Big Bang Launch

BAD: Big Bang Launch

Monday 9:00 AM:
  - Mass email to 50,000 person waitlist
  - ProductHunt launch post goes live
  - Hacker News submission
  - All social media posts scheduled simultaneously
  - Full feature set available to everyone

Monday 9:15 AM:
  - 3,000 concurrent users hit the app
  - Database connections exhausted
  - Payment webhook queue backs up
  - Support inbox: 200 tickets in 30 minutes
  - Error rate: 15%
  - Team panics

Monday 10:00 AM:
  - Emergency rollback
  - Status page: "We are experiencing issues"
  - ProductHunt comments: "This doesn't work"
  - First impression destroyed

GOOD: Phased Rollout

typescript
// Feature flag configuration for phased rollout
const LAUNCH_PHASES = {
  phase1: {
    name: 'Team and Friends',
    startDate: '2025-01-06',
    criteria: { userList: 'internal-testers' },  // 50 users
    goal: 'Find critical bugs before anyone else sees them',
  },
  phase2: {
    name: 'Beta Waitlist (10%)',
    startDate: '2025-01-08',
    criteria: { percentage: 10 },                 // 500 users
    goal: 'Validate onboarding flow and payment',
  },
  phase3: {
    name: 'Beta Waitlist (50%)',
    startDate: '2025-01-10',
    criteria: { percentage: 50 },                 // 2,500 users
    goal: 'Load test with real traffic patterns',
  },
  phase4: {
    name: 'Full Waitlist + Public',
    startDate: '2025-01-13',
    criteria: { percentage: 100 },                // everyone
    goal: 'General availability',
  },
} as const
Week 1 Monday:    50 internal users     -> Find showstoppers
Week 1 Wednesday: 500 beta users        -> Validate payment flow
Week 1 Friday:    2,500 beta users      -> Verify infrastructure holds
Week 2 Monday:    Full public launch    -> Confidence backed by data

Each phase gate:
  [x] Error rate < 1%
  [x] p95 latency < 300ms
  [x] Payment success > 98%
  [x] NPS from phase users > 30
  [x] Zero data integrity issues
  -> Only then proceed to next phase

Why Phased Rollout Wins

Dimension Big Bang Phased Rollout
Risk All-or-nothing Contained per phase
Feedback Overwhelming, chaotic Manageable, actionable
Infrastructure Guess and hope Scale based on real data
Recovery Public failure Private fix
First impression One shot Refined over 4 attempts
Team stress Maximum Distributed

Key principle: Launch is not a single moment -- it is a process. Every item you skip is a bet that nothing will go wrong in that area. The checklist exists to make the boring stuff automatic so you can focus on users.

Expand your agent's capabilities with these related and highly-rated skills.

Didn't find tool you were looking for?

Be as detailed as possible for better results