Agent skill
nw-production-readiness
Monitoring, observability, operational procedures, CI/CD lessons learned, and quality gate definitions. Load when assessing production readiness or validating operational excellence.
Install this agent skill to your Project
npx add-skill https://github.com/nWave-ai/nWave/tree/main/plugins/nw/skills/nw-production-readiness
SKILL.md
Production Readiness
Monitoring and Observability
Application Monitoring
- Performance: response time | throughput | latency percentiles (P50, P95, P99)
- Resources: CPU | memory | database connections | cache hit rates
- Errors: exception tracking | error rate trends | integration failure detection
- Business: KPI tracking | conversion funnels | feature usage | revenue impact
Infrastructure Monitoring
Server/container health and resource utilization | Network performance and connectivity | Storage capacity and I/O performance | Security event detection.
Alerting Tiers
| Tier | Condition | Response |
|---|---|---|
| Page | Service down, data loss risk, security breach | Immediate response |
| Urgent | Error rate >2x baseline, latency SLA breach | Response within 15 min |
| Warning | Capacity >80%, error rate trending up | Response within 1 hour |
| Info | Deployment complete, metric threshold crossed | Review next business day |
Operational Procedures
Incident Response
- Detect: automated alerting identifies issue
- Triage: classify severity, assign responder
- Communicate: notify stakeholders per severity level
- Resolve: apply fix or rollback
- Review: post-incident review within 48 hours
- Improve: update runbooks and monitoring based on findings
Maintenance Procedures
Regular update and patching schedule | Backup verification (test restores quarterly) | Security vulnerability scanning (automated, weekly) | Performance baseline recalibration (after major changes).
Knowledge Transfer
Operational runbooks for common procedures | Architecture documentation with system diagrams | Deployment procedures and configuration management | Troubleshooting guides for known failure modes.
Quality Gates for Production Readiness
Before declaring production-ready, all must pass:
- All acceptance tests passing
- Unit coverage meets project standard (default: >= 80%)
- Integration tests validated
- Performance validated under realistic load
- Security scan completed (0 critical, 0 high)
- Monitoring and alerting configured
- Logging structured and searchable
- Rollback procedure documented and tested
- Runbook created for operational procedures
- On-call team trained on new feature
For CI/CD architecture lessons and measurement coupling pitfalls, see cicd-and-deployment skill.
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
nw-research
Gathers knowledge from web and files, cross-references across multiple sources, and produces cited research documents. Use when investigating technologies, patterns, or decisions that need evidence backing.
nw-distill
Acceptance test creation methodology for the DISTILL wave. Domain knowledge for the acceptance designer agent: port-to-port principle, prior wave reading, wave-decision reconciliation, graceful degradation, and document back-propagation.
nw-review-output-format
YAML output format and approval criteria for platform design reviews. Load when generating review feedback.
nw-ddd-tactical
Tactical DDD — aggregate design rules, entities, value objects, domain events, repositories, domain services, and anti-pattern detection
nw-infrastructure-and-observability
Infrastructure as Code patterns (Terraform, Kubernetes), observability design (SLOs, metrics, alerting, dashboards), and pipeline security stages. Load when designing infrastructure, observability, or security scanning.
nw-par-critique-dimensions
Platform design review critique dimensions and severity levels. Load when reviewing CI/CD pipelines, infrastructure, deployment strategies, observability, or security designs.
Didn't find tool you were looking for?