Agent skill
senior-devops
Comprehensive DevOps skill for CI/CD, infrastructure automation, containerization, and cloud platforms (AWS, GCP, Azure). Includes pipeline setup, infrastructure as code, deployment automation, and monitoring. Use when setting up pipelines, deploying applications, managing infrastructure, implementing monitoring, or optimizing deployment processes.
Install this agent skill in your project
npx add-skill https://github.com/alirezarezvani/claude-skills/tree/main/engineering-team/senior-devops
SKILL.md
Senior DevOps
A toolkit for senior DevOps work: CI/CD pipeline scaffolding, Terraform module generation, and deployment orchestration with rollback.
Quick Start
Main Capabilities
This skill provides three core capabilities through automated scripts:
# Script 1: Pipeline Generator — scaffolds CI/CD pipelines for GitHub Actions or CircleCI
python scripts/pipeline_generator.py ./app --platform=github --stages=build,test,deploy
# Script 2: Terraform Scaffolder — generates and validates IaC modules for AWS/GCP/Azure
python scripts/terraform_scaffolder.py ./infra --provider=aws --module=ecs-service --verbose
# Script 3: Deployment Manager — orchestrates container deployments with rollback support
python3 scripts/deployment_manager.py ./deploy --verbose --json
Core Capabilities
1. Pipeline Generator
Scaffolds CI/CD pipeline configurations for GitHub Actions or CircleCI, with stages for build, test, security scan, and deploy.
Example — GitHub Actions workflow:
# .github/workflows/ci.yml
name: CI/CD Pipeline
on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]
jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci
      - run: npm run lint
      - run: npm test -- --coverage
      - name: Upload coverage
        uses: codecov/codecov-action@v4
  build-docker:
    needs: build-and-test
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write # needed to push to ghcr.io with GITHUB_TOKEN
    steps:
      - uses: actions/checkout@v4
      - name: Log in to GHCR # the push below fails without registry credentials
        if: github.ref == 'refs/heads/main'
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Build and push image
        uses: docker/build-push-action@v5
        with:
          push: ${{ github.ref == 'refs/heads/main' }}
          tags: ghcr.io/${{ github.repository }}:${{ github.sha }}
  deploy:
    needs: build-docker
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      # Assumes AWS credentials are already available to this job.
      - name: Deploy to ECS
        run: |
          aws ecs update-service \
            --cluster production \
            --service app-service \
            --force-new-deployment
Usage:
python scripts/pipeline_generator.py <project-path> --platform=github|circleci --stages=build,test,deploy
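The generator's internals are not shown here. As a rough illustration of how a `--stages` list could map onto chained jobs like those in the workflow above, here is a minimal Python sketch. The names `parse_stages` and `build_jobs` and the placeholder `run` commands are hypothetical; only the `runs-on`, `steps`, and `needs` keys come from GitHub Actions workflow syntax.

```python
def parse_stages(raw):
    """Split a comma-separated --stages value, dropping blanks."""
    return [s.strip() for s in raw.split(",") if s.strip()]


def build_jobs(stages):
    """Chain stages so each job waits for the previous one via `needs`."""
    jobs = {}
    previous = None
    for stage in stages:
        job = {
            "runs-on": "ubuntu-latest",
            # Placeholder step; a real generator would emit stage-specific steps.
            "steps": [{"run": f"echo 'running {stage}'"}],
        }
        if previous is not None:
            job["needs"] = previous
        jobs[stage] = job
        previous = stage
    return jobs


jobs = build_jobs(parse_stages("build,test,deploy"))
```

Chaining through `needs` mirrors the example workflow, where `build-docker` waits for `build-and-test` and `deploy` waits for `build-docker`.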
2. Terraform Scaffolder
Generates, validates, and plans Terraform modules. Enforces consistent module structure and runs terraform validate + terraform plan before any apply.
Example — AWS ECS service module:
# modules/ecs-service/main.tf
resource "aws_ecs_task_definition" "app" {
  family                   = var.service_name
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = var.cpu
  memory                   = var.memory
  container_definitions = jsonencode([{
    name      = var.service_name
    image     = var.container_image
    essential = true
    portMappings = [{
      containerPort = var.container_port
      protocol      = "tcp"
    }]
    environment = [for k, v in var.env_vars : { name = k, value = v }]
    logConfiguration = {
      logDriver = "awslogs"
      options = {
        awslogs-group         = "/ecs/${var.service_name}"
        awslogs-region        = var.aws_region
        awslogs-stream-prefix = "ecs"
      }
    }
  }])
}
resource "aws_ecs_service" "app" {
  name            = var.service_name
  cluster         = var.cluster_id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = var.desired_count
  launch_type     = "FARGATE"
  network_configuration {
    subnets          = var.private_subnet_ids
    security_groups  = [aws_security_group.app.id]
    assign_public_ip = false
  }
  load_balancer {
    target_group_arn = aws_lb_target_group.app.arn
    container_name   = var.service_name
    container_port   = var.container_port
  }
}
Usage:
python scripts/terraform_scaffolder.py <target-path> --provider=aws|gcp|azure --module=ecs-service|gke-deployment|aks-service [--verbose]
3. Deployment Manager
Orchestrates deployments with blue/green or rolling strategies, health-check gates, and automatic rollback on failure.
Example — Kubernetes blue/green deployment (blue-slot specific elements):
# k8s/deployment-blue.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-blue
  labels:
    app: myapp
    slot: blue # slot label distinguishes blue from green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      slot: blue
  template:
    metadata:
      labels:
        app: myapp
        slot: blue
    spec:
      containers:
        - name: app
          image: ghcr.io/org/app:1.2.3
          readinessProbe: # gate: pod must pass before traffic switches
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
          resources:
            requests:
              cpu: "250m"
              memory: "256Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
Usage:
python scripts/deployment_manager.py deploy \
  --env=staging|production \
  --image=app:1.2.3 \
  --strategy=blue-green|rolling \
  --health-check-url=https://app.example.com/healthz
python scripts/deployment_manager.py rollback --env=production --to-version=1.2.2
python scripts/deployment_manager.py --analyze --env=production # audit current state
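The health-check gate with automatic rollback described for this capability can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the deployment manager's actual logic: `gate_deployment`, its parameters, and the "consecutive successes" rule are hypothetical, and the probe is injected as a callable so the sketch runs without a live endpoint.

```python
import time
from typing import Callable


def gate_deployment(
    probe: Callable[[], bool],
    rollback: Callable[[], None],
    attempts: int = 5,
    required_successes: int = 3,
    delay_seconds: float = 0.0,
) -> bool:
    """Return True once the probe passes `required_successes` times in a row.

    If the attempts run out first, call `rollback` and return False.
    """
    streak = 0
    for _ in range(attempts):
        if probe():
            streak += 1
            if streak >= required_successes:
                return True
        else:
            streak = 0  # a single failure resets the streak
        time.sleep(delay_seconds)
    rollback()
    return False


# Stubbed probes; a real probe would issue an HTTP GET against the
# --health-check-url value and return True on a 2xx response.
rolled_back = []
healthy = gate_deployment(lambda: True, lambda: rolled_back.append("steady"))
results = iter([True, False, True, True, False])
flaky = gate_deployment(lambda: next(results), lambda: rolled_back.append("flaky"))
```

Requiring a streak rather than a single success keeps one lucky response from opening the gate for an unstable release.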
Resources
- Pattern Reference: references/cicd_pipeline_guide.md (detailed CI/CD patterns, best practices, anti-patterns)
- Workflow Guide: references/infrastructure_as_code.md (IaC step-by-step processes, optimization, troubleshooting)
- Technical Guide: references/deployment_strategies.md (deployment strategy configs, security considerations, scalability)
- Tool Scripts: scripts/ directory
Development Workflow
1. Infrastructure Changes (Terraform)
# Scaffold or update module
python scripts/terraform_scaffolder.py ./infra --provider=aws --module=ecs-service --verbose
# Validate and plan — review diff before applying
terraform -chdir=infra init
terraform -chdir=infra validate
terraform -chdir=infra plan -out=tfplan
# Apply only after plan review
terraform -chdir=infra apply tfplan
# Verify resources are healthy
aws ecs describe-services --cluster production --services app-service \
  --query 'services[0].{Status:status,Running:runningCount,Desired:desiredCount}'
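When the plan step runs unattended, for example in CI, the "review diff before applying" rule can be enforced by reading the exit code of `terraform plan -detailed-exitcode` (0: no changes, 1: error, 2: changes present). The sketch below is one possible wrapper; the function names and returned labels are hypothetical, and `plan` assumes the `terraform` CLI is on `PATH`.

```python
import subprocess


def interpret_plan_exit_code(code: int) -> str:
    """Map a `terraform plan -detailed-exitcode` result to a next action."""
    if code == 0:
        return "no-changes"  # plan succeeded, nothing to apply
    if code == 2:
        return "review"      # plan succeeded with a diff: review before apply
    return "error"           # 1 (or anything unexpected) means the plan failed


def plan(chdir: str = "infra") -> str:
    """Run the plan step from the workflow above and classify the outcome.

    Requires the terraform CLI; it is not invoked by this sketch itself.
    """
    result = subprocess.run(
        ["terraform", f"-chdir={chdir}", "plan", "-out=tfplan", "-detailed-exitcode"],
        check=False,  # exit code 2 is a normal outcome, so do not raise on it
    )
    return interpret_plan_exit_code(result.returncode)
```

A pipeline could stop for manual approval on `"review"`, skip the apply step on `"no-changes"`, and fail the job on `"error"`.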
2. Application Deployment
# Generate or update pipeline config
python scripts/pipeline_generator.py . --platform=github --stages=build,test,security,deploy
# Build and tag image
docker build -t ghcr.io/org/app:$(git rev-parse --short HEAD) .
docker push ghcr.io/org/app:$(git rev-parse --short HEAD)
# Deploy with health-check gate
python scripts/deployment_manager.py deploy \
  --env=production \
  --image=app:$(git rev-parse --short HEAD) \
  --strategy=blue-green \
  --health-check-url=https://app.example.com/healthz
# Verify pods are running
kubectl get pods -n production -l app=myapp
kubectl rollout status deployment/app-blue -n production
# Switch traffic after verification
kubectl patch service app-svc -n production \
  -p '{"spec":{"selector":{"slot":"blue"}}}'
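The traffic switch is a one-field patch on the Service selector. A small helper can build that payload and work out which slot is idle. This is a sketch only: `selector_patch` and `other_slot` are hypothetical names, and the two-slot `blue`/`green` scheme is assumed from the `slot` label in the deployment example.

```python
import json

SLOTS = ("blue", "green")


def _check(slot: str) -> str:
    """Reject anything that is not one of the two known slots."""
    if slot not in SLOTS:
        raise ValueError(f"unknown slot: {slot!r}")
    return slot


def selector_patch(slot: str) -> str:
    """Build the JSON body passed to `kubectl patch service ... -p`."""
    body = {"spec": {"selector": {"slot": _check(slot)}}}
    return json.dumps(body, separators=(",", ":"))


def other_slot(slot: str) -> str:
    """Return the idle slot, which is the target of the next release."""
    return "green" if _check(slot) == "blue" else "blue"


patch = selector_patch("blue")
next_target = other_slot("blue")
```

Passing the output of `selector_patch(other_slot(current))` to the same `kubectl patch` command reverses the switch, which is what makes blue/green rollback fast.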
3. Rollback Procedure
# Immediate rollback via deployment manager
python scripts/deployment_manager.py rollback --env=production --to-version=1.2.2
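# Or via kubectl (rolling deployment named app; for blue/green, patch the service selector back to the previous slot)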
kubectl rollout undo deployment/app -n production
kubectl rollout status deployment/app -n production
# Verify rollback succeeded
kubectl get pods -n production -l app=myapp
curl -sf https://app.example.com/healthz || echo "ROLLBACK FAILED — escalate"
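The `kubectl get pods` check above is read by eye. To script it, the same command accepts `-o json`, and each pod then reports a `Ready` condition under `status.conditions`. The helper below is a minimal sketch: `all_pods_ready` is a hypothetical name and the sample document is trimmed to the fields the function reads.

```python
import json


def all_pods_ready(pod_list: dict) -> bool:
    """True when at least one pod exists and every pod reports Ready=True."""
    pods = pod_list.get("items", [])
    if not pods:
        return False  # an empty selection is treated as a failed rollback
    for pod in pods:
        conditions = pod.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not ready:
            return False
    return True


# Trimmed sample in the shape of `kubectl get pods -o json` output.
sample = json.loads("""
{"items": [
  {"status": {"conditions": [{"type": "Ready", "status": "True"}]}},
  {"status": {"conditions": [{"type": "Ready", "status": "False"}]}}
]}
""")
```

In practice the input would come from `kubectl get pods -n production -l app=myapp -o json`, parsed with `json.loads`.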
Multi-Cloud Cross-References
Use these companion skills for cloud-specific deep dives:
| Skill | Cloud | Use When |
|---|---|---|
| aws-solution-architect | AWS | ECS/EKS, Lambda, VPC design, cost optimization |
| azure-cloud-architect | Azure | AKS, App Service, Virtual Networks, Azure DevOps |
| gcp-cloud-architect | GCP | GKE, Cloud Run, VPC, Cloud Build (coming soon) |
Multi-cloud vs single-cloud decision:
- Single-cloud (default): lower operational complexity, deeper managed-service integration, better cost leverage with committed-use discounts
- Multi-cloud: required when mandated by compliance or data residency, when an acquired company runs on a different cloud, or when you need best-of-breed services across providers (e.g., AWS for compute + GCP for ML)
- Hybrid: on-prem + cloud; use when regulated workloads must stay on-prem while burst or non-sensitive workloads run in the cloud
Start single-cloud. Add a second cloud only when there is a concrete business or compliance driver — not for theoretical redundancy.
Cloud-Agnostic IaC
Terraform / OpenTofu (Default Choice)
Terraform (or its open-source fork OpenTofu) is the recommended IaC tool for most teams:
- Single language (HCL) across AWS, Azure, GCP, and 3,000+ providers
- State management with remote backends (S3, GCS, Azure Blob)
- Plan-before-apply workflow prevents drift surprises
- Cross-reference terraform-patterns for module structure, state isolation, and CI/CD integration
Pulumi (Programming Language IaC)
Choose Pulumi when the team strongly prefers TypeScript, Python, Go, or C# over HCL:
- Full programming language: loops, conditionals, and unit tests are native
- Same cloud provider coverage as Terraform
- Easier onboarding for dev teams that resist learning HCL
When to Use Cloud-Native IaC
| Tool | Use When |
|---|---|
| CloudFormation | AWS-only shop; need native AWS support (StackSets, Service Catalog) |
| Bicep | Azure-only shop; simpler syntax than ARM templates |
| Cloud Deployment Manager | GCP-only; rare — most GCP teams prefer Terraform |
Rule of thumb: Use Terraform/OpenTofu unless you are 100% committed to a single cloud AND the cloud-native tool offers a feature Terraform cannot replicate (e.g., AWS Service Catalog integration).
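The rule of thumb and the table above can be read as a small decision function. The sketch below is one possible encoding, not a definitive policy: `choose_iac_tool` and its parameter names are hypothetical, and the order in which the checks run (cloud-native exception first, then language preference) is an assumption.

```python
CLOUD_NATIVE = {
    "aws": "CloudFormation",
    "azure": "Bicep",
    "gcp": "Cloud Deployment Manager",
}


def choose_iac_tool(clouds, needs_native_only_feature=False,
                    prefers_general_language=False):
    """Pick an IaC tool following the section's rule of thumb."""
    clouds = {c.lower() for c in clouds}
    # Cloud-native only when committed to one cloud AND a native-only
    # feature is required (e.g., AWS Service Catalog integration).
    if len(clouds) == 1 and needs_native_only_feature:
        (cloud,) = clouds
        return CLOUD_NATIVE.get(cloud, "Terraform/OpenTofu")
    # Pulumi when the team strongly prefers a general-purpose language.
    if prefers_general_language:
        return "Pulumi"
    return "Terraform/OpenTofu"


default_choice = choose_iac_tool(["aws"])
native_choice = choose_iac_tool(["aws"], needs_native_only_feature=True)
```

Both conditions of the exception must hold, so a multi-cloud team falls back to Terraform/OpenTofu even when one provider's native tool has a feature it wants.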
Troubleshooting
Check the comprehensive troubleshooting section in references/deployment_strategies.md.