Agent skill

cloud-architect

Designs cloud architectures, creates migration plans, generates cost optimization recommendations, and produces disaster recovery strategies across AWS, Azure, and GCP. Use when designing cloud architectures, planning migrations, or optimizing multi-cloud deployments. Invoke for Well-Architected Framework reviews, cost optimization, disaster recovery, landing zones, security architecture, and serverless design.


Install this agent skill to your Project

npx add-skill https://github.com/Jeffallan/claude-skills/tree/main/skills/cloud-architect

Metadata

Additional technical details for this skill

role: architect
scope: infrastructure
domain: infrastructure
version: 1.1.0
triggers: AWS, Azure, GCP, Google Cloud, cloud migration, cloud architecture, multi-cloud, cloud cost, Well-Architected, landing zone, cloud security, disaster recovery, cloud native, serverless architecture
output format: architecture
related skills: devops-engineer, kubernetes-specialist, terraform-engineer, security-reviewer, microservices-architect, monitoring-expert

SKILL.md

Cloud Architect

Core Workflow

  1. Discovery — Assess current state, requirements, constraints, compliance needs
  2. Design — Select services, design topology, plan data architecture
  3. Security — Implement zero-trust, identity federation, encryption
  4. Cost Model — Right-size resources, reserved capacity, auto-scaling
  5. Migration — Apply 6Rs framework, define waves, validate connectivity before cutover
  6. Operate — Set up monitoring, automation, continuous optimization

Workflow Validation Checkpoints

After Design: Confirm every component has a redundancy strategy and no single points of failure exist in the topology.

Before Migration cutover: Validate that VPC peering (or the equivalent network connectivity) is fully established:

bash
# AWS: confirm peering connection is Active before proceeding
aws ec2 describe-vpc-peering-connections \
  --filters "Name=status-code,Values=active"

# Azure: confirm VNet peering state
az network vnet peering list \
  --resource-group myRG --vnet-name myVNet \
  --query "[].{Name:name,State:peeringState}"
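
The commands above print peering state but do not stop a cutover script when the state is wrong. A minimal sketch of a gate, assuming the states arrive as whitespace-separated text (for example via `--output text`). The helper name `all_states_equal` and the connection ID `pcx-EXAMPLE` are hypothetical:

```shell
# Hypothetical helper: reads whitespace-separated states on stdin and
# succeeds only if at least one state is present and all of them equal $1.
all_states_equal() {
  expected="$1"
  seen=0
  for state in $(cat); do
    seen=$((seen + 1))
    if [ "$state" != "$expected" ]; then
      echo "unexpected state: $state (wanted $expected)" >&2
      return 1
    fi
  done
  if [ "$seen" -eq 0 ]; then
    echo "no states found" >&2
    return 1
  fi
  return 0
}

# Example wiring (not executed here; the connection ID is a placeholder):
#   aws ec2 describe-vpc-peering-connections \
#     --vpc-peering-connection-ids pcx-EXAMPLE \
#     --query 'VpcPeeringConnections[].Status.Code' --output text \
#     | all_states_equal active || exit 1
#
#   az network vnet peering list \
#     --resource-group myRG --vnet-name myVNet \
#     --query "[].peeringState" --output tsv \
#     | all_states_equal Connected || exit 1
```

Treating an empty result as a failure is deliberate: a filter that matches nothing should block the cutover rather than pass silently.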

After Migration: Verify application health and routing:

bash
# AWS: check target group health in ALB
aws elbv2 describe-target-health \
  --target-group-arn arn:aws:elasticloadbalancing:...
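
Targets usually need a few health-check intervals before they report healthy, so a single call right after cutover can fail spuriously. A sketch of a polling wrapper, assuming a fixed delay between attempts is acceptable. `retry_until`, `targets_healthy`, and the `TG_ARN` variable are hypothetical names:

```shell
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between attempts, and succeed as soon as the command succeeds.
retry_until() {
  max_attempts="$1"
  delay_seconds="$2"
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay_seconds"
    fi
    attempt=$((attempt + 1))
  done
  echo "gave up after $max_attempts attempts" >&2
  return 1
}

# Example wiring (not executed here). TG_ARN is assumed to hold the full
# target group ARN.
#   targets_healthy() {
#     states=$(aws elbv2 describe-target-health \
#       --target-group-arn "$TG_ARN" \
#       --query 'TargetHealthDescriptions[].TargetHealth.State' \
#       --output text) || return 1
#     [ -n "$states" ] || return 1
#     for s in $states; do [ "$s" = "healthy" ] || return 1; done
#   }
#   retry_until 20 15 targets_healthy || exit 1
```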

After DR test: Confirm RTO/RPO targets were met; document actual recovery times.
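
Recording the comparison in a script makes DR test results repeatable. A minimal sketch, assuming all four figures are whole minutes; the function name `check_recovery_targets` is hypothetical:

```shell
# Hypothetical helper: compare measured recovery figures (whole minutes)
# against their targets. Prints one line per metric and fails if either
# target was missed.
# Usage: check_recovery_targets ACTUAL_RTO TARGET_RTO ACTUAL_RPO TARGET_RPO
check_recovery_targets() {
  actual_rto="$1"
  target_rto="$2"
  actual_rpo="$3"
  target_rpo="$4"
  status=0
  if [ "$actual_rto" -le "$target_rto" ]; then
    echo "RTO met: ${actual_rto}m <= ${target_rto}m"
  else
    echo "RTO missed: ${actual_rto}m > ${target_rto}m"
    status=1
  fi
  if [ "$actual_rpo" -le "$target_rpo" ]; then
    echo "RPO met: ${actual_rpo}m <= ${target_rpo}m"
  else
    echo "RPO missed: ${actual_rpo}m > ${target_rpo}m"
    status=1
  fi
  return "$status"
}
```

For example, `check_recovery_targets 45 60 20 15` prints `RTO met: 45m <= 60m` and `RPO missed: 20m > 15m`, then returns a non-zero status.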

Reference Guide

Load detailed guidance based on context:

| Topic | Reference | Load When |
| --- | --- | --- |
| AWS Services | references/aws.md | EC2, S3, Lambda, RDS, Well-Architected Framework |
| Azure Services | references/azure.md | VMs, Storage, Functions, SQL, Cloud Adoption Framework |
| GCP Services | references/gcp.md | Compute Engine, Cloud Storage, Cloud Functions, BigQuery |
| Multi-Cloud | references/multi-cloud.md | Abstraction layers, portability, vendor lock-in mitigation |
| Cost Optimization | references/cost.md | Reserved instances, spot, right-sizing, FinOps practices |

Constraints

MUST DO

  • Design for high availability (99.9%+)
  • Implement security by design (zero-trust)
  • Use infrastructure as code (Terraform, CloudFormation)
  • Enable cost allocation tags and monitoring
  • Plan disaster recovery with defined RTO/RPO
  • Implement multi-region for critical workloads
  • Use managed services when possible
  • Document architectural decisions
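
The 99.9% availability target and the multi-region requirement above can be turned into numbers. A back-of-envelope sketch, assuming a 365-day year and statistically independent failures (real outages are often correlated, so treat the results as optimistic). All three function names are hypothetical:

```shell
# Allowed downtime in minutes per 365-day year for an availability
# percentage such as 99.9.
downtime_minutes_per_year() {
  awk -v pct="$1" 'BEGIN { printf "%.1f\n", (100 - pct) / 100 * 365 * 24 * 60 }'
}

# Availability (percent) of components in series, where every component
# must be up for the system to be up.
serial_availability() {
  awk 'BEGIN {
    a = 1
    for (i = 1; i < ARGC; i++) a *= ARGV[i] / 100
    printf "%.4f\n", a * 100
  }' "$@"
}

# Availability (percent) of N redundant copies of one component, where
# the system is up if at least one copy is up.
redundant_availability() {
  awk -v pct="$1" -v n="$2" 'BEGIN {
    down = 1
    for (i = 0; i < n; i++) down *= (1 - pct / 100)
    printf "%.4f\n", (1 - down) * 100
  }'
}
```

`downtime_minutes_per_year 99.9` gives 525.6 minutes, roughly 8.8 hours. Three 99.9% components in series drop to about 99.7%, while two redundant 99% copies reach 99.99% under the independence assumption.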

MUST NOT DO

  • Store credentials in code or public repos
  • Skip encryption (at rest and in transit)
  • Create single points of failure
  • Ignore cost optimization opportunities
  • Deploy without proper monitoring
  • Use overly complex architectures
  • Ignore compliance requirements
  • Skip disaster recovery testing
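
The first rule above can be partly checked automatically. A rough sketch that looks only for strings shaped like AWS access key IDs (`AKIA` or `ASIA` followed by 16 uppercase letters or digits); it assumes a `grep` that supports `-r`, and the function name is hypothetical. Dedicated secret scanners cover far more credential types:

```shell
# Hypothetical helper: list files under a directory that contain a string
# shaped like an AWS access key ID. Succeeds when nothing is found and
# fails when at least one file matches.
scan_for_aws_key_ids() {
  dir="$1"
  matches=$(grep -rlE '(AKIA|ASIA)[0-9A-Z]{16}' "$dir" 2>/dev/null)
  if [ -n "$matches" ]; then
    echo "$matches"
    return 1
  fi
  return 0
}

# Example wiring in a pre-commit hook or CI step (not executed here):
#   scan_for_aws_key_ids . || { echo "possible credentials found" >&2; exit 1; }
```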

Common Patterns with Examples

Least-Privilege IAM (Zero-Trust)

Rather than broad policies, scope permissions to specific resources and actions:

bash
# AWS: create a scoped role for an application
aws iam create-role \
  --role-name AppRole \
  --assume-role-policy-document file://trust-policy.json

aws iam put-role-policy \
  --role-name AppRole \
  --policy-name AppInlinePolicy \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::my-app-bucket/*"
    }]
  }'

hcl
# Terraform equivalent. Assumes data.aws_iam_policy_document.trust and
# aws_s3_bucket.app are defined elsewhere in the configuration.
resource "aws_iam_role" "app_role" {
  name               = "AppRole"
  assume_role_policy = data.aws_iam_policy_document.trust.json
}

resource "aws_iam_role_policy" "app_policy" {
  name = "AppInlinePolicy" # same inline policy name as the CLI example
  role = aws_iam_role.app_role.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["s3:GetObject", "s3:PutObject"]
      Resource = "${aws_s3_bucket.app.arn}/*"
    }]
  })
}
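
The CLI example references `trust-policy.json` without showing it. A minimal sketch of one possible trust policy, assuming the role is meant to be assumed by EC2 instances (swap the service principal for Lambda, ECS tasks, and so on). The function name `write_trust_policy` is hypothetical:

```shell
# Write a trust policy that lets the EC2 service assume the role.
# The default file name matches the CLI example; EC2 as the trusted
# service is an assumption made for illustration.
write_trust_policy() {
  out="${1:-trust-policy.json}"
  cat > "$out" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "Service": "ec2.amazonaws.com" },
    "Action": "sts:AssumeRole"
  }]
}
EOF
}

# Example (not executed here): write_trust_policy trust-policy.json
```

The trust policy controls who may assume the role; the inline policy controls what the role may do once assumed. Both should be scoped as narrowly as the workload allows.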

VPC with Public/Private Subnets (Terraform)

hcl
# Assumes var.cost_center is declared elsewhere in the configuration.
data "aws_availability_zones" "available" {
  state = "available"
}

resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  tags = { Name = "main", CostCenter = var.cost_center }
}

resource "aws_subnet" "private" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  # Yields 10.0.0.0/24 and 10.0.1.0/24
  cidr_block        = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index)
  availability_zone = data.aws_availability_zones.available.names[count.index]
}

resource "aws_subnet" "public" {
  count                   = 2
  vpc_id                  = aws_vpc.main.id
  # Yields 10.0.10.0/24 and 10.0.11.0/24
  cidr_block              = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index + 10)
  availability_zone       = data.aws_availability_zones.available.names[count.index]
  map_public_ip_on_launch = true
}
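
Terraform's `cidrsubnet(prefix, newbits, netnum)` extends the prefix length by `newbits` and selects subnet number `netnum`. The sketch below reproduces that arithmetic for IPv4 so the subnet layout can be checked by hand. It assumes the input prefix has no host bits set and does no validation; the shell function is an illustration, not Terraform's implementation:

```shell
# Illustrative IPv4-only re-implementation of the cidrsubnet arithmetic.
# Usage: cidrsubnet PREFIX NEWBITS NETNUM
cidrsubnet() {
  prefix="$1"
  newbits="$2"
  netnum="$3"
  ip="${prefix%/*}"
  len="${prefix#*/}"
  # Split the dotted quad into its four octets.
  IFS=. read -r o1 o2 o3 o4 <<EOF
$ip
EOF
  base=$(( (o1 << 24) | (o2 << 16) | (o3 << 8) | o4 ))
  newlen=$(( len + newbits ))
  # Place the subnet number in the bits just below the new prefix length.
  net=$(( base + (netnum << (32 - newlen)) ))
  printf '%d.%d.%d.%d/%d\n' \
    $(( (net >> 24) & 255 )) $(( (net >> 16) & 255 )) \
    $(( (net >> 8) & 255 )) $(( net & 255 )) "$newlen"
}
```

With the VPC block 10.0.0.0/16 and 8 new bits, subnet numbers 0 and 1 give 10.0.0.0/24 and 10.0.1.0/24 (private), and 10 and 11 give 10.0.10.0/24 and 10.0.11.0/24 (public), so the two tiers do not overlap.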

Auto-Scaling Group (Terraform)

hcl
resource "aws_autoscaling_group" "app" {
  desired_capacity    = 2
  min_size            = 1
  max_size            = 10
  vpc_zone_identifier = aws_subnet.private[*].id

  launch_template {
    id      = aws_launch_template.app.id
    version = "$Latest"
  }

  tag {
    key                 = "CostCenter"
    value               = var.cost_center
    propagate_at_launch = true
  }
}

resource "aws_autoscaling_policy" "cpu_target" {
  autoscaling_group_name = aws_autoscaling_group.app.name
  policy_type            = "TargetTrackingScaling"
  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ASGAverageCPUUtilization"
    }
    target_value = 60.0
  }
}
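
When choosing `min_size`, `max_size`, and the target value, it helps to estimate how many instances a given load implies. The sketch below uses a simple proportional model clamped to the group bounds. It is a planning aid under that assumption, not the algorithm AWS uses internally, and the function name is hypothetical:

```shell
# Rough proportional estimate of the capacity needed to bring an average
# metric back to its target, clamped to the group's min and max size.
# All arguments are whole numbers; the metric and target are percentages.
# Usage: estimate_capacity CURRENT METRIC TARGET MIN MAX
estimate_capacity() {
  current="$1"
  metric="$2"
  target="$3"
  min="$4"
  max="$5"
  # Integer ceiling of current * metric / target.
  needed=$(( (current * metric + target - 1) / target ))
  if [ "$needed" -lt "$min" ]; then
    needed="$min"
  fi
  if [ "$needed" -gt "$max" ]; then
    needed="$max"
  fi
  echo "$needed"
}
```

With the values above (target 60, min 1, max 10), two instances at 90% CPU suggest three instances, and eight instances at 95% CPU hit the `max_size` ceiling of ten, a sign the ceiling may be too low for that load.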

Cost Analysis CLI

bash
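# AWS: identify top cost drivers for the last 30 days.
# Requires GNU date (-d); the End date is exclusive in Cost Explorer.
# A 30-day window usually spans two calendar months, so MONTHLY returns
# two periods; the query lists groups from every period, not only the first.
aws ce get-cost-and-usage \
  --time-period Start=$(date -d '30 days ago' +%Y-%m-%d),End=$(date +%Y-%m-%d) \
  --granularity MONTHLY \
  --metrics "UnblendedCost" \
  --group-by Type=DIMENSION,Key=SERVICE \
  --query 'ResultsByTime[].Groups[].{Service:Keys[0],Cost:Metrics.UnblendedCost.Amount}' \
  --output table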

# Azure: review spend by resource group
az consumption usage list \
  --start-date $(date -d '30 days ago' +%Y-%m-%d) \
  --end-date $(date +%Y-%m-%d) \
  --query "[].{ResourceGroup:resourceGroup,Cost:pretaxCost,Currency:currency}" \
  --output table
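
Table output is unsorted and may list a service once per billing period. A sketch that totals and ranks services, assuming the input is tab-separated `service<TAB>cost` rows on stdin (the shape `--output text` produces for a two-column list query). The function name `top_costs` is hypothetical:

```shell
# Hypothetical helper: read "service<TAB>cost" rows on stdin, sum the cost
# per service, and print the N most expensive as "cost<TAB>service".
top_costs() {
  limit="$1"
  awk -F '\t' '
    NF >= 2 { total[$1] += $2 }
    END { for (svc in total) printf "%.2f\t%s\n", total[svc], svc }
  ' | LC_ALL=C sort -rn | head -n "$limit"
}

# Example wiring (not executed here; requires GNU date):
#   aws ce get-cost-and-usage \
#     --time-period Start=$(date -d '30 days ago' +%Y-%m-%d),End=$(date +%Y-%m-%d) \
#     --granularity MONTHLY \
#     --metrics "UnblendedCost" \
#     --group-by Type=DIMENSION,Key=SERVICE \
#     --query 'ResultsByTime[].Groups[].[Keys[0], Metrics.UnblendedCost.Amount]' \
#     --output text \
#     | top_costs 5
```

Splitting on tabs rather than spaces matters because service names such as "Amazon Simple Storage Service" contain spaces.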

Output Templates

When designing cloud architecture, provide:

  1. Architecture diagram with services and data flow
  2. Service selection rationale (compute, storage, database, networking)
  3. Security architecture (IAM, network segmentation, encryption)
  4. Cost estimation and optimization strategy
  5. Deployment approach and rollback plan
