multi-cluster
Multi-cluster Kubernetes management, federation, and hybrid deployments
Install this agent skill into your project:
npx add-skill https://github.com/pluginagentmarketplace/custom-plugin-kubernetes/tree/main/skills/multi-cluster
Multi-Cluster Kubernetes
Executive Summary
Production-grade multi-cluster Kubernetes management covering federation, cross-cluster networking, and disaster recovery patterns. This skill focuses on designing and operating Kubernetes infrastructure distributed across multiple clusters and regions.
Core Competencies
1. Multi-Cluster Architecture
Topology Patterns
Hub-Spoke:
```text
                ┌─────────┐
                │   Hub   │
                │ Cluster │
                └────┬────┘
     ┌───────────────┼───────────────┐
     │               │               │
┌────▼────┐     ┌────▼────┐     ┌────▼────┐
│ Spoke 1 │     │ Spoke 2 │     │ Spoke 3 │
│  (Dev)  │     │ (Stage) │     │ (Prod)  │
└─────────┘     └─────────┘     └─────────┘
```
Mesh:
```text
┌─────────┐          ┌─────────┐
│Cluster 1│◄────────►│Cluster 2│
│  (US)   │          │  (EU)   │
└────┬────┘          └────┬────┘
     │                    │
     └─────────┬──────────┘
          ┌────▼────┐
          │Cluster 3│
          │  (AP)   │
          └─────────┘
```
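The two topologies differ mainly in how many cluster-to-cluster connections they need as the fleet grows. The small bash sketch below counts them for N clusters, with the hub counted as one of the N in hub-spoke. It is illustrative arithmetic only and ignores redundant or backup links.

```bash
# Connections needed for N clusters (illustrative; redundant links not counted).
# Hub-spoke: each spoke connects only to the hub -> N - 1 connections.
hub_spoke_links() {
  local n=$1
  echo $(( n - 1 ))
}

# Full mesh: every pair of clusters connects -> N * (N - 1) / 2 connections.
mesh_links() {
  local n=$1
  echo $(( n * (n - 1) / 2 ))
}

for n in 3 4 10; do
  echo "clusters=$n hub-spoke=$(hub_spoke_links "$n") mesh=$(mesh_links "$n")"
done
```

In the diagrams above, the four-cluster hub-spoke and the three-cluster mesh each need 3 connections. At 10 clusters the counts are 9 and 45, which is why a full mesh gets harder to operate as the fleet grows.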
2. ArgoCD Multi-Cluster
ApplicationSet Generator
```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: api-server
  namespace: argocd
spec:
  generators:
    - clusters:
        selector:
          matchLabels:
            env: production
  template:
    metadata:
      name: 'api-server-{{name}}'
    spec:
      project: default
      source:
        repoURL: https://github.com/org/api-server
        targetRevision: HEAD
        path: k8s/overlays/production
      destination:
        server: '{{server}}'
        namespace: production
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
```
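The cluster generator emits one Application for each cluster registered in ArgoCD whose labels match the selector. It fills `{{name}}` and `{{server}}` from that cluster's entry. The bash sketch below is a simplified model of that substitution, not ArgoCD code. The input line format (`name server labels`), the function name, and the cluster names in the usage example are all hypothetical.

```bash
# Simplified model of the ApplicationSet cluster generator (not ArgoCD code).
# Each input line (hypothetical format): <name> <server-url> <comma-separated key=value labels>
# Clusters labeled env=production yield one Application named api-server-<name>.
render_applications() {
  local name server labels
  while read -r name server labels; do
    [ -z "$name" ] && continue
    # matchLabels requires an exact key=value match among the cluster's labels
    case ",$labels," in
      *,env=production,*) printf 'api-server-%s -> %s\n' "$name" "$server" ;;
    esac
  done
}

# Usage with two hypothetical clusters; only the production one is selected
printf '%s\n' \
  "prod-us https://prod-us.example.com:6443 env=production,region=us" \
  "staging https://staging.example.com:6443 env=staging" |
  render_applications
# prints: api-server-prod-us -> https://prod-us.example.com:6443
```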
Register External Cluster
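```bash
# Add a cluster to ArgoCD. "prod-cluster" is a context name in prod.yaml; the
# env=production label lets the ApplicationSet cluster generator select it.
argocd cluster add prod-cluster --name prod --kubeconfig ~/.kube/prod.yaml --label env=production
# List clusters
argocd cluster list
# Verify connectivity
argocd cluster get prod
```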
3. Cross-Cluster Networking
Cilium Cluster Mesh
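```bash
# Prerequisite: each cluster was installed with a unique cluster.name and cluster.id
# Enable cluster mesh on each cluster
cilium clustermesh enable --context cluster1
cilium clustermesh enable --context cluster2
# Connect the clusters
cilium clustermesh connect --context cluster1 --destination-context cluster2
# Verify; --wait blocks until the mesh reports ready
cilium clustermesh status --context cluster1 --wait
```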
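Cluster Mesh only works if every cluster was installed with a unique `cluster.name` and a unique `cluster.id`. Under the default settings the ID must fall between 1 and 255. A pre-flight check along these lines can catch clashes before running `cilium clustermesh connect`. The function name and the `name:id` argument format are hypothetical.

```bash
# Pre-flight check for Cluster Mesh: cluster names and IDs must be unique, and
# IDs must fall in 1..255 (the default range). Arguments use a hypothetical
# "name:id" format, e.g. cluster1:1 cluster2:2.
validate_mesh_ids() {
  local seen_names=" " seen_ids=" " pair name id
  for pair in "$@"; do
    name=${pair%%:*}
    id=${pair#*:}
    # Numeric, no leading zeros, within 1..255
    if ! [[ $id =~ ^[1-9][0-9]*$ ]] || (( id > 255 )); then
      echo "invalid cluster.id '$id' for $name" >&2
      return 1
    fi
    if [[ $seen_names == *" $name "* ]]; then
      echo "duplicate cluster.name $name" >&2
      return 1
    fi
    if [[ $seen_ids == *" $id "* ]]; then
      echo "duplicate cluster.id $id ($name)" >&2
      return 1
    fi
    seen_names+="$name "
    seen_ids+="$id "
  done
  echo "ok"
}

validate_mesh_ids cluster1:1 cluster2:2
```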
Global Service
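```yaml
# Deploy this Service with the same name and namespace in every meshed cluster;
# Cilium then load-balances across the matching backends of all clusters.
apiVersion: v1
kind: Service
metadata:
  name: api-server
  annotations:
    service.cilium.io/global: "true"
spec:
  type: ClusterIP
  ports:
    - port: 80
      targetPort: 8080
  selector:
    app: api-server
```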
4. Disaster Recovery
Active-Active Configuration
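```yaml
# ExternalDNS DNSEndpoint for DNS-based GSLB (ExternalDNS must run with --source=crd).
# One endpoint with several targets publishes a multi-value (round-robin) A record.
# Health-based failover instead needs one endpoint per cluster, each with its own
# setIdentifier plus a provider-specific routing policy.
apiVersion: externaldns.k8s.io/v1alpha1
kind: DNSEndpoint
metadata:
  name: api-global
spec:
  endpoints:
    - dnsName: api.example.com
      recordType: A
      targets:
        - 52.1.1.1   # US cluster
        - 35.2.2.2   # EU cluster
      recordTTL: 60
---
# Each cluster runs an identical Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  replicas: 3
  # ... same configuration in both clusters
```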
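How long clients keep reaching a failed cluster depends on two things. First, how quickly the DNS provider's health checks mark the cluster down. Second, how long resolvers may cache the old answer, which is bounded by `recordTTL`. The bash sketch below adds those two terms as a rough worst case. The 10-second check interval and 3-failure threshold are hypothetical values, and resolvers that ignore TTLs can take longer.

```bash
# Rough worst-case failover time in seconds:
#   detection   = health-check interval * consecutive failures before "unhealthy"
#   propagation = DNS record TTL (cached answers may still point at the failed cluster)
# Clients or resolvers that cache past the TTL are not modeled.
failover_seconds() {
  local interval=$1 threshold=$2 ttl=$3
  echo $(( interval * threshold + ttl ))
}

# Hypothetical health-check settings combined with the 60 s TTL from the DNSEndpoint
echo "worst-case failover: $(failover_seconds 10 3 60)s"
```

With these values the estimate is 90 seconds, which is inside the under-2-minute failover target listed under Success Criteria.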
Velero Cross-Cluster Backup
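```bash
# Install Velero in both clusters, pointing at the same bucket.
# The AWS plugin image tag and the credentials file path are placeholders.
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:<version> \
  --bucket velero-backups \
  --backup-location-config region=us-east-1 \
  --snapshot-location-config region=us-east-1 \
  --secret-file ./credentials-velero
# Create backup (EBS volume snapshots stay in the region where they were taken)
velero backup create prod-backup \
  --include-namespaces production \
  --snapshot-volumes
# Restore in DR cluster once the backup appears in its backup list
velero backup get
velero restore create --from-backup prod-backup
```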
5. Fleet Management
Rancher Fleet
```yaml
apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: api-server
  namespace: fleet-default
spec:
  repo: https://github.com/org/api-server
  branch: main
  paths:
    - k8s/
  targets:
    - name: production
      clusterSelector:
        matchLabels:
          env: production
    - name: staging
      clusterSelector:
        matchLabels:
          env: staging
```
Integration Patterns
Uses skill: cluster-admin
- Cluster provisioning
- Certificate management
Coordinates with skill: gitops
- Multi-cluster GitOps
- ApplicationSets
Works with skill: storage-networking
- Cross-cluster networking
- Data replication
Troubleshooting Guide
Decision Tree: Multi-Cluster Issues
```text
Multi-Cluster Issue?
│
├── Cluster unreachable
│   ├── Check network connectivity
│   ├── Verify kubeconfig
│   └── Check cluster health
│
├── Sync failures
│   ├── Check ArgoCD logs
│   ├── Verify RBAC permissions
│   └── Check resource conflicts
│
└── Service discovery fails
    ├── Check mesh connectivity
    ├── Verify DNS configuration
    └── Check NetworkPolicies
```
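For the "Cluster unreachable" branch, one quick way to separate network or kubeconfig problems from an unhealthy control plane is to query each context's API server `/readyz` endpoint. The sketch below is a minimal bash version. It assumes the contexts you pass exist in your current kubeconfig, and the function name is hypothetical.

```bash
# Probe each kubeconfig context's API server readiness endpoint.
# Returns non-zero if any context is unreachable or not ready.
check_clusters() {
  local ctx failed=0
  for ctx in "$@"; do
    # /readyz succeeds when the API server is ready; --request-timeout bounds the wait
    if kubectl --context "$ctx" --request-timeout=5s get --raw /readyz >/dev/null 2>&1; then
      echo "$ctx: reachable"
    else
      echo "$ctx: UNREACHABLE (check network, kubeconfig, cluster health)"
      failed=1
    fi
  done
  return "$failed"
}

# Usage: check_clusters $(kubectl config get-contexts -o name)
```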
Debug Commands
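```bash
# ArgoCD cluster status
argocd cluster list
argocd app list --cluster <server>
# Cilium mesh status
cilium clustermesh status
cilium connectivity test --context cluster1 --multi-cluster cluster2
# Cross-cluster DNS (clusterset.local names require a Multi-Cluster Services API implementation)
kubectl run debug --rm -it --restart=Never --image=nicolaka/netshoot -- \
  nslookup <service>.default.svc.clusterset.local
```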
Common Challenges & Solutions
| Challenge | Solution |
|---|---|
| Network latency | Route users to the nearest regional cluster; keep latency-sensitive calls within a region |
| State sync | Design for eventual consistency |
| Failover delay | Health checks plus low DNS TTLs |
| Config drift | GitOps, policy enforcement |
Success Criteria
| Metric | Target |
|---|---|
| Cross-cluster latency | <50ms (regional) |
| Failover time | <2 minutes |
| Config consistency | 100% |
| Cluster availability | 99.99% |
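The availability target is easier to reason about as a downtime budget. Multi-cluster redundancy is what lets individually less-available clusters meet it. The bash/awk sketch below converts a percentage into minutes of downtime per year. It also computes the availability of N active-active clusters, under the strong assumption that they fail independently. Shared DNS, Git, or cloud-region outages break that assumption.

```bash
# Downtime budget implied by an availability percentage, in minutes per year.
downtime_minutes_per_year() {
  awk -v a="$1" 'BEGIN { printf "%.1f\n", (1 - a / 100) * 365 * 24 * 60 }'
}

# Availability (%) of N active-active clusters when any one can serve traffic,
# assuming independent failures.
combined_availability() {
  awk -v a="$1" -v n="$2" 'BEGIN { printf "%.4f\n", (1 - (1 - a / 100) ^ n) * 100 }'
}

downtime_minutes_per_year 99.99   # prints 52.6
combined_availability 99.9 2      # prints 99.9999
```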