Agent skill
cost-optimization
Kubernetes cost management, resource optimization, and FinOps practices
Install this agent skill to your Project
npx add-skill https://github.com/pluginagentmarketplace/custom-plugin-kubernetes/tree/main/skills/cost-optimization
SKILL.md
Kubernetes Cost Optimization
Executive Summary
Production-grade Kubernetes cost management covering resource optimization, autoscaling, and FinOps practices. This skill targets a 30-50% cost reduction while maintaining performance and reliability; actual savings depend on how over-provisioned the cluster is to begin with.
Core Competencies
1. Resource Right-Sizing
Vertical Pod Autoscaler
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Auto"  # or "Off" for recommendations only
  resourcePolicy:
    containerPolicies:
      - containerName: api-server
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 4
          memory: 8Gi
        controlledResources: ["cpu", "memory"]
Resource Recommendations Analysis
# Get VPA recommendations
kubectl describe vpa api-server-vpa
# Check current vs recommended
kubectl get vpa api-server-vpa -o jsonpath='{.status.recommendation}'
# Goldilocks for all deployments (installed from the Fairwinds Helm chart)
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm install goldilocks fairwinds-stable/goldilocks --namespace goldilocks --create-namespace
kubectl label namespace production goldilocks.fairwinds.com/enabled=true
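Comparing a current request with a VPA target is easier once both quantities are in the same unit. The following is a minimal sketch, assuming CPU quantities appear only as whole or decimal cores or with the `m` suffix, and that the target is non-zero; the function names `cpu_to_millicores` and `cpu_overprovision_pct` are hypothetical helpers, not part of kubectl or VPA.

```shell
# Convert a Kubernetes CPU quantity ("250m", "2", "0.5") to integer millicores.
# Assumption: only plain cores and the "m" suffix are handled.
cpu_to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  awk -v cores="$1" 'BEGIN { printf "%d\n", cores * 1000 + 0.5 }' ;;
  esac
}

# Print how far the current request sits above (positive) or below (negative)
# the recommended target, as an integer percentage of the target.
cpu_overprovision_pct() (
  current=$(cpu_to_millicores "$1")
  target=$(cpu_to_millicores "$2")
  echo $(( (current - target) * 100 / target ))
)

# Example usage against a live cluster (not executed here):
#   target=$(kubectl get vpa api-server-vpa \
#     -o jsonpath='{.status.recommendation.containerRecommendations[0].target.cpu}')
#   cpu_overprovision_pct 1 "$target"
```

A large positive result suggests the request can be lowered toward the VPA target; a negative result means the workload is requesting less than recommended.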
2. Cost Visibility
Kubecost Installation
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost --create-namespace \
  --set kubecostToken="YOUR_TOKEN" \
  --set prometheus.nodeExporter.enabled=false \
  --set prometheus.serviceAccounts.nodeExporter.create=false
Cost Allocation Labels
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
  labels:
    # Cost allocation labels
    team: backend
    environment: production
    product: ecommerce
    cost-center: engineering
spec:
  template:
    metadata:
      labels:
        team: backend
        cost-center: engineering
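Cost attribution only works if every workload carries the labels. As a rough sketch, the helper below scans the output of `kubectl get deployments -A --no-headers --show-labels` and reports Deployments that lack any of the four label keys from the example above. The function name `missing_cost_labels` and the `REQUIRED_LABELS` variable are hypothetical, and it checks Deployment-level labels only, not pod template labels.

```shell
# Required cost-allocation label keys, taken from the Deployment example.
REQUIRED_LABELS="team environment product cost-center"

# Reads `kubectl get deployments -A --no-headers --show-labels` output on stdin
# and prints "namespace/name: missing <keys>" for each non-compliant Deployment.
missing_cost_labels() {
  awk -v required="$REQUIRED_LABELS" '
    BEGIN { n = split(required, keys, " ") }
    {
      # The label column is the last field: comma-separated key=value pairs.
      split("", present)
      m = split($NF, pairs, ",")
      for (i = 1; i <= m; i++) {
        split(pairs[i], kv, "=")
        present[kv[1]] = 1
      }
      missing = ""
      for (i = 1; i <= n; i++)
        if (!(keys[i] in present)) missing = missing " " keys[i]
      if (missing != "") printf "%s/%s: missing%s\n", $1, $2, missing
    }'
}

# Example usage against a live cluster (not executed here):
#   kubectl get deployments -A --no-headers --show-labels | missing_cost_labels
```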
3. Intelligent Autoscaling
HPA with Cost Awareness
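# Note: do not combine this with a VPA in "Auto" mode that controls CPU for the
# same Deployment; HPA and VPA acting on the same metric work against each other.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 20
  behavior:
    scaleDown:
      # Wait 5 minutes before scaling down, then remove at most 10% of pods per minute
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70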
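To see how the 70% target translates into replica counts, the sketch below applies the scaling formula from the Kubernetes HPA documentation, desired = ceil(currentReplicas × currentUtilization / targetUtilization), clamped to the configured bounds. It deliberately ignores the controller's tolerance band and the `behavior` policies above, and the function name `hpa_desired_replicas` is a hypothetical helper.

```shell
# Estimate the replica count HPA would aim for.
# Args: current_replicas current_utilization_pct target_utilization_pct min max
hpa_desired_replicas() (
  current=$1; utilization=$2; target=$3; min=$4; max=$5
  # Integer ceiling division: (a + b - 1) / b
  desired=$(( (current * utilization + target - 1) / target ))
  [ "$desired" -lt "$min" ] && desired=$min
  [ "$desired" -gt "$max" ] && desired=$max
  echo "$desired"
)

# Example: 4 replicas at 90% CPU with the 70% target and 2..20 bounds above.
#   hpa_desired_replicas 4 90 70 2 20
```

Raising the utilization target packs more load onto each pod and lowers the replica count, at the price of less headroom for spikes.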
KEDA for Event-Driven Scaling
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: api-server
spec:
  scaleTargetRef:
    name: api-server
  minReplicaCount: 0  # Scale to zero!
  maxReplicaCount: 50
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus:9090
        metricName: http_requests_total
        query: sum(rate(http_requests_total{app="api-server"}[1m]))
        threshold: "100"
    - type: cron
      metadata:
        timezone: America/New_York
        start: 0 8 * * 1-5
        end: 0 20 * * 1-5
        desiredReplicas: "5"
4. Spot/Preemptible Nodes
Mixed Node Pool Strategy
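# Spot-tolerant workloads
# Note: kubernetes.io/capacity-type is used here as a placeholder key. Replace it
# with the label and taint your node pools actually carry (for example
# karpenter.sh/capacity-type on Karpenter or eks.amazonaws.com/capacityType on
# EKS managed node groups).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-processor
spec:
  template:
    spec:
      # Hard requirement: schedule only on spot nodes
      nodeSelector:
        kubernetes.io/capacity-type: spot
      tolerations:
        - key: kubernetes.io/capacity-type
          operator: Equal
          value: spot
          effect: NoSchedule
      affinity:
        nodeAffinity:
          # Soft preference; redundant while the nodeSelector above is set.
          # Drop the nodeSelector if pods may fall back to on-demand nodes.
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: kubernetes.io/capacity-type
                    operator: In
                    values:
                      - spot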
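The Common Challenges table in this skill pairs spot interruptions with PodDisruptionBudgets. A minimal sketch of one for `batch-processor` follows, wrapped in a shell function so it can be piped to `kubectl apply`. The `app: batch-processor` pod label and the `maxUnavailable: 1` value are assumptions, since the Deployment fragment above does not show its pod labels; the function name `render_spot_pdb` is hypothetical.

```shell
# Print a PodDisruptionBudget that limits voluntary evictions (such as node
# drains ahead of a spot reclaim) to one batch-processor pod at a time.
render_spot_pdb() {
  cat <<'EOF'
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: batch-processor
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: batch-processor  # assumed pod label; match your Deployment's template
EOF
}

# Example usage against a live cluster (not executed here):
#   render_spot_pdb | kubectl apply -f -
```

A PDB only constrains evictions that go through the eviction API, so it helps when a termination handler or autoscaler drains the node before the instance is reclaimed.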
Cluster Autoscaler with Mixed Pools
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
spec:
  template:
    spec:
      containers:
        - name: cluster-autoscaler
          command:
            - ./cluster-autoscaler
            - --expander=priority
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled
            - --balance-similar-node-groups=true
            - --skip-nodes-with-local-storage=false
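The `--expander=priority` flag reads its ordering from a ConfigMap named `cluster-autoscaler-priority-expander` in the namespace where Cluster Autoscaler runs; higher numbers win. The sketch below renders one that prefers spot node groups over on-demand ones. The `kube-system` namespace and the name patterns `.*spot.*` and `.*on-demand.*` are assumptions about how the node groups are named, and `render_priority_expander` is a hypothetical helper.

```shell
# Print the ConfigMap consumed by the priority expander.
# Each key is a priority; each value is a list of node group name regexes.
render_priority_expander() {
  cat <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system  # assumed; must match the autoscaler's namespace
data:
  priorities: |-
    50:
      - .*spot.*
    10:
      - .*on-demand.*
EOF
}

# Example usage against a live cluster (not executed here):
#   render_priority_expander | kubectl apply -f -
```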
5. Waste Elimination
Idle Resource Detection
# List requests per deployment to spot oversized ones (first container only)
kubectl get deployments -A -o json | jq '
  .items[] |
  select(.spec.replicas > 0) |
  {
    namespace: .metadata.namespace,
    name: .metadata.name,
    replicas: .spec.replicas,
    cpu_request: .spec.template.spec.containers[0].resources.requests.cpu,
    memory_request: .spec.template.spec.containers[0].resources.requests.memory
  }
'
# Find PVCs that no pod currently mounts
kubectl get pvc -A --no-headers | while read -r ns name _; do
  used=$(kubectl get pods -n "$ns" -o json | jq -r --arg pvc "$name" '.items[] | select(.spec.volumes[]?.persistentVolumeClaim.claimName == $pvc) | .metadata.name')
  [ -z "$used" ] && echo "Unused PVC: $ns/$name"
done
Resource Cleanup Policy
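# A mutate rule cannot delete resources: deletionTimestamp is set by the API
# server, and mutation only runs on admission requests. Kyverno's cleanup
# policies handle scheduled deletion instead. The kyverno.io/v2 API version
# assumes a recent Kyverno release; older releases serve v2beta1 or v2alpha1.
# For Jobs specifically, spec.ttlSecondsAfterFinished is a built-in alternative.
apiVersion: kyverno.io/v2
kind: ClusterCleanupPolicy
metadata:
  name: cleanup-completed-jobs
spec:
  match:
    any:
      - resources:
          kinds:
            - Job
  conditions:
    all:
      - key: "{{ target.status.succeeded }}"
        operator: Equals
        value: 1
      - key: "{{ time_since('', '{{ target.status.completionTime }}', '') }}"
        operator: GreaterThan
        value: "24h"
  schedule: "0 * * * *"  # evaluate hourly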
Integration Patterns
Uses skill: cluster-admin
- Node pool management
- Cluster autoscaling
Coordinates with skill: monitoring
- Resource metrics
- Cost dashboards
Works with skill: deployments
- HPA configuration
- Resource requests
Troubleshooting Guide
Decision Tree: Cost Issues
High Costs?
│
├── Over-provisioned
│ ├── Check VPA recommendations
│ ├── Right-size requests
│ └── Enable HPA
│
├── Idle resources
│ ├── Find unused PVCs
│ ├── Check scale-to-zero
│ └── Clean up stale jobs
│
└── Wrong instance types
    ├── Use spot for batch
    ├── Review node pools
    └── Check reserved coverage
Debug Commands
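# Cost analysis
kubectl top pods -A --sort-by=cpu
kubectl top pods -A --sort-by=memory
# Resource efficiency: requests per container, to compare against `kubectl top` usage
kubectl get pods -A -o json | jq -r '.items[] | .metadata.namespace as $ns | .metadata.name as $pod | .spec.containers[] | [$ns, $pod, .name, (.resources.requests.cpu // "-"), (.resources.requests.memory // "-")] | @tsv'
# Kubecost API (quote the URL so the shell does not treat & as a background operator)
curl "http://kubecost:9090/model/allocation?window=7d&aggregate=namespace"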
Common Challenges & Solutions
| Challenge | Solution |
|---|---|
| Overprovisioning | VPA, right-sizing |
| Idle resources | Scale-to-zero, cleanup |
| Spot interruptions | PodDisruptionBudgets, topology spreading |
| Cost attribution | Labels, Kubecost |
Success Criteria
| Metric | Target |
|---|---|
| Cost reduction | 30-50% |
| Resource utilization | >60% |
| Waste identification | <10% idle |
| Budget compliance | 100% |
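The utilization target can be checked mechanically once usage and requests are in the same unit. The sketch below assumes a hypothetical input format of `name used_millicores requested_millicores` per line, which you would assemble yourself from `kubectl top` and the pod specs; `below_utilization_target` is a hypothetical helper name.

```shell
# Reads "name used_millicores requested_millicores" lines on stdin and prints
# workloads whose CPU utilization falls below the threshold (default 60%).
# Lines with a zero or missing request are skipped.
below_utilization_target() {
  awk -v threshold="${1:-60}" '
    $3 > 0 {
      pct = $2 * 100 / $3
      if (pct < threshold) printf "%s %.0f%%\n", $1, pct
    }'
}

# Example usage with hand-written input:
#   printf 'api 300 1000\nworker 700 1000\n' | below_utilization_target
```

Workloads that show up in the output are candidates for the right-sizing steps in section 1.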