Agent skill
k8s-troubleshoot
Debug Kubernetes pods, services, and cluster issues. Use when the user says "pod not starting", "CrashLoopBackOff", "service not reachable", "kubectl debug", "pod stuck pending", or asks about Kubernetes problems.
Install this agent skill to your Project
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/devops/k8s-troubleshoot-mhalder-dotfiles-e451b251
SKILL.md
Kubernetes Troubleshoot
Debug pods, services, deployments, and networking issues in Kubernetes.
Instructions
- Identify the affected resource (pod, service, deployment)
- Get current state with kubectl get and kubectl describe
- Check logs if applicable
- Diagnose based on status/events
- Provide specific remediation steps
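The first steps above can be sketched as a small shell helper. This is only an illustration: the function name triage_pod and its argument order are hypothetical, and it assumes kubectl is already configured for the target cluster.

```shell
# Hypothetical helper (not part of the skill): collect state, events, and logs for one pod.
# Usage: triage_pod <pod> [namespace]
triage_pod() {
  local pod="$1" ns="${2:-default}"

  # Current state, including node and pod IP
  kubectl get pod "$pod" -n "$ns" -o wide

  # Status details and recent events for the pod
  kubectl describe pod "$pod" -n "$ns"

  # Logs from the current container instance
  kubectl logs "$pod" -n "$ns" --tail=50

  # Logs from the previous instance, if the container has restarted;
  # this call fails when there is no previous instance, so the error is ignored
  kubectl logs "$pod" -n "$ns" --previous --tail=50 2>/dev/null || true
}
```

For example, `triage_pod web-7d9f prod` (a made-up pod name) prints everything needed for the diagnosis step in one pass. Remediation is left to the operator.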
Diagnostic commands
# Pod debugging
kubectl get pods -o wide
kubectl describe pod <pod>
kubectl logs <pod> [--previous] [-c container]
kubectl get events --sort-by=.lastTimestamp
# Service/networking
kubectl get svc,endpoints
kubectl describe svc <service>
kubectl get ingress
# Resource issues
kubectl top pods
kubectl describe node <node> | grep -A5 "Allocated resources"
# Debug pod (ephemeral container)
kubectl debug -it <pod> --image=busybox --target=<container>
Common issues
| Status | Cause | Solution |
|---|---|---|
| Pending | Insufficient node resources | Check node capacity, resource requests |
| Pending | No matching node | Check nodeSelector, taints/tolerations |
| ImagePullBackOff | Bad image/auth | Verify image name, imagePullSecrets |
| CrashLoopBackOff | App crashing | Check logs, entrypoint, health probes |
| CreateContainerConfigError | Bad configmap/secret | Verify referenced configs exist |
| Evicted | Node pressure | Check node conditions, resource limits |
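The table can also be expressed as a lookup, which may be handy in a script that reacts to the STATUS column of kubectl get pods. The function name suggest_checks is hypothetical, and the two Pending rows are merged into one entry because the status alone does not tell them apart.

```shell
# Hypothetical helper (not part of the skill): map a pod status to the checks in the table.
# Usage: suggest_checks <status>
suggest_checks() {
  case "$1" in
    Pending)                    echo "Check node capacity, resource requests, nodeSelector, taints/tolerations" ;;
    ImagePullBackOff)           echo "Verify image name, imagePullSecrets" ;;
    CrashLoopBackOff)           echo "Check logs, entrypoint, health probes" ;;
    CreateContainerConfigError) echo "Verify referenced configs exist" ;;
    Evicted)                    echo "Check node conditions, resource limits" ;;
    *)                          echo "No table entry; check events with kubectl describe" ;;
  esac
}
```

Calling `suggest_checks CrashLoopBackOff` prints the matching solution cell. Any status not listed in the table falls through to the generic advice.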
Service not reachable checklist
- Pod running? kubectl get pods -l app=<app>
- Pod ready? Check readiness probe
- Endpoints exist? kubectl get endpoints <svc>
- Service selector matches pod labels?
- Port/targetPort correct?
- NetworkPolicy blocking traffic?
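A minimal sketch of the checklist as one read-only function follows. The name check_service is hypothetical, and it assumes the pods carry the label app=<app> as in the checklist. It only prints information; interpreting the output is still manual.

```shell
# Hypothetical helper (not part of the skill): print the data needed for each checklist item.
# Usage: check_service <service> <app-label> [namespace]
check_service() {
  local svc="$1" app="$2" ns="${3:-default}"

  echo "== Pods: check the STATUS and READY columns =="
  kubectl get pods -l "app=$app" -n "$ns" -o wide

  echo "== Endpoints: an empty list means no ready pod matches the selector =="
  kubectl get endpoints "$svc" -n "$ns"

  echo "== Service selector vs. pod labels =="
  kubectl get svc "$svc" -n "$ns" -o jsonpath='{.spec.selector}{"\n"}'
  kubectl get pods -l "app=$app" -n "$ns" --show-labels

  echo "== port -> targetPort =="
  kubectl get svc "$svc" -n "$ns" \
    -o jsonpath='{range .spec.ports[*]}{.port}{" -> "}{.targetPort}{"\n"}{end}'

  echo "== NetworkPolicies in the namespace =="
  kubectl get networkpolicy -n "$ns"
}
```

Every command here is a get, so the function is safe to run without approval under the rules of this skill.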
Rules
- MUST check events with kubectl describe before diagnosing
- MUST check logs for CrashLoopBackOff
- Never delete pods/resources without user approval
- Never apply changes without showing the diff first
- Always specify namespace if not default: -n <namespace>
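One way to follow the diff, approval, and namespace rules together is a small wrapper around kubectl diff and kubectl apply. This is a sketch under assumptions: the function name safe_apply and the y/N prompt are hypothetical, and it relies on the documented kubectl diff exit codes (0 for no differences, 1 for differences, greater than 1 for errors).

```shell
# Hypothetical helper (not part of the skill): show the diff, then apply only on explicit approval.
# Usage: safe_apply <manifest> [namespace]
safe_apply() {
  local manifest="$1" ns="${2:-default}" rc=0 answer=""

  # Capture the exit code without aborting under `set -e`
  kubectl diff -f "$manifest" -n "$ns" || rc=$?

  if [ "$rc" -eq 0 ]; then
    echo "No changes to apply."
    return 0
  elif [ "$rc" -gt 1 ]; then
    echo "kubectl diff failed (exit $rc); not applying." >&2
    return "$rc"
  fi

  # Differences were found (exit code 1): ask before changing anything
  printf 'Apply these changes to namespace %s? [y/N] ' "$ns"
  read -r answer
  case "$answer" in
    y|Y) kubectl apply -f "$manifest" -n "$ns" ;;
    *)   echo "Aborted; nothing applied." ;;
  esac
}
```

Anything other than y or Y aborts, so the default is to leave the cluster untouched.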