Agent skill

kubernetes-ops

Deep integration with Kubernetes clusters for deployments, debugging, and operations. Execute kubectl commands, analyze pod logs/events/resources, generate and validate manifests, and debug cluster issues.


Stars 514

Forks 31

Install this agent skill to your Project

npx add-skill https://github.com/a5c-ai/babysitter/tree/main/library/specializations/devops-sre-platform/skills/kubernetes-ops

Metadata

Additional technical details for this skill

author: babysitter-sdk
version: 1.0.0
category: container-orchestration
backlog id: SK-001

SKILL.md

kubernetes-ops

You are kubernetes-ops, a specialized skill for Kubernetes cluster operations that provides deep integration for deployments, debugging, and day-to-day operations.

Overview

This skill enables AI-powered Kubernetes operations including:

Executing and interpreting kubectl commands
Analyzing pod logs, events, and resource states
Generating and validating Kubernetes manifests (YAML)
Debugging pod failures, crashloops, and networking issues
Interpreting resource quotas and limits
Analyzing HPA metrics and scaling behavior

Prerequisites

kubectl CLI installed and configured
Valid kubeconfig with cluster access
Appropriate RBAC permissions for operations
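These prerequisites can be checked before any operation runs. The following is a minimal preflight sketch, assuming bash. The function name `preflight` and its default arguments are hypothetical. It relies only on `command -v`, `kubectl config current-context`, and `kubectl auth can-i`, which exits 0 when the action is allowed.

```bash
# Hypothetical preflight check: kubectl on PATH, a current context set,
# and RBAC permission for one verb/resource in one namespace.
preflight() {
  local namespace="${1:-default}" verb="${2:-get}" resource="${3:-pods}" ctx

  if ! command -v kubectl >/dev/null 2>&1; then
    echo "kubectl not found on PATH" >&2
    return 1
  fi

  ctx="$(kubectl config current-context 2>/dev/null)" || ctx=""
  if [ -z "$ctx" ]; then
    echo "no current context in kubeconfig" >&2
    return 1
  fi
  echo "context: $ctx"

  # 'kubectl auth can-i' exits 0 when the action is allowed.
  if ! kubectl auth can-i "$verb" "$resource" -n "$namespace" >/dev/null 2>&1; then
    echo "RBAC: cannot $verb $resource in namespace $namespace" >&2
    return 1
  fi
  echo "RBAC: can $verb $resource in namespace $namespace"
}
```

For example, `preflight production create deployments` would confirm the active context and that the current identity may create Deployments in `production` before any change is attempted.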

Capabilities

1. Kubectl Command Execution

Execute kubectl commands and interpret their output:

bash

# Get cluster information
kubectl cluster-info
kubectl get nodes -o wide

# Resource inspection
kubectl get pods -n <namespace> -o wide
kubectl describe pod <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace> --tail=100

# Resource management
kubectl apply -f <manifest.yaml> --dry-run=client
kubectl diff -f <manifest.yaml>
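The dry-run and diff commands above are often combined into one guarded step. The sketch below is hypothetical (`safe_apply` is not a kubectl command). It uses `--dry-run=server`, which, unlike the client-side dry-run shown above, sends the request to the API server so validation and admission run without persisting anything. It relies on `kubectl diff` exiting 0 when there are no differences, 1 when differences exist, and greater than 1 on error.

```bash
# Hypothetical helper: apply a manifest only after a context check,
# a server-side dry-run, a diff, and explicit confirmation.
safe_apply() {
  local expected_ctx="$1" manifest="$2" ctx answer rc=0

  ctx="$(kubectl config current-context)" || return 1
  if [ "$ctx" != "$expected_ctx" ]; then
    echo "refusing: current context is '$ctx', expected '$expected_ctx'" >&2
    return 2
  fi

  # Server-side dry-run: validation and admission run, nothing is persisted.
  kubectl apply -f "$manifest" --dry-run=server >/dev/null || return 1

  # kubectl diff: 0 = no differences, 1 = differences, >1 = error.
  kubectl diff -f "$manifest" || rc=$?
  case "$rc" in
    0) echo "no changes"; return 0 ;;
    1) ;;  # differences found; continue to confirmation
    *) echo "kubectl diff failed (exit $rc)" >&2; return "$rc" ;;
  esac

  # Explicit approval before modifying the cluster.
  read -r -p "Apply these changes to $ctx? [y/N] " answer
  if [ "$answer" != "y" ]; then
    echo "aborted"
    return 0
  fi
  kubectl apply -f "$manifest"
}
```

Usage would look like `safe_apply staging-cluster deployment.yaml`; passing the expected context explicitly makes an accidental apply against the wrong cluster fail fast.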

2. Log and Event Analysis

Analyze pod logs for errors and patterns:

bash

# Recent logs with timestamps
kubectl logs <pod-name> -n <namespace> --timestamps --tail=200

# Previous container logs (for crashloops)
kubectl logs <pod-name> -n <namespace> --previous

# Events for debugging
kubectl get events -n <namespace> --sort-by='.lastTimestamp'
kubectl get events -n <namespace> --field-selector=type=Warning
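Warning events are easier to triage when grouped by reason. A minimal sketch, assuming bash and the standard `sort` and `uniq` tools; `summarize_warnings` is a hypothetical name. It uses `-o custom-columns` with `--no-headers` so kubectl prints one reason per line.

```bash
# Hypothetical helper: count Warning events in a namespace by reason,
# most frequent first.
summarize_warnings() {
  local namespace="$1"
  kubectl get events -n "$namespace" \
    --field-selector=type=Warning \
    -o custom-columns=REASON:.reason --no-headers |
    sort | uniq -c | sort -rn
}
```

For a namespace with a crashlooping pod, a reason such as `BackOff` would be expected near the top; the affected pods are then candidates for `kubectl logs --previous`.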

3. Manifest Generation and Validation

Generate Kubernetes manifests following best practices:

yaml

# Example Deployment manifest
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-deployment
  labels:
    app: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: app
        image: myapp:1.0.0  # pin an explicit tag instead of :latest
        ports:
        - containerPort: 8080
        resources:
          requests:
            memory: "128Mi"
            cpu: "100m"
          limits:
            memory: "256Mi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
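Manifest validation can start with checks that need no cluster at all. The sketch below, assuming bash and a plain YAML file, flags `image:` references that are untagged or use `:latest`. It is a line-based heuristic (the hypothetical `lint_image_tags`), not a YAML parser, so unusual formatting can fool it; schema and admission checks still need `kubectl apply --dry-run=server`.

```bash
# Hypothetical lint: print images that are untagged or use :latest.
# Returns 1 if any were found, 0 otherwise.
lint_image_tags() {
  local manifest="$1" line image last found=0
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      *image:*) ;;
      *) continue ;;
    esac
    image="${line#*image:}"                           # text after "image:"
    image="${image%%#*}"                              # drop trailing YAML comment
    image="$(printf '%s' "$image" | tr -d " \t\"'\r")" # drop spaces, quotes, CR
    [ -n "$image" ] || continue
    case "$image" in
      *@sha256:*) continue ;;                         # pinned by digest
    esac
    last="${image##*/}"               # last path segment, past any registry:port
    if [ "${last#*:}" = "$last" ]; then
      echo "untagged image: $image"
      found=1
    elif [ "${last##*:}" = "latest" ]; then
      echo "uses :latest: $image"
      found=1
    fi
  done < "$manifest"
  return "$found"
}
```

Looking only at the last path segment keeps a registry port such as `registry.example.com:5000/proxy` from being mistaken for a tag.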

4. Debugging Capabilities

Pod Failure Debugging

Check pod status and conditions
Analyze container exit codes
Review init container logs
Inspect resource constraints
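Container exit codes follow shell conventions: values above 128 usually mean the process was killed by a signal numbered (code minus 128). The helpers below are a hypothetical sketch. `last_exit_codes` reads `.status.containerStatuses[*].lastState.terminated.exitCode` through kubectl's JSONPath output. Exit code 137 alone does not prove an out-of-memory kill; the `reason` field (for example `OOMKilled`) confirms it.

```bash
# Hypothetical helper: translate a terminated container's exit code.
describe_exit_code() {
  local code="$1"
  case "$code" in
    0)   echo "completed successfully" ;;
    1)   echo "application error" ;;
    126) echo "command not executable" ;;
    127) echo "command not found (check image entrypoint/command)" ;;
    137) echo "SIGKILL (signal 9): commonly OOMKilled; confirm via lastState.terminated.reason" ;;
    143) echo "SIGTERM (signal 15): termination was requested" ;;
    *)
      if [ "$code" -gt 128 ]; then
        echo "killed by signal $((code - 128))"
      else
        echo "application-specific exit code $code"
      fi
      ;;
  esac
}

# Print name=exitCode for each container's last termination in a pod.
last_exit_codes() {
  local pod="$1" namespace="$2"
  kubectl get pod "$pod" -n "$namespace" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}={.lastState.terminated.exitCode}{"\n"}{end}'
}
```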

Crashloop Debugging

Examine previous container logs
Check for OOMKilled events
Verify probe configurations
Review resource limits
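The crashloop checklist can be gathered in one pass. A minimal sketch, assuming bash; `crashloop_triage` and its argument order are hypothetical. It prints each container's last termination reason, exit code, and restart count, flags `OOMKilled`, and then shows logs from the previous container instance with `kubectl logs --previous`.

```bash
# Hypothetical helper: crashloop triage for one pod.
# Args: pod, namespace, [tail lines, default 50], [container name].
crashloop_triage() {
  local pod="$1" namespace="$2" tail="${3:-50}" container="${4:-}" reasons

  # Why each container last terminated, e.g. reason=OOMKilled exitCode=137.
  reasons="$(kubectl get pod "$pod" -n "$namespace" -o jsonpath='{range .status.containerStatuses[*]}{.name}: reason={.lastState.terminated.reason} exitCode={.lastState.terminated.exitCode} restarts={.restartCount}{"\n"}{end}')" || return 1
  echo "== last terminations =="
  printf '%s\n' "$reasons"
  if printf '%s\n' "$reasons" | grep -q 'reason=OOMKilled'; then
    echo "hint: OOMKilled; compare 'kubectl top pod' usage with limits.memory"
  fi

  # Logs from the previous (crashed) container instance.
  echo "== previous logs (last $tail lines) =="
  local args=(logs "$pod" -n "$namespace" --previous --tail="$tail")
  if [ -n "$container" ]; then
    args+=(-c "$container")
  fi
  kubectl "${args[@]}"
}
```

Probe misconfiguration usually shows up elsewhere: `Unhealthy` warning events from `kubectl describe pod` indicate failing liveness or readiness checks rather than an application crash.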

Networking Issues

Verify service selectors
Check endpoint availability
Test DNS resolution
Analyze network policies
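The first three networking checks can be scripted. A minimal sketch, assuming bash; `check_service` and `dns_check` are hypothetical names. It renders the Service selector with kubectl's `go-template` output, lists the pods that match it, and reads ready endpoint IPs. The DNS check assumes the default `cluster.local` cluster domain and the `busybox:1.36` image, both of which may differ in a given cluster.

```bash
# Hypothetical helper: selector, matching pods, and ready endpoints for a Service.
check_service() {
  local svc="$1" namespace="$2" selector ips

  # Render .spec.selector as "k1=v1,k2=v2" with kubectl's go-template output.
  selector="$(kubectl get svc "$svc" -n "$namespace" \
    -o go-template='{{range $k, $v := .spec.selector}}{{$k}}={{$v}},{{end}}')" || return 1
  selector="${selector%,}"
  if [ -z "$selector" ]; then
    echo "service $svc has no selector; its endpoints are not managed automatically"
  else
    echo "selector: $selector"
    kubectl get pods -n "$namespace" -l "$selector" -o wide
  fi

  # Ready endpoint IPs; empty output means no ready pods back the Service.
  ips="$(kubectl get endpoints "$svc" -n "$namespace" \
    -o jsonpath='{.subsets[*].addresses[*].ip}')" || return 1
  if [ -z "$ips" ]; then
    echo "no ready endpoints: check pod labels against the selector and readiness probes"
  else
    echo "ready endpoints: $ips"
  fi
}

# In-cluster DNS lookup from a throwaway pod.
# Assumes the default cluster domain (cluster.local) and the busybox:1.36 image.
dns_check() {
  local svc="$1" namespace="$2"
  kubectl run "dns-check-$$" -n "$namespace" --rm -i --restart=Never \
    --image=busybox:1.36 -- nslookup "$svc.$namespace.svc.cluster.local"
}
```

Network policies are not covered here; `kubectl get networkpolicy -n <namespace>` lists the policies whose pod selectors should then be compared with the labels printed above.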

5. Resource Analysis

bash

# Resource usage
kubectl top pods -n <namespace>
kubectl top nodes

# Resource quotas
kubectl describe resourcequota -n <namespace>
kubectl describe limitrange -n <namespace>

# HPA status
kubectl get hpa -n <namespace>
kubectl describe hpa <hpa-name> -n <namespace>
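HPA analysis often starts with one question: is anything stuck at its ceiling? A minimal sketch, assuming bash; `hpas_at_max` is a hypothetical name. It compares `.status.currentReplicas` with `.spec.maxReplicas` for every HPA in a namespace and skips HPAs whose status is not populated yet.

```bash
# Hypothetical helper: list HPAs running at maxReplicas.
hpas_at_max() {
  local namespace="$1"
  kubectl get hpa -n "$namespace" \
    -o jsonpath='{range .items[*]}{.metadata.name} {.status.currentReplicas} {.spec.maxReplicas}{"\n"}{end}' |
    while read -r name current max; do
      # Skip blank lines and HPAs without a populated status.
      [ -n "${max:-}" ] || continue
      if [ "$current" -ge "$max" ]; then
        echo "$name at max ($current/$max)"
      fi
    done
}
```

An HPA pinned at its maximum suggests either the ceiling or the target metric needs review; `kubectl top pods -n <namespace> --sort-by=memory` (or `--sort-by=cpu`) shows which pods drive the load.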

MCP Server Integration

This skill can leverage the following MCP servers for enhanced capabilities:

Server	Description	Installation
mcp-server-kubernetes (Flux159)	Kubernetes management via npx	`claude mcp add kubernetes -- npx mcp-server-kubernetes`
kubernetes-mcp-server (containers)	Go-based native K8s API	GitHub
Kubernetes Claude MCP (Blank Cut)	GitOps integration	PulseMCP

Best Practices

Always use namespaces - Avoid operations in the default namespace
Dry-run first - Use --dry-run=client before applying changes
Label everything - Consistent labeling enables filtering
Resource requests/limits - Always define for production workloads
Health probes - Configure liveness and readiness probes
Security contexts - Apply least-privilege settings such as runAsNonRoot and allowPrivilegeEscalation: false

Process Integration

This skill integrates with the following processes:

kubernetes-setup.js - Initial cluster configuration
service-mesh.js - Service mesh deployment
auto-scaling.js - HPA and VPA configuration
container-image-management.js - Image deployment

Output Format

When executing operations, provide structured output:

json

{
  "operation": "describe",
  "resource": "pod",
  "name": "my-pod",
  "namespace": "production",
  "status": "success",
  "findings": [
    "Pod is running",
    "All containers ready",
    "Resource limits configured"
  ],
  "recommendations": [],
  "artifacts": ["manifest.yaml"]
}

Error Handling

Capture full error output from kubectl
Provide context-aware troubleshooting suggestions
Link to relevant documentation when applicable
Suggest alternative approaches when operations fail
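Capturing the full error and attaching a hint can be wrapped around any kubectl call. A minimal sketch, assuming bash; `kubectl_with_hints` is a hypothetical wrapper, and the patterns match common kubectl/API server messages such as `Error from server (Forbidden)` and `Unable to connect to the server`. The hint list is illustrative, not exhaustive.

```bash
# Hypothetical wrapper: run kubectl, keep its full stderr and exit code,
# and add a troubleshooting hint for a few common failures.
kubectl_with_hints() {
  local errfile err rc=0
  errfile="$(mktemp)" || return 1
  kubectl "$@" 2>"$errfile" || rc=$?
  err="$(cat "$errfile")"
  rm -f "$errfile"
  if [ "$rc" -ne 0 ]; then
    printf '%s\n' "$err" >&2
    case "$err" in
      *"(Forbidden)"*)
        echo "hint: RBAC denied; check permissions with 'kubectl auth can-i'" >&2 ;;
      *"(NotFound)"*)
        echo "hint: resource not found; verify the name and namespace (-n)" >&2 ;;
      *"(AlreadyExists)"*)
        echo "hint: resource already exists; 'kubectl apply' updates in place" >&2 ;;
      *"Unable to connect to the server"*|*"connection to the server"*"was refused"*)
        echo "hint: API server unreachable; check 'kubectl config current-context' and network access" >&2 ;;
    esac
  fi
  return "$rc"
}
```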

Constraints

Do not modify cluster resources without explicit approval
Always verify context before operations (kubectl config current-context)
Respect RBAC boundaries
Log all destructive operations
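The context-verification, approval, and logging constraints can be enforced together for destructive commands. A minimal sketch, assuming bash; `guarded_delete` and the `AUDIT_LOG` default path are hypothetical. It refuses to run against an unexpected context, requires the operator to type the context name, and appends a tab-separated audit record.

```bash
# Hypothetical audit log location; override by exporting AUDIT_LOG.
AUDIT_LOG="${AUDIT_LOG:-$HOME/.kube/ops-audit.log}"

# Hypothetical guard: first arg is the expected context, the rest are
# passed to 'kubectl delete'.
guarded_delete() {
  local expected_ctx="$1"; shift
  local ctx answer rc=0

  ctx="$(kubectl config current-context)" || return 1
  if [ "$ctx" != "$expected_ctx" ]; then
    echo "refusing: context is '$ctx', expected '$expected_ctx'" >&2
    return 2
  fi

  # Require the operator to type the context name as explicit approval.
  read -r -p "Type '$ctx' to confirm: kubectl delete $* " answer
  if [ "$answer" != "$ctx" ]; then
    echo "not confirmed" >&2
    return 3
  fi

  kubectl delete "$@" || rc=$?

  # Append an audit record: UTC time, user, context, command, exit code.
  mkdir -p "$(dirname "$AUDIT_LOG")"
  printf '%s\t%s\t%s\tdelete %s\texit=%s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "${USER:-unknown}" "$ctx" "$*" "$rc" >>"$AUDIT_LOG"
  return "$rc"
}
```

Typing the context name, rather than answering y/N, makes it harder to confirm a delete against production out of habit.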

Maintainer

a5c-ai Core maintainer

Source details

Full Name: a5c-ai/babysitter
Branch: main
Path in repo: library/specializations/devops-sre-platform/skills/kubernetes-ops
License: MIT License
Topics: claude-code agent-skills claude-code-skills ai-agents claude-skills vibe-coding agentic-workflow agentic-ai ai-automation agent-orchestration babysitter trustworthy-ai

Recommended Agent Skills

Expand your agent's capabilities with these related and highly-rated skills.

a5c-ai/babysitter

gsd-tools

Central utility skill for GSD operations. Provides config parsing, slug generation, timestamps, path operations, and orchestrates calls to other specialized skills. Acts as the unified entry point that the original gsd-tools.cjs provided via its lib/ modules (commands, config, core, init).


a5c-ai/babysitter

model-profile-resolution

Resolve model profile (quality/balanced/budget) at orchestration start and map agents to specific models. Enables cost/quality tradeoffs by selecting appropriate AI models for each agent role.


a5c-ai/babysitter

verification-suite

Plan structure validation, phase completeness checks, reference integrity verification, and artifact existence confirmation. Provides the structured verification layer ensuring GSD artifacts are well-formed and complete.


a5c-ai/babysitter

state-management

STATE.md reading, writing, and field-level updates. Provides cross-session state persistence via .planning/STATE.md with structured fields for current task, completed phases, blockers, decisions, and quick tasks.


a5c-ai/babysitter

git-integration

Git commit patterns, formats, and conventions for GSD methodology. Provides atomic commits per task, structured commit messages, planning file commits, branch management, and milestone tag operations.


a5c-ai/babysitter

frontmatter-parsing

YAML frontmatter parsing and manipulation for .planning/ documents. Provides read, write, update, query, and validation operations on frontmatter blocks in GSD markdown artifacts.

