grafana-platform-dashboard
Design, refactor, and validate Grafana dashboards for OpenShift/Kubernetes platform operations. Use when users ask to improve platform health dashboards, prioritize critical tenant-impacting signals, filter noise (for example ArgoCD), add Crossplane/Keycloak health panels, validate PromQL programmatically, or apply GrafanaDashboard CR changes live and then promote them to GitOps.
Install this skill into your project:
npx add-skill https://github.com/boshu2/agentops/tree/main/skills/grafana-platform-dashboard
Metadata
- tier: execution
- dependencies: ["research", "brainstorm"]
SKILL.md
Grafana Platform Dashboard
Design platform operations dashboards so operators see tenant-impacting risk first, then drill into service-specific health without overload.
Quick Start
Use this skill when the user asks for platform dashboard updates and reliability checks.
- Confirm the target dashboard:
oc --context <ctx> get grafanadashboard -A | rg -i '<dashboard-name-or-theme>'
- Export the GrafanaDashboard CR and its dashboard JSON:
skills/grafana-platform-dashboard/scripts/grafanadashboard_roundtrip.sh export \
--context <ctx> \
--namespace <ns> \
--name <grafanadashboard-name> \
--out-dir /tmp/<workspace>
- Edit the JSON and validate all PromQL:
skills/grafana-platform-dashboard/scripts/promql_scan_thanos.sh \
--context <ctx> \
--dashboard-json /tmp/<workspace>/<name>.json
- Apply the validated JSON live:
skills/grafana-platform-dashboard/scripts/grafanadashboard_roundtrip.sh apply \
--context <ctx> \
--namespace <ns> \
--name <grafanadashboard-name> \
--json /tmp/<workspace>/<name>.json
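The Quick Start steps can be chained so that a failed scan blocks the apply. This is a minimal bash sketch meant to be sourced. The function names and the `SCRIPTS_DIR` variable are hypothetical. It also assumes two things you should verify against the scripts first: the export writes `<name>.json` into the workspace, as the paths above suggest, and `promql_scan_thanos.sh` exits non-zero when any query fails.

```bash
#!/usr/bin/env bash
# Sketch: chain the Quick Start steps so a failed PromQL scan blocks the apply.
# Meant to be sourced. Function names and SCRIPTS_DIR are hypothetical.
# ASSUMPTIONS: export writes "<name>.json" into the workspace, and
# promql_scan_thanos.sh exits non-zero when any query fails.

SCRIPTS_DIR="${SCRIPTS_DIR:-skills/grafana-platform-dashboard/scripts}"

# Export the GrafanaDashboard CR and its JSON into the workspace directory.
dashboard_export() {
  local ctx="$1" ns="$2" name="$3" workdir="$4"
  "$SCRIPTS_DIR/grafanadashboard_roundtrip.sh" export \
    --context "$ctx" --namespace "$ns" --name "$name" --out-dir "$workdir"
}

# After editing the JSON by hand: scan it, and apply only if the scan succeeded.
dashboard_scan_and_apply() {
  local ctx="$1" ns="$2" name="$3" workdir="$4"
  local json="$workdir/$name.json"
  if ! "$SCRIPTS_DIR/promql_scan_thanos.sh" \
      --context "$ctx" --dashboard-json "$json"; then
    echo "PromQL scan failed for $json; not applying" >&2
    return 1
  fi
  "$SCRIPTS_DIR/grafanadashboard_roundtrip.sh" apply \
    --context "$ctx" --namespace "$ns" --name "$name" --json "$json"
}
```

Typical use: call `dashboard_export`, edit the exported JSON, then call `dashboard_scan_and_apply` with the same four arguments (context, namespace, CR name, workspace).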
Workflow
1) Lock Scope From Platform Contracts
Read the platform contract in platform-contract.md before editing panels.
- Restrict the L1 command view to critical signals that predict tenant impact before it happens.
- Start with gate-aligned components: the critical ClusterOperator (CO) gate, nodes, MachineConfigPools (MCP), and the core API server, etcd, and ingress.
- Keep service-specific sections (Crossplane, Keycloak) below L1.
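A gate-aligned L1 panel can be a single count. The helper below is a minimal sketch with a hypothetical name. It assumes OpenShift's `cluster_operator_conditions` metric is 1 when the named condition is true, and the operator list is only an example (the real critical-CO list belongs to platform-contract.md). It builds a stat-panel query that counts degraded ClusterOperators. The `or vector(0)` clause makes the panel show 0 instead of "No data" when nothing is degraded.

```bash
#!/usr/bin/env bash
# Hypothetical helper: emit a PromQL count of degraded ClusterOperators for an
# L1 stat panel. ASSUMPTION: cluster_operator_conditions is 1 when the named
# condition is true; the operator names passed in are only an example.
co_gate_degraded_query() {
  local IFS='|'   # "$*" below joins the arguments with '|' to form a regex
  printf 'count(cluster_operator_conditions{condition="Degraded",name=~"%s"} == 1) or vector(0)\n' "$*"
}

co_gate_degraded_query kube-apiserver etcd ingress machine-config
```

A companion query using `condition="Available"` and `== 0` would, under the same assumption, count operators that are unavailable rather than degraded.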
2) Enforce Information Architecture
Use layout-guidelines.md:
- L1: critical-only, immediate action, minimal panel budget.
- L2: platform services by dependency domain.
- L3: deep dives (for example, a future GPU dashboard), kept out of L1.
3) Build Queries From Known Library
Use promql-library.md:
- Start from known-good queries and adapt labels minimally.
- Prefer counts and action tables over decorative charts.
- Filter alert noise explicitly (for example ArgoCD/GitOps) when requested.
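Noise filtering is easiest to audit when it lives in the query itself. The helper below is a minimal sketch with a hypothetical name. It counts firing alerts from Prometheus's built-in `ALERTS` series while excluding noisy namespaces. It assumes alerts carry the conventional `severity` label and that ArgoCD/GitOps alerts can be identified by namespace. The default `openshift-gitops|argocd` regex is an assumption, so substitute the namespaces the platform actually runs.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count firing alerts of one severity, excluding noisy
# namespaces. ASSUMPTIONS: alerts carry a "severity" label, and GitOps/ArgoCD
# alerts live in the namespaces matched by the default exclude regex.
firing_alerts_query() {
  local severity="${1:-critical}"
  local exclude_ns="${2:-openshift-gitops|argocd}"
  # Alerts with no namespace label are treated as namespace="" and are kept.
  printf 'count(ALERTS{alertstate="firing",severity="%s",namespace!~"%s"}) or vector(0)\n' \
    "$severity" "$exclude_ns"
}

firing_alerts_query critical
```

A negative namespace matcher is used instead of a positive allow-list so that alerts without a namespace label, which may include cluster-scoped platform alerts, stay in the L1 count.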
4) Validate Before Apply
Always run the scan script after edits:
skills/grafana-platform-dashboard/scripts/promql_scan_thanos.sh \
--context <ctx> \
--dashboard-json <file.json> \
--output <scan.tsv>
Pass criteria: every query reports success, with zero bad-query or parse errors.
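The pass criteria can be enforced mechanically before any apply. The layout of the scan output is not documented here, so this sketch ASSUMES a tab-separated file with a header row, a column named `status`, and `success` as the only passing value. The `scan_gate` name is hypothetical. The function prints the totals and returns non-zero if any row fails, if no queries were found, or (with code 2) if the header has no `status` column.

```bash
#!/usr/bin/env bash
# Hypothetical gate over the scan TSV. ASSUMPTION: header row with a "status"
# column whose passing value is "success"; verify against the real output.
scan_gate() {
  local tsv="$1"
  awk -F'\t' '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == "status") col = i
      if (!col) { print "no status column in header" > "/dev/stderr"; exit 2 }
      next
    }
    NF == 0 { next }                          # ignore blank lines
    { total++; if ($col != "success") bad++ }
    END {
      if (!col) exit 2                        # keep the header error code
      printf "queries=%d failures=%d\n", total, bad
      # Fail on any bad row, and also when the scan produced no rows at all.
      exit ((total == 0 || bad > 0) ? 1 : 0)
    }
  ' "$tsv"
}
```

Chaining it with `&&` in front of the apply command keeps a failing scan from reaching the cluster, and the printed totals feed directly into the validation line of the final summary.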
5) Apply and Verify Sync
Apply only after validation succeeds, then confirm that the DashboardSynchronized condition is True:
skills/grafana-platform-dashboard/scripts/grafanadashboard_roundtrip.sh apply ...
oc --context <ctx> -n <ns> get grafanadashboard <name> \
-o jsonpath='{.status.conditions[?(@.type=="DashboardSynchronized")].status}{"|"}{.status.conditions[?(@.type=="DashboardSynchronized")].reason}{"\n"}'
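Sync is not always immediate, so a short poll avoids reading a stale condition. This sketch wraps the jsonpath command above in a hypothetical `dashboard_sync_status` function. A second hypothetical function, `wait_for_sync`, retries until the status half of the `<status>|<reason>` output is `True`. The default attempt count (30) and delay (10 s) are assumptions, not values taken from the operator.

```bash
#!/usr/bin/env bash
# Hypothetical helpers. dashboard_sync_status prints "<status>|<reason>" for the
# DashboardSynchronized condition; wait_for_sync polls until status is "True".
# ASSUMPTION: the attempt/delay defaults are arbitrary; tune them per cluster.
dashboard_sync_status() {
  local ctx="$1" ns="$2" name="$3"
  oc --context "$ctx" -n "$ns" get grafanadashboard "$name" \
    -o jsonpath='{.status.conditions[?(@.type=="DashboardSynchronized")].status}{"|"}{.status.conditions[?(@.type=="DashboardSynchronized")].reason}{"\n"}'
}

wait_for_sync() {
  local ctx="$1" ns="$2" name="$3"
  local attempts="${4:-30}" delay="${5:-10}"
  local i out=""
  for ((i = 1; i <= attempts; i++)); do
    out="$(dashboard_sync_status "$ctx" "$ns" "$name")" || out=""
    # Keep only the part before the first "|" (the condition status).
    if [[ "${out%%|*}" == "True" ]]; then
      echo "synced: $out"
      return 0
    fi
    if (( i < attempts )); then sleep "$delay"; fi
  done
  echo "not synced after $attempts attempts: ${out:-<no status>}" >&2
  return 1
}
```

The last status and reason are printed either way, so they can be quoted as-is in the sync line of the operator summary.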
6) Close With Operator-Focused Summary
Report:
- What changed (panel names and intent).
- Validation result (query count and failures).
- Sync status and any residual risk.
- Next step: promote live changes into GitOps-managed source.
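For the promotion step, one low-risk pattern is to diff the validated live JSON against the Git-managed copy before committing, so the summary can quote the exact change. This sketch ASSUMES the GitOps repo keeps the dashboard JSON as a standalone file. Repos that embed it in the GrafanaDashboard manifest need a different copy step. `stage_for_gitops` is a hypothetical name. It only stages the file; committing and review stay in the normal GitOps flow.

```bash
#!/usr/bin/env bash
# Hypothetical promotion helper. ASSUMPTION: the GitOps repo stores the
# dashboard JSON as a standalone file at repo_json.
stage_for_gitops() {
  local live_json="$1" repo_json="$2"
  local base="$repo_json"
  [[ -f "$base" ]] || base=/dev/null        # new file: diff against nothing
  if cmp -s "$base" "$live_json"; then
    echo "no changes to promote"
    return 0
  fi
  diff -u "$base" "$live_json"              # show the change for the summary
  mkdir -p "$(dirname "$repo_json")"
  cp "$live_json" "$repo_json"
  echo "staged $repo_json; commit it through the GitOps review flow"
}
```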
Design Rules
- Put critical tenant-impact predictors first.
- Every red panel must imply an action path.
- Avoid ambiguous panel names (for example, replace “platform pods” with a concrete namespace scope).
- Keep L1 low-noise; move detail below or to dedicated dashboards.
- Keep GPU deep diagnostics in a dedicated GPU dashboard, not mixed into L1.
References
- Platform Contract (platform-contract.md)
- PromQL Panel Library (promql-library.md)
- Layout Guidelines (layout-guidelines.md)