Agent skill
bmad-observability-readiness
Establishes instrumentation, monitoring, and alerting foundations.
Stars
163
Forks
31
Install this agent skill to your Project
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/development/bmad-observability-readiness-bacoco-bmad-skills
Metadata
Additional technical details for this skill
- outputs
-
[ "observability-plan", "instrumentation-backlog", "slo-dashboard-spec" ] - triggers
-
{ "keywords": [ "observability", "logging", "monitoring", "tracing", "metrics", "alerting", "telemetry" ], "patterns": [ "add logging", "monitoring setup", "no telemetry", "instrument this", "observability gaps", "alert fatigue", "SLO dashboard" ] } - auto invoke
- YES
- capabilities
-
[ "instrumentation-design", "metrics-cataloging", "logging-standards", "alert-tuning", "slo-definition" ] - prerequisites
-
[ "bmad-architecture-design", "bmad-test-strategy" ]
SKILL.md
BMAD Observability Readiness Skill
When to Invoke
Use this skill when the user:
- Mentions missing or low-quality logging, metrics, or tracing.
- Requests monitoring/alerting setup before a launch or major release.
- Needs SLOs, dashboards, or on-call runbooks.
- Reports alert fatigue or noise that needs rationalization.
- Wants to ensure performance and reliability work has data coverage.
If instrumentation already exists and only specific bug fixes are required, hand over to bmad-development-execution with the backlog produced here.
Mission
Deliver a comprehensive observability plan that enables diagnosis, alerting, and measurement across the system. Ensure downstream performance, reliability, and security work has trustworthy telemetry.
Inputs Required
- Architecture diagrams and component inventory.
- Existing logging/monitoring/tracing configuration (if any).
- Current incidents, outages, or blind spots experienced by the team.
- SLAs/SLOs, business KPIs, or compliance reporting requirements.
Outputs
- Observability plan detailing metrics, logs, traces, dashboards, and retention policies.
- Instrumentation backlog with implementation tasks, owners, and acceptance criteria.
- SLO dashboard specification covering golden signals, alert thresholds, and runbook links.
- Updated runbook or escalation paths if gaps were discovered.
Process
- Audit current telemetry coverage, tooling, and data retention. Document gaps.
- Define observability objectives aligned with user journeys and business KPIs.
- Design instrumentation strategy: metrics taxonomy, structured logging, trace spans, event schemas.
- Establish SLOs, SLIs, and alerting strategy with on-call expectations and noise controls.
- Produce dashboards/reporting requirements and data governance notes.
- Create backlog with prioritized instrumentation tasks and verification approach.
Quality Gates
- Every critical user journey has metrics and alerts defined (latency, errors, saturation, traffic).
- Logging standards specify structure, PII handling, and retention.
- Alert runbooks documented or flagged for creation.
- Observability plan references integration with performance, security, and incident workflows.
Error Handling
- If telemetry tooling is undecided, present comparative options with trade-offs.
- Highlight dependencies on platform teams or infrastructure before finalizing timeline.
- Escalate when observability requirements conflict with compliance or privacy constraints.
Didn't find tool you were looking for?