Agent skill

bmad-observability-readiness

Establishes instrumentation, monitoring, and alerting foundations.


Install this agent skill into your project:

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/development/bmad-observability-readiness-bacoco-bmad-skills

Metadata

Additional technical details for this skill

outputs
[
    "observability-plan",
    "instrumentation-backlog",
    "slo-dashboard-spec"
]
triggers
{
    "keywords": [
        "observability",
        "logging",
        "monitoring",
        "tracing",
        "metrics",
        "alerting",
        "telemetry"
    ],
    "patterns": [
        "add logging",
        "monitoring setup",
        "no telemetry",
        "instrument this",
        "observability gaps",
        "alert fatigue",
        "SLO dashboard"
    ]
}
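The registry does not document how these triggers are evaluated. As a hedged illustration only, the sketch below assumes case-insensitive substring matching and treats any single hit as grounds for auto-invocation. `matched_triggers`, `should_auto_invoke`, and the matching policy itself are hypothetical.

```python
# Hypothetical sketch: the registry's real trigger-matching logic is not documented.
# Assumes case-insensitive substring matching on the keywords and patterns above.
TRIGGERS = {
    "keywords": ["observability", "logging", "monitoring", "tracing",
                 "metrics", "alerting", "telemetry"],
    "patterns": ["add logging", "monitoring setup", "no telemetry",
                 "instrument this", "observability gaps", "alert fatigue",
                 "SLO dashboard"],
}


def matched_triggers(message: str, triggers: dict = TRIGGERS) -> list:
    """Return every pattern, then every keyword, found in the message (case-insensitive)."""
    text = message.lower()
    hits = []
    for group in ("patterns", "keywords"):
        for phrase in triggers[group]:
            if phrase.lower() in text:
                hits.append(phrase)
    return hits


def should_auto_invoke(message: str) -> bool:
    """Assumed policy: auto-invoke when at least one trigger matches."""
    return bool(matched_triggers(message))


if __name__ == "__main__":
    print(matched_triggers("We have alert fatigue and no telemetry on checkout"))
```

Plain substring matching is deliberately naive: "logging" would also match "blogging", so a real host would more likely use word boundaries or an intent classifier.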
auto invoke
YES
capabilities
[
    "instrumentation-design",
    "metrics-cataloging",
    "logging-standards",
    "alert-tuning",
    "slo-definition"
]
prerequisites
[
    "bmad-architecture-design",
    "bmad-test-strategy"
]

SKILL.md

BMAD Observability Readiness Skill

When to Invoke

Use this skill when the user:

  • Mentions missing or low-quality logging, metrics, or tracing.
  • Requests monitoring/alerting setup before a launch or major release.
  • Needs SLOs, dashboards, or on-call runbooks.
  • Reports alert fatigue or noise that needs rationalization.
  • Wants to ensure performance and reliability work has data coverage.

If instrumentation already exists and only targeted fixes are required, hand off to bmad-development-execution with the backlog produced here.

Mission

Deliver a comprehensive observability plan that enables diagnosis, alerting, and measurement across the system. Ensure downstream performance, reliability, and security work has trustworthy telemetry.

Inputs Required

  • Architecture diagrams and component inventory.
  • Existing logging/monitoring/tracing configuration (if any).
  • Current incidents, outages, or blind spots experienced by the team.
  • SLAs/SLOs, business KPIs, or compliance reporting requirements.

Outputs

  • Observability plan detailing metrics, logs, traces, dashboards, and retention policies.
  • Instrumentation backlog with implementation tasks, owners, and acceptance criteria.
  • SLO dashboard specification covering golden signals, alert thresholds, and runbook links.
  • Updated runbook or escalation paths if gaps were discovered.
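To make the instrumentation backlog output concrete, here is a minimal sketch of one possible item schema, assuming each task carries a title, owner, priority, and acceptance criteria as the Outputs list describes. The `BacklogItem` fields, the `Priority` levels, and the `triage` helper are hypothetical, not a format defined by the skill.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Priority(Enum):
    """Hypothetical priority scale; lower value sorts first."""
    P0 = 0  # blocks launch
    P1 = 1
    P2 = 2


@dataclass
class BacklogItem:
    """One instrumentation task (hypothetical schema)."""
    title: str
    owner: str
    priority: Priority
    acceptance_criteria: List[str] = field(default_factory=list)
    runbook_url: Optional[str] = None

    def problems(self) -> List[str]:
        """List the fields the Outputs section expects but this item lacks."""
        issues = []
        if not self.owner.strip():
            issues.append("missing owner")
        if not self.acceptance_criteria:
            issues.append("missing acceptance criteria")
        return issues


def triage(items: List[BacklogItem]):
    """Sort by priority and report items that are not ready to hand off."""
    ordered = sorted(items, key=lambda item: item.priority.value)
    incomplete = {item.title: item.problems() for item in ordered if item.problems()}
    return ordered, incomplete
```

Flagging missing owners and acceptance criteria before handoff keeps the backlog actionable for bmad-development-execution rather than a list of intentions.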

Process

  1. Audit current telemetry coverage, tooling, and data retention. Document gaps.
  2. Define observability objectives aligned with user journeys and business KPIs.
  3. Design instrumentation strategy: metrics taxonomy, structured logging, trace spans, event schemas.
  4. Establish SLOs, SLIs, and alerting strategy with on-call expectations and noise controls.
  5. Produce dashboard and reporting requirements, plus data governance notes.
  6. Create backlog with prioritized instrumentation tasks and verification approach.
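Step 4 pairs SLOs with alerting and noise controls. A common way to connect the two is error-budget burn rate: the observed error ratio divided by the error ratio the SLO allows. The sketch below is a minimal illustration, assuming request-based SLIs counted as (good, total) pairs. The 14.4 default threshold is the example value Google's SRE Workbook gives for paging on a 30-day window (2% of the budget spent in one hour); treat it as a starting point, not a recommendation from this skill. `Slo`, `burn_rate`, `budget_remaining`, and `should_page` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    name: str
    target: float  # e.g. 0.999 means 99.9% of events must be good

    @property
    def error_budget(self) -> float:
        """Fraction of events allowed to be bad."""
        return 1.0 - self.target


def burn_rate(slo: Slo, good: int, total: int) -> float:
    """Speed of budget spend: 1.0 uses exactly the whole budget over the SLO window."""
    if total == 0:
        return 0.0
    error_ratio = (total - good) / total
    return error_ratio / slo.error_budget


def budget_remaining(slo: Slo, good: int, total: int) -> float:
    """Fraction of the error budget still unspent (negative once exhausted)."""
    if total == 0:
        return 1.0
    allowed_bad = total * slo.error_budget
    return 1.0 - (total - good) / allowed_bad


def should_page(slo: Slo, long_window: tuple, short_window: tuple,
                threshold: float = 14.4) -> bool:
    """Multiwindow check: page only if both windows burn at or above the threshold.

    Each window is a (good, total) tuple. Requiring the short window to agree
    stops the page from firing long after the problem has already recovered.
    """
    return (burn_rate(slo, *long_window) >= threshold
            and burn_rate(slo, *short_window) >= threshold)
```

For a 99.9% target, 2% errors in both the long (for example 1 h) and short (for example 5 min) windows gives a burn rate of about 20 and pages; if the short window is clean again, the check stays quiet, which is one concrete noise control.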

Quality Gates

  • Every critical user journey has metrics and alerts defined for the four golden signals: latency, traffic, errors, and saturation.
  • Logging standards specify structure, PII handling, and retention.
  • Every alert has a runbook documented or flagged for creation.
  • Observability plan references integration with performance, security, and incident workflows.
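The logging-standards gate asks for a defined structure and PII handling. As a minimal sketch of what such a standard might translate to, the formatter below emits one JSON object per line using Python's standard `logging` module and redacts email addresses plus a few sensitive field names. The `fields` convention for structured context, the regex, and `SENSITIVE_KEYS` are hypothetical placeholders; a real redaction policy should come from the team's logging standard and compliance requirements.

```python
import json
import logging
import re
from datetime import datetime, timezone

# Hypothetical redaction rules; replace with the policy from the logging standard.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
SENSITIVE_KEYS = {"password", "ssn", "card_number"}


def redact(value):
    """Mask email addresses in strings; other types pass through unchanged."""
    if isinstance(value, str):
        return EMAIL_RE.sub("[REDACTED_EMAIL]", value)
    return value


class JsonFormatter(logging.Formatter):
    """Render each record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": redact(record.getMessage()),
        }
        # Structured context is passed as extra={"fields": {...}} (assumed convention).
        for key, value in getattr(record, "fields", {}).items():
            payload[key] = "[REDACTED]" if key.lower() in SENSITIVE_KEYS else redact(value)
        return json.dumps(payload)


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    log = logging.getLogger("checkout")
    log.addHandler(handler)
    log.setLevel(logging.INFO)
    log.info("payment authorized for %s", "ana@example.com",
             extra={"fields": {"card_number": "4111111111111111", "latency_ms": 87}})
```

This sketch redacts only top-level string values; nested objects and free-text PII other than email addresses would need additional rules.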

Error Handling

  • If telemetry tooling is undecided, present comparative options with trade-offs.
  • Highlight dependencies on platform teams or infrastructure before finalizing the timeline.
  • Escalate when observability requirements conflict with compliance or privacy constraints.
