Agent skill

monitoring-observability

Monitoring and observability patterns for Prometheus metrics, Grafana dashboards, Langfuse v4 LLM tracing (as_type, score_current_span, should_export_span, LangfuseMedia), and drift detection. Use when adding logging, metrics, distributed tracing, LLM cost tracking, or quality drift monitoring.

Install this agent skill in your project

npx add-skill https://github.com/yonatangross/orchestkit/tree/main/plugins/ork/skills/monitoring-observability

Metadata

Additional technical details for this skill

category: document-asset-creation

SKILL.md

Monitoring & Observability

Patterns for infrastructure monitoring, LLM observability, drift detection, and silent-failure detection. Each category has its own rule files in rules/, loaded on demand.

Quick Reference

| Category | Rules | Impact | When to Use |
|---|---|---|---|
| Infrastructure Monitoring | 3 | CRITICAL | Prometheus metrics, Grafana dashboards, alerting rules |
| LLM Observability | 3 | HIGH | Langfuse tracing, cost tracking, evaluation scoring |
| Drift Detection | 3 | HIGH | Statistical drift, quality regression, drift alerting |
| Silent Failures | 3 | HIGH | Tool skipping, quality degradation, loop/token spike alerting |

Total: 12 rules across 4 categories

Quick Start

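```python
# Prometheus metrics with RED method (Rate, Errors, Duration)
from prometheus_client import Counter, Histogram

# Rate and errors: one counter, split by method, endpoint, and status code
http_requests = Counter('http_requests_total', 'Total requests', ['method', 'endpoint', 'status'])
# Duration: latency histogram with buckets from 10 ms to 5 s
http_duration = Histogram('http_request_duration_seconds', 'Request latency',
    buckets=[0.01, 0.05, 0.1, 0.5, 1, 2, 5])
```

```python
# Langfuse v4 LLM tracing: semantic as_type + inline scoring
from langfuse import observe, get_client

@observe(as_type="generation", name="analyze_content")
async def analyze_content(content: str):
    # Attach user, session, and tags to the enclosing trace
    get_client().update_current_trace(
        user_id="user_123", session_id="session_abc",
        tags=["production", "orchestkit"],
    )
    # `llm` is a placeholder for an async LLM client defined elsewhere
    result = await llm.generate(content)
    # Score the current span inline, without a separate scoring call
    get_client().score_current_span(name="response_quality", value=0.85)
    return result
```

```python
# PSI drift detection
import numpy as np

# `calculate_psi` and `alert` are helpers assumed to be defined elsewhere;
# `baseline_scores` and `current_scores` are sequences of quality scores.
psi_score = calculate_psi(baseline_scores, current_scores)
if psi_score >= 0.25:  # 0.25 and above is treated as significant drift
    alert("Significant quality drift detected!")
```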
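The PSI snippet above calls a `calculate_psi` helper that is not shown. A minimal standard-library sketch of one way it could look follows. It assumes equal-width bins over the combined range of both samples and a small floor `eps` for empty bins; the bin count, the floor, and the binning strategy are illustrative choices, not necessarily what the skill's rule files use.

```python
import math
from typing import Sequence


def calculate_psi(baseline: Sequence[float], current: Sequence[float],
                  bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between two samples (illustrative sketch)."""
    lo = min(min(baseline), min(current))
    hi = max(max(baseline), max(current))
    # Fall back to a width of 1.0 when every value is identical
    width = (hi - lo) / bins or 1.0

    def proportions(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            # Clamp the maximum value into the last bin
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
        # Floor empty bins at eps so the logarithm stays defined
        return [max(c / len(values), eps) for c in counts]

    p = proportions(baseline)
    q = proportions(current)
    # PSI = sum over bins of (q - p) * ln(q / p)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))
```

Identical samples give a PSI of 0, and the score grows as the two distributions diverge, which is why a fixed cut-off such as 0.25 can serve as an alert condition.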

Infrastructure Monitoring

Prometheus metrics, Grafana dashboards, and alerting for application health.

| Rule | File | Key Pattern |
|---|---|---|
| Prometheus Metrics | rules/monitoring-prometheus.md | RED method, counters, histograms, cardinality |
| Grafana Dashboards | rules/monitoring-grafana.md | Golden Signals, SLO/SLI, health checks |
| Alerting Rules | rules/monitoring-alerting.md | Severity levels, grouping, escalation, fatigue prevention |
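Cardinality, listed above as a key pattern for Prometheus metrics, is mostly about keeping label values bounded. One common approach is to normalize raw request paths before using them as an `endpoint` label. The sketch below is a hypothetical helper (`normalize_endpoint` and the `:id` placeholder are not from the skill) that collapses numeric and UUID-like path segments.

```python
import re

# Matches purely numeric segments or 36-character UUID-like segments
_ID_SEGMENT = re.compile(r"\d+|[0-9a-fA-F]{8}-[0-9a-fA-F-]{27}")


def normalize_endpoint(path: str) -> str:
    """Replace ID-like path segments so the label value set stays small."""
    # Drop the query string; it should never end up in a label
    path = path.split("?", 1)[0]
    segments = [
        ":id" if _ID_SEGMENT.fullmatch(segment) else segment
        for segment in path.split("/")
    ]
    return "/".join(segments) or "/"
```

With this in place, `/users/42` and `/users/43` both map to `/users/:id`, so the counter gains one time series per route rather than one per user.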

LLM Observability

Langfuse-based tracing, cost tracking, and evaluation for LLM applications.

| Rule | File | Key Pattern |
|---|---|---|
| Langfuse Traces | rules/llm-langfuse-traces.md | @observe decorator, OTEL spans, agent graphs |
| Cost Tracking | rules/llm-cost-tracking.md | Token usage, spend alerts, Metrics API v2 |
| Eval Scoring | rules/llm-eval-scoring.md | Custom scores, evaluator tracing, quality monitoring |
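Token-based cost tracking and spend alerts come down to simple arithmetic over token counts. The sketch below shows that arithmetic only; the model name, the per-million-token prices, and the helper names are hypothetical placeholders, and it does not call any Langfuse API.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

# Hypothetical prices in USD per 1M tokens; substitute your provider's real rates
PRICES_PER_MILLION = {
    "example-model": {"input": 3.00, "output": 15.00},
}


@dataclass
class Usage:
    model: str
    input_tokens: int
    output_tokens: int


def cost_usd(usage: Usage) -> float:
    """Cost of a single LLM call from its token counts."""
    price = PRICES_PER_MILLION[usage.model]
    return (usage.input_tokens * price["input"]
            + usage.output_tokens * price["output"]) / 1_000_000


def check_budget(usages: Iterable[Usage], budget_usd: float) -> Tuple[float, bool]:
    """Return total spend and whether it exceeds the budget."""
    total = sum(cost_usd(u) for u in usages)
    return total, total > budget_usd
```

A spend alert is then a periodic call to `check_budget` over the usage records collected for the period.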

Drift Detection

Statistical and quality drift detection for production LLM systems.

| Rule | File | Key Pattern |
|---|---|---|
| Statistical Drift | rules/drift-statistical.md | PSI, KS test, KL divergence, EWMA |
| Quality Drift | rules/drift-quality.md | Score regression, baseline comparison, canary prompts |
| Drift Alerting | rules/drift-alerting.md | Dynamic thresholds, correlation, anti-patterns |
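A dynamic threshold can be derived from recent history instead of being hard-coded. The following sketch uses the 95th percentile of past drift scores as the alert line. It assumes higher scores mean more drift; the function names are illustrative rather than taken from the rule files.

```python
import statistics
from typing import Sequence


def dynamic_threshold(history: Sequence[float], percentile: int = 95) -> float:
    """Percentile of historical scores, used as the alert threshold."""
    # With n=100, quantiles() returns 99 cut points; index p-1 is the p-th percentile
    cuts = statistics.quantiles(history, n=100, method="inclusive")
    return cuts[percentile - 1]


def is_anomalous(value: float, history: Sequence[float], percentile: int = 95) -> bool:
    """True when the new score exceeds the historical percentile."""
    return value > dynamic_threshold(history, percentile)
```

Because the threshold follows the observed distribution, a metric that is normally noisy gets a higher alert line than a stable one, which is the point of preferring it over a static cut-off.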

Silent Failures

Detection and alerting for silent failures in LLM agents.

| Rule | File | Key Pattern |
|---|---|---|
| Tool Skipping | rules/silent-tool-skipping.md | Expected vs actual tool calls, Langfuse traces |
| Quality Degradation | rules/silent-degraded-quality.md | Heuristics + LLM-as-judge, z-score baselines |
| Silent Alerting | rules/silent-alerting.md | Loop detection, token spikes, escalation workflow |
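Two of the checks named above, tool skipping and loop detection, can be approximated with plain set and count comparisons over the tool names recorded for a run. This is a simplified sketch: the helper names and the `max_repeats` default are assumptions, and how the list of actual calls is obtained (for example from trace data) is left out.

```python
from collections import Counter
from typing import Iterable, List


def missing_tools(expected: Iterable[str], actual: Iterable[str]) -> List[str]:
    """Tools the agent was expected to call but never did."""
    called = set(actual)
    return sorted(tool for tool in set(expected) if tool not in called)


def looping_tools(actual: Iterable[str], max_repeats: int = 3) -> List[str]:
    """Tools called more often than max_repeats, a crude loop signal."""
    counts = Counter(actual)
    return sorted(name for name, n in counts.items() if n > max_repeats)
```

Either list being non-empty is a reason to flag the run, even when the agent returned a plausible-looking answer.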

Key Decisions

| Decision | Recommendation | Rationale |
|---|---|---|
| Metric methodology | RED method (Rate, Errors, Duration) | Industry standard, covers essential service health |
| Log format | Structured JSON | Machine-parseable, supports log aggregation |
| Tracing | OpenTelemetry | Vendor-neutral, auto-instrumentation, broad ecosystem |
| LLM observability | Langfuse (not LangSmith) | Open-source, self-hosted, built-in prompt management |
| LLM tracing API | @observe(as_type=...) + score_current_span() | v4: semantic types, inline scoring, span filtering |
| Langfuse APIs | Observations API v2 + Metrics API v2 | v4 (Mar 2026): faster querying, aggregations at scale |
| Drift method | PSI for production, KS for small samples | PSI is stable for large datasets, KS more sensitive |
| Threshold strategy | Dynamic (95th percentile) over static | Reduces alert fatigue, context-aware |
| Alert severity | 4 levels (Critical, High, Medium, Low) | Clear escalation paths, appropriate response times |
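The structured JSON log format recommended above can be had from the standard `logging` module alone. The sketch below is one possible formatter, not the skill's own template; the field names and the `context` convention for extra fields are assumptions.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Fields passed as extra={"context": {...}} are merged into the payload
        payload.update(getattr(record, "context", {}))
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def configure_logging(level: int = logging.INFO) -> None:
    """Attach the JSON formatter to the root logger."""
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.addHandler(handler)
    root.setLevel(level)
```

After `configure_logging()`, a call such as `logging.getLogger("app").info("request done", extra={"context": {"trace_id": "abc"}})` emits one JSON line that a log aggregator can parse without custom patterns.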

Detailed Documentation

| Resource | Description |
|---|---|
| ${CLAUDE_SKILL_DIR}/references/ | Logging, metrics, tracing, Langfuse, drift analysis guides |
| ${CLAUDE_SKILL_DIR}/checklists/ | Implementation checklists for monitoring and Langfuse setup |
| ${CLAUDE_SKILL_DIR}/examples/ | Real-world monitoring dashboard and trace examples |
| ${CLAUDE_SKILL_DIR}/scripts/ | Templates: Prometheus, OpenTelemetry, health checks, Langfuse |

Related Skills

  • defense-in-depth - Layer 8 observability as part of security architecture
  • devops-deployment - Observability integration with CI/CD and Kubernetes
  • resilience-patterns - Monitoring circuit breakers and failure scenarios
  • llm-evaluation - Evaluation patterns that integrate with Langfuse scoring
  • caching - Caching strategies that reduce costs tracked by Langfuse