Agent skill
monitoring-observability
Monitoring and observability patterns for Prometheus metrics, Grafana dashboards, Langfuse v4 LLM tracing (as_type, score_current_span, should_export_span, LangfuseMedia), and drift detection. Use when adding logging, metrics, distributed tracing, LLM cost tracking, or quality drift monitoring.
Install this agent skill to your Project
npx add-skill https://github.com/yonatangross/orchestkit/tree/main/src/skills/monitoring-observability
Metadata
Additional technical details for this skill
- category
- document-asset-creation
SKILL.md
Monitoring & Observability
Comprehensive patterns for infrastructure monitoring, LLM observability, and quality drift detection. Each category has individual rule files in rules/ loaded on-demand.
Quick Reference
| Category | Rules | Impact | When to Use |
|---|---|---|---|
| Infrastructure Monitoring | 3 | CRITICAL | Prometheus metrics, Grafana dashboards, alerting rules |
| LLM Observability | 3 | HIGH | Langfuse tracing, cost tracking, evaluation scoring |
| Drift Detection | 3 | HIGH | Statistical drift, quality regression, drift alerting |
| Silent Failures | 3 | HIGH | Tool skipping, quality degradation, loop/token spike alerting |
Total: 12 rules across 4 categories
Quick Start
# Prometheus metrics with RED method
from prometheus_client import Counter, Histogram
http_requests = Counter('http_requests_total', 'Total requests', ['method', 'endpoint', 'status'])
http_duration = Histogram('http_request_duration_seconds', 'Request latency',
buckets=[0.01, 0.05, 0.1, 0.5, 1, 2, 5])
# Langfuse v4 LLM tracing — semantic as_type + inline scoring
from langfuse import observe, get_client
@observe(as_type="generation", name="analyze_content")
async def analyze_content(content: str):
get_client().update_current_trace(
user_id="user_123", session_id="session_abc",
tags=["production", "orchestkit"],
)
result = await llm.generate(content)
get_client().score_current_span(name="response_quality", value=0.85)
return result
# PSI drift detection
import numpy as np
psi_score = calculate_psi(baseline_scores, current_scores)
if psi_score >= 0.25:
alert("Significant quality drift detected!")
Infrastructure Monitoring
Prometheus metrics, Grafana dashboards, and alerting for application health.
| Rule | File | Key Pattern |
|---|---|---|
| Prometheus Metrics | rules/monitoring-prometheus.md |
RED method, counters, histograms, cardinality |
| Grafana Dashboards | rules/monitoring-grafana.md |
Golden Signals, SLO/SLI, health checks |
| Alerting Rules | rules/monitoring-alerting.md |
Severity levels, grouping, escalation, fatigue prevention |
LLM Observability
Langfuse-based tracing, cost tracking, and evaluation for LLM applications.
| Rule | File | Key Pattern |
|---|---|---|
| Langfuse Traces | rules/llm-langfuse-traces.md |
@observe decorator, OTEL spans, agent graphs |
| Cost Tracking | rules/llm-cost-tracking.md |
Token usage, spend alerts, Metrics API v2 |
| Eval Scoring | rules/llm-eval-scoring.md |
Custom scores, evaluator tracing, quality monitoring |
Drift Detection
Statistical and quality drift detection for production LLM systems.
| Rule | File | Key Pattern |
|---|---|---|
| Statistical Drift | rules/drift-statistical.md |
PSI, KS test, KL divergence, EWMA |
| Quality Drift | rules/drift-quality.md |
Score regression, baseline comparison, canary prompts |
| Drift Alerting | rules/drift-alerting.md |
Dynamic thresholds, correlation, anti-patterns |
Silent Failures
Detection and alerting for silent failures in LLM agents.
| Rule | File | Key Pattern |
|---|---|---|
| Tool Skipping | rules/silent-tool-skipping.md |
Expected vs actual tool calls, Langfuse traces |
| Quality Degradation | rules/silent-degraded-quality.md |
Heuristics + LLM-as-judge, z-score baselines |
| Silent Alerting | rules/silent-alerting.md |
Loop detection, token spikes, escalation workflow |
Key Decisions
| Decision | Recommendation | Rationale |
|---|---|---|
| Metric methodology | RED method (Rate, Errors, Duration) | Industry standard, covers essential service health |
| Log format | Structured JSON | Machine-parseable, supports log aggregation |
| Tracing | OpenTelemetry | Vendor-neutral, auto-instrumentation, broad ecosystem |
| LLM observability | Langfuse (not LangSmith) | Open-source, self-hosted, built-in prompt management |
| LLM tracing API | @observe(as_type=...) + score_current_span() |
v4: semantic types, inline scoring, span filtering |
| Langfuse APIs | Observations API v2 + Metrics API v2 | v4 (Mar 2026): faster querying, aggregations at scale |
| Drift method | PSI for production, KS for small samples | PSI is stable for large datasets, KS more sensitive |
| Threshold strategy | Dynamic (95th percentile) over static | Reduces alert fatigue, context-aware |
| Alert severity | 4 levels (Critical, High, Medium, Low) | Clear escalation paths, appropriate response times |
Detailed Documentation
| Resource | Description |
|---|---|
${CLAUDE_SKILL_DIR}/references/ |
Logging, metrics, tracing, Langfuse, drift analysis guides |
${CLAUDE_SKILL_DIR}/checklists/ |
Implementation checklists for monitoring and Langfuse setup |
${CLAUDE_SKILL_DIR}/examples/ |
Real-world monitoring dashboard and trace examples |
${CLAUDE_SKILL_DIR}/scripts/ |
Templates: Prometheus, OpenTelemetry, health checks, Langfuse |
Related Skills
defense-in-depth- Layer 8 observability as part of security architecturedevops-deployment- Observability integration with CI/CD and Kubernetesresilience-patterns- Monitoring circuit breakers and failure scenariosllm-evaluation- Evaluation patterns that integrate with Langfuse scoringcaching- Caching strategies that reduce costs tracked by Langfuse
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
expect
Diff-aware AI browser testing — analyzes git changes, generates targeted test plans, and executes them via agent-browser. Reads git diff to determine what changed, maps changes to affected pages via route map, generates a test plan scoped to the diff, and runs it with pass/fail reporting. Use when testing UI changes, verifying PRs before merge, running regression checks on changed components, or validating that recent code changes don't break the user-facing experience.
github-operations
GitHub CLI operations for issues, PRs, milestones, and Projects v2. Covers gh commands, REST API patterns, and automation scripts. Use when managing GitHub issues, PRs, milestones, or Projects with gh.
chain-patterns
Chain patterns for CC 2.1.71 pipelines — MCP detection, handoff files, checkpoint-resume, worktree agents, CronCreate monitoring. Use when building multi-phase pipeline skills. Loaded via skills: field by pipeline skills (fix-issue, implement, brainstorm, verify). Not user-invocable.
storybook-mcp-integration
Storybook MCP server integration for component-aware AI development. Covers 6 tools across 3 toolsets (dev, docs, testing): component discovery via list-all-documentation/get-documentation, story previews via preview-stories, and automated testing via run-story-tests. Use when generating components that should reuse existing Storybook components, running component tests via MCP, or previewing stories in chat.
component-search
Search 21st.dev component registry for production-ready React components. Finds components by natural language description, filters by framework and style system, returns ranked results with install instructions. Use when looking for UI components, finding alternatives to existing components, or sourcing design system building blocks.
ai-ui-generation
AI-assisted UI generation patterns for json-render, v0, Bolt, and Cursor workflows. Covers prompt engineering for component generation, review checklists for AI-generated code, design token injection, refactoring for design system conformance, and CI gates for quality assurance. Use when generating UI components with AI tools, rendering multi-surface MCP visual output, reviewing AI-generated code, or integrating AI output into design systems.
Didn't find tool you were looking for?