Agent skill
observability
Logging, metrics, and distributed tracing. OpenTelemetry, Prometheus, Grafana, and production debugging.
Install this agent skill to your Project
npx add-skill https://github.com/pluginagentmarketplace/custom-plugin-backend/tree/main/skills/observability
Observability Skill
Bonded to: backend-observability-agent (Primary)
Quick Start
# Invoke observability skill
"Set up structured logging for my application"
"Configure Prometheus metrics"
"Implement distributed tracing with OpenTelemetry"
Three Pillars
| Pillar | Purpose | Tools |
|---|---|---|
| Logs | What happened | ELK, Loki |
| Metrics | System state | Prometheus, Grafana |
| Traces | Request flow | Jaeger, OpenTelemetry |
Examples
Structured Logging
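import structlog

structlog.configure(
    processors=[
        structlog.processors.add_log_level,  # include "level" in each event
        structlog.processors.TimeStamper(fmt="iso"),  # ISO 8601 timestamp
        structlog.processors.JSONRenderer(),  # one JSON object per log line
    ]
)
logger = structlog.get_logger()

def process_request(request):
    # Bind request-scoped fields once; every later call on `log` includes them.
    log = logger.bind(
        correlation_id=request.headers.get("X-Correlation-ID"),
        user_id=request.user.id,
    )
    log.info("request_started", method=request.method)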
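Where structlog is not available, the same idea (one JSON object per line, with request-scoped fields attached automatically) can be sketched with the standard library alone. This is a minimal illustration, not the skill's own implementation: the `correlation_id` context variable, `JsonFormatter`, `build_logger`, and `log_request_started` are hypothetical names introduced here.

```python
import contextvars
import io
import json
import logging

# Hypothetical context variable holding the current request's correlation ID.
correlation_id = contextvars.ContextVar("correlation_id", default=None)


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname.lower(),
            "event": record.getMessage(),
            "correlation_id": correlation_id.get(),
        }
        # Extra key/value pairs passed as extra={"fields": {...}} are merged in.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


def build_logger(stream):
    """Return a logger that writes JSON lines to `stream`."""
    logger = logging.getLogger("observability_sketch")
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def log_request_started(logger, method, request_correlation_id):
    # Set the ID once per request; later log calls in the same context reuse it.
    correlation_id.set(request_correlation_id)
    logger.info("request_started", extra={"fields": {"method": method}})
```

Calling `log_request_started(build_logger(io.StringIO()), "GET", "abc-123")` writes one JSON line whose `event`, `method`, and `correlation_id` fields can be indexed by a log backend such as Loki or ELK.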
Prometheus Metrics
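from prometheus_client import Counter, Histogram

REQUEST_COUNT = Counter('http_requests_total', 'Total requests', ['method', 'status'])
REQUEST_LATENCY = Histogram('http_request_duration_seconds', 'Request latency')

async def handle_request(request):
    # Time the awaited call with a context manager. Decorating a coroutine
    # function with .time() can end up measuring only coroutine creation,
    # depending on the client version.
    with REQUEST_LATENCY.time():
        response = await process(request)  # process() is the application's own handler
    REQUEST_COUNT.labels(method=request.method, status=response.status_code).inc()
    return response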
OpenTelemetry Tracing
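from opentelemetry import trace

# Without a configured TracerProvider (SDK + exporter), these spans are no-ops.
tracer = trace.get_tracer(__name__)

@tracer.start_as_current_span("process_order")
def process_order(order_id: str):
    # The decorator has already made "process_order" the current span.
    span = trace.get_current_span()
    span.set_attribute("order.id", order_id)
    # Child spans nest under the current span automatically.
    with tracer.start_as_current_span("validate"):
        validate(order_id)
    with tracer.start_as_current_span("charge"):
        charge(order_id)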
Key Metrics (RED Method)
| Metric | Description | Alert Threshold |
|---|---|---|
| Rate | Requests/sec | Drop > 50% from baseline |
| Errors | Error rate % | > 1% for 5 min |
| Duration | P99 latency | > 500ms for 5 min |
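To make the thresholds in the table concrete, the sketch below computes the three RED values for one window of requests and checks them against those limits. It rests on several assumptions that the table does not state: the `Sample` shape is hypothetical, an "error" is taken to be any 5xx response, the rate drop is measured against a caller-supplied baseline, and the "for 5 min" condition is simplified to a single window. In practice these rules would usually live in Prometheus alerting rules rather than application code.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    """One observed request (hypothetical shape, for illustration only)."""
    duration_seconds: float
    status_code: int


def red_summary(samples, window_seconds):
    """Compute Rate, Errors, and Duration (P99) over one window."""
    total = len(samples)
    if total == 0:
        return {"rate": 0.0, "error_pct": 0.0, "p99_seconds": 0.0}
    # Assumption: only 5xx responses count as errors.
    errors = sum(1 for s in samples if s.status_code >= 500)
    durations = sorted(s.duration_seconds for s in samples)
    # Nearest-rank percentile: smallest value with >= 99% of samples at or below it.
    rank = math.ceil(total * 99 / 100)
    return {
        "rate": total / window_seconds,
        "error_pct": 100.0 * errors / total,
        "p99_seconds": durations[rank - 1],
    }


def breached(summary, baseline_rate):
    """Return the names of the table's thresholds that this window exceeds."""
    alerts = []
    if baseline_rate > 0 and summary["rate"] < 0.5 * baseline_rate:
        alerts.append("rate")  # traffic dropped by more than 50%
    if summary["error_pct"] > 1.0:
        alerts.append("errors")  # error rate above 1%
    if summary["p99_seconds"] > 0.5:
        alerts.append("duration")  # P99 latency above 500 ms
    return alerts
```

For example, a 10-second window with 98 fast successful requests and 2 slow 500 responses yields a 2% error rate and a P99 equal to the slow duration, so both the errors and duration thresholds would be reported.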
Troubleshooting
| Issue | Cause | Solution |
|---|---|---|
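| Missing logs | Log level set too high (e.g., WARNING hides INFO) | Lower the log level |
| High cardinality | Labels with unbounded values (e.g., user IDs) | Bound or drop high-cardinality labels |
| Broken traces | Trace context not propagated between services | Forward trace headers (e.g., W3C `traceparent`) |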
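The "forward headers" fix for broken traces can be sketched as follows, assuming the services use the W3C Trace Context `traceparent` header (format `version-traceid-parentid-flags`). The helper names `parse_traceparent` and `outbound_headers` are hypothetical, the inbound headers are assumed to be a dict with lowercase keys, and only version `00` style headers are accepted. This is an illustration of the mechanism; with OpenTelemetry, the `inject` and `extract` functions in `opentelemetry.propagate` normally handle it.

```python
import re
import secrets

# W3C Trace Context layout: 2-hex version, 32-hex trace ID, 16-hex parent ID, 2-hex flags.
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def parse_traceparent(value):
    """Return (trace_id, parent_id, flags), or None if the header is unusable."""
    match = _TRACEPARENT.match(value.strip()) if value else None
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid.
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return trace_id, parent_id, flags


def outbound_headers(inbound_headers):
    """Build headers for a downstream call that continue the caller's trace."""
    parsed = parse_traceparent(inbound_headers.get("traceparent"))
    if parsed is None:
        # No usable context: start a new sampled trace instead of dropping tracing.
        trace_id, flags = secrets.token_hex(16), "01"
    else:
        trace_id, _, flags = parsed
    # This service's own span ID becomes the parent ID seen downstream.
    span_id = secrets.token_hex(8)
    return {"traceparent": f"00-{trace_id}-{span_id}-{flags}"}
```

The key point is that the trace ID is copied unchanged while the parent ID is replaced; if a service builds outbound requests without doing this, the downstream spans start a new trace and the request flow appears broken.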