Agent skill

railway-metrics

Query resource usage metrics for Railway services. Use when user asks about resource usage, CPU, memory, network, disk, or service performance like "how much memory is my service using" or "is my service slow".

Install this agent skill to your Project

npx add-skill https://github.com/davila7/claude-code-templates/tree/main/cli-tool/components/skills/railway/metrics

SKILL.md

Railway Service Metrics

Query resource usage metrics for Railway services.

When to Use

  • User asks "how much memory is my service using?"
  • User asks about CPU usage, network traffic, disk usage
  • User wants to debug performance issues
  • User asks "is my service healthy?" (combine with railway-service skill)

Prerequisites

Get environmentId and serviceId from the linked project:

```bash
railway status --json
```

Extract:

  • environment.id → environmentId
  • service.id → serviceId (optional - omit to get all services)
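
As a rough sketch, the two IDs can be pulled out of the status JSON with jq. The function name `extract_ids` is hypothetical, and the field paths simply follow the mapping above, so adjust them if your CLI output is shaped differently.

```shell
# Hypothetical helper: reads `railway status --json` output on stdin and
# prints the environment ID on line 1 and the service ID on line 2.
# A missing ID becomes an empty line (no service linked, for example).
extract_ids() {
  jq -r '(.environment.id // ""), (.service.id // "")'
}

# Usage (assumes the Railway CLI is installed and a project is linked):
#   ids=$(railway status --json | extract_ids)
#   ENV_ID=$(printf '%s\n' "$ids" | sed -n 1p)
#   SERVICE_ID=$(printf '%s\n' "$ids" | sed -n 2p)
```

An empty `SERVICE_ID` is a signal to omit serviceId from the query variables and fetch metrics for every service in the environment.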

MetricMeasurement Values

| Measurement | Description |
|---|---|
| CPU_USAGE | CPU usage (cores) |
| CPU_LIMIT | CPU limit (cores) |
| MEMORY_USAGE_GB | Memory usage in GB |
| MEMORY_LIMIT_GB | Memory limit in GB |
| NETWORK_RX_GB | Network received in GB |
| NETWORK_TX_GB | Network transmitted in GB |
| DISK_USAGE_GB | Disk usage in GB |
| EPHEMERAL_DISK_USAGE_GB | Ephemeral disk usage in GB |
| BACKUP_USAGE_GB | Backup usage in GB |

MetricTag Values (for groupBy)

| Tag | Description |
|---|---|
| DEPLOYMENT_ID | Group by deployment |
| DEPLOYMENT_INSTANCE_ID | Group by instance |
| REGION | Group by region |
| SERVICE_ID | Group by service |

Query

```graphql
query metrics(
  $environmentId: String!
  $serviceId: String
  $startDate: DateTime!
  $endDate: DateTime
  $sampleRateSeconds: Int
  $averagingWindowSeconds: Int
  $groupBy: [MetricTag!]
  $measurements: [MetricMeasurement!]!
) {
  metrics(
    environmentId: $environmentId
    serviceId: $serviceId
    startDate: $startDate
    endDate: $endDate
    sampleRateSeconds: $sampleRateSeconds
    averagingWindowSeconds: $averagingWindowSeconds
    groupBy: $groupBy
    measurements: $measurements
  ) {
    measurement
    tags {
      deploymentInstanceId
      deploymentId
      serviceId
      region
    }
    values {
      ts
      value
    }
  }
}
```

Example: Last Hour CPU and Memory

Use heredoc to avoid shell escaping issues:

```bash
bash <<'SCRIPT'
START_DATE=$(date -u -v-1H +"%Y-%m-%dT%H:%M:%SZ" 2>/dev/null || date -u -d "1 hour ago" +"%Y-%m-%dT%H:%M:%SZ")
ENV_ID="your-environment-id"
SERVICE_ID="your-service-id"

VARS=$(jq -n \
  --arg env "$ENV_ID" \
  --arg svc "$SERVICE_ID" \
  --arg start "$START_DATE" \
  '{environmentId: $env, serviceId: $svc, startDate: $start, measurements: ["CPU_USAGE", "MEMORY_USAGE_GB"]}')

${CLAUDE_PLUGIN_ROOT}/skills/lib/railway-api.sh \
  'query metrics($environmentId: String!, $serviceId: String, $startDate: DateTime!, $measurements: [MetricMeasurement!]!) {
    metrics(environmentId: $environmentId, serviceId: $serviceId, startDate: $startDate, measurements: $measurements) {
      measurement
      tags { deploymentId region serviceId }
      values { ts value }
    }
  }' \
  "$VARS"
SCRIPT
```

Example: All Services in Environment

Omit serviceId and use groupBy to get metrics for all services:

```bash
bash <<'SCRIPT'
START_DATE=$(date -u -v-1H +"%Y-%m-%dT%H:%M:%SZ" 2>/dev/null || date -u -d "1 hour ago" +"%Y-%m-%dT%H:%M:%SZ")
ENV_ID="your-environment-id"

VARS=$(jq -n \
  --arg env "$ENV_ID" \
  --arg start "$START_DATE" \
  '{environmentId: $env, startDate: $start, measurements: ["CPU_USAGE", "MEMORY_USAGE_GB"], groupBy: ["SERVICE_ID"]}')

${CLAUDE_PLUGIN_ROOT}/skills/lib/railway-api.sh \
  'query metrics($environmentId: String!, $startDate: DateTime!, $measurements: [MetricMeasurement!]!, $groupBy: [MetricTag!]) {
    metrics(environmentId: $environmentId, startDate: $startDate, measurements: $measurements, groupBy: $groupBy) {
      measurement
      tags { serviceId region }
      values { ts value }
    }
  }' \
  "$VARS"
SCRIPT
```

Time Parameters

| Parameter | Description |
|---|---|
| startDate | Required. ISO 8601 format (e.g., 2024-01-01T00:00:00Z) |
| endDate | Optional. Defaults to now |
| sampleRateSeconds | Sample interval (e.g., 60 for 1-minute samples) |
| averagingWindowSeconds | Averaging window for smoothing |

Tip: For last hour, calculate startDate as now - 1 hour in ISO format.
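
The tip above generalizes to any look-back window. The sketch below is one possible way to build the variables for a longer range with a coarser sample rate. The helper names `iso_hours_ago` and `build_metrics_vars` are hypothetical, and the sample rate of 300 seconds in the usage note is only an illustration. Because sampleRateSeconds is declared as Int in the query, it is passed with `--argjson` so that jq emits a JSON number rather than a string.

```shell
# Hypothetical helper: print the UTC timestamp N hours ago in ISO 8601 form.
# Tries BSD/macOS `date -v` first, then falls back to GNU `date -d`,
# mirroring the date handling in the examples above.
iso_hours_ago() {
  hours="$1"
  date -u -v-"${hours}"H +"%Y-%m-%dT%H:%M:%SZ" 2>/dev/null \
    || date -u -d "${hours} hours ago" +"%Y-%m-%dT%H:%M:%SZ"
}

# Hypothetical helper: build the GraphQL variables object.
#   $1 = environment ID, $2 = look-back in hours, $3 = sample rate in seconds
build_metrics_vars() {
  jq -n \
    --arg env "$1" \
    --arg start "$(iso_hours_ago "$2")" \
    --argjson rate "$3" \
    '{environmentId: $env, startDate: $start, sampleRateSeconds: $rate,
      measurements: ["CPU_USAGE", "MEMORY_USAGE_GB"]}'
}
```

For example, `VARS=$(build_metrics_vars "$ENV_ID" 6 300)` would cover the last six hours at 5-minute samples, provided the query text also declares `$sampleRateSeconds: Int` and passes it to `metrics`.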

Output Interpretation

```json
{
  "data": {
    "metrics": [
      {
        "measurement": "CPU_USAGE",
        "tags": { "deploymentId": "...", "serviceId": "...", "region": "us-west1" },
        "values": [
          { "ts": "2024-01-01T00:00:00Z", "value": 0.25 },
          { "ts": "2024-01-01T00:01:00Z", "value": 0.30 }
        ]
      }
    ]
  }
}
```

  • ts - timestamp in ISO format
  • value - metric value (cores for CPU, GB for memory/disk/network)
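
One way to turn a response of this shape into a quick per-series summary is a small jq filter. This is a minimal sketch, not part of the skill itself: `summarize_metrics` is a hypothetical name, and it assumes the response JSON has exactly the structure shown above. Series with no samples are skipped.

```shell
# Hypothetical helper: reads a metrics response on stdin and prints one
# tab-separated line per non-empty series:
#   measurement, serviceId (or "-"), sample count, average, maximum
summarize_metrics() {
  jq -r '
    .data.metrics[]?
    | select((.values // []) | length > 0)
    | [ .measurement,
        (.tags.serviceId // "-"),
        (.values | length),
        ([.values[].value] | add / length),
        ([.values[].value] | max) ]
    | @tsv
  '
}

# Usage: summarize_metrics < response.json
```

The units of the average and maximum columns follow the measurement: cores for CPU_USAGE, GB for memory, disk, and network.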

Composability

  • Get IDs: Use railway-status skill or railway status --json
  • Check service health: Use railway-service skill for deployment status
  • View logs: Use railway-deployment skill if metrics show issues
  • Scale service: Use railway-environment skill to adjust resources

Error Handling

Empty/Null Metrics

Services without active deployments return empty metrics arrays. When processing with jq, handle nulls:

```bash
# Safe iteration - skip nulls
jq -r '.data.metrics[]? | select(.values != null and (.values | length) > 0) | ...'

# Check if metrics exist before processing
jq -e '.data.metrics | length > 0' response.json && echo "has metrics"
```

No Metrics Data

Service may be new or have no traffic. Check:

  • Service has active deployment (stopped services have no metrics)
  • Time range includes deployment period

Invalid Service/Environment ID

Verify IDs with railway status --json.

Permission Denied

User needs access to the project to query metrics.
