Agent skill

installation-orchestrator

Expert management of install.sh (2000+ lines). Use for installation troubleshooting, idempotency checks, secret generation, volume migration, startup ordering of the 11 services (including heuristics-service and semantic-service), and user onboarding.

Install this agent skill to your Project

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/development/installation-orchestrator

SKILL.md

Installation Orchestrator (v2.0.0)

Overview

Expert management of install.sh (2000+ lines of Bash), covering idempotency, secret generation, volume migration, orchestration of 11 services with 3-branch detection startup, and troubleshooting of installation failures.

When to Use This Skill

  • Troubleshooting installation failures
  • Managing install.sh modifications
  • Secret generation and validation
  • Volume migration between versions
  • Idempotency checks
  • User onboarding flow
  • 3-branch service startup order (v2.0.0)

v2.0.0 Architecture

11 Docker Services

yaml
Core Services:
  - clickhouse (data storage, port 8123)
  - grafana (monitoring, port 3001)
  - n8n (workflow engine, port 5678)

3-Branch Detection (v2.0.0):
  - heuristics-service (Branch A, port 5005, 30% weight)
  - semantic-service (Branch B, port 5006, 35% weight)
  - prompt-guard-api (Branch C, port 8000, 35% weight)

PII Detection:
  - presidio-pii-api (port 5001)
  - language-detector (port 5002)

Web Interface:
  - web-ui-backend (port 8787)
  - web-ui-frontend (via proxy)
  - proxy (Caddy, port 80)

Installation Flow

1. Pre-flight Checks

bash
- Docker installed and running
- Ports available (80, 5678, 8123, 3001, 8787, 5005, 5006, 8000)
- Disk space >10GB
- No existing .install-state.lock
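
The checklist above could be scripted roughly as follows. This is a minimal sketch, not the code in install.sh: the function names (`check_docker`, `check_ports`, `check_disk`, `check_lock`) are hypothetical, the port probe assumes `lsof` is installed, and the port list simply mirrors the pre-flight line above.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; names and structure are illustrative
# and not taken from install.sh.

# Mirrors the pre-flight list; extend it (for example with 5001 and 5002)
# if those ports are published on the host.
REQUIRED_PORTS="80 5678 8123 3001 8787 5005 5006 8000"
MIN_FREE_GB=10
LOCK_FILE=".install-state.lock"

# Succeeds when the Docker daemon answers.
check_docker() {
  docker info >/dev/null 2>&1
}

# Prints every required port that already has a listener; succeeds if none do.
check_ports() {
  local port busy=0
  for port in $REQUIRED_PORTS; do
    if lsof -nP -iTCP:"$port" -sTCP:LISTEN >/dev/null 2>&1; then
      echo "port $port is in use"
      busy=1
    fi
  done
  return "$busy"
}

# Succeeds when the filesystem holding directory $1 has more than
# MIN_FREE_GB gigabytes available.
check_disk() {
  local free_kb
  free_kb=$(df -Pk "${1:-.}" | awk 'NR==2 {print $4}')
  [ "$free_kb" -gt $((MIN_FREE_GB * 1024 * 1024)) ]
}

# Succeeds when directory $1 holds no lock file from a previous install.
check_lock() {
  [ ! -e "${1:-.}/$LOCK_FILE" ]
}
```

A caller might chain them as `check_docker && check_ports && check_disk . && check_lock .` and stop at the first failure.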

2. Secret Generation

bash
CLICKHOUSE_PASSWORD=$(openssl rand -base64 32)
GF_SECURITY_ADMIN_PASSWORD=$(openssl rand -base64 32)
# openssl wraps base64 output at 64 characters, so strip the embedded newline
SESSION_SECRET=$(openssl rand -base64 64 | tr -d '\n')
JWT_SECRET=$(openssl rand -base64 32)
WEB_UI_ADMIN_PASSWORD=$(openssl rand -base64 24)
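
One way to keep secret generation idempotent is to write each value to `.env` only when it is not already defined, so a re-run does not rotate credentials that running services depend on. The sketch below assumes that behavior is wanted; `random_b64` and `ensure_secret` are hypothetical helper names, and the `/dev/urandom` fallback is an addition for hosts without `openssl`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; not taken from install.sh.

# Print N random bytes as single-line base64.
random_b64() {
  if command -v openssl >/dev/null 2>&1; then
    # openssl wraps base64 at 64 characters, so remove newlines.
    openssl rand -base64 "$1" | tr -d '\n'
  else
    head -c "$1" /dev/urandom | base64 | tr -d '\n'
  fi
}

# ensure_secret NAME BYTES [ENV_FILE]
# Appends NAME=<secret> to ENV_FILE unless NAME is already defined there.
ensure_secret() {
  local name="$1" bytes="$2" env_file="${3:-.env}"
  touch "$env_file"
  chmod 600 "$env_file"   # secrets should not be world-readable
  if grep -q "^${name}=" "$env_file"; then
    return 0              # keep the existing value
  fi
  printf '%s=%s\n' "$name" "$(random_b64 "$bytes")" >> "$env_file"
}
```

For example, `ensure_secret CLICKHOUSE_PASSWORD 32` followed by `ensure_secret SESSION_SECRET 64` adds two lines to `.env` on the first run and changes nothing on later runs.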

3. Service Startup Order (v2.0.0)

yaml
Phase 1 - Data Layer:
  1. clickhouse (data storage)
  2. grafana (monitoring)

Phase 2 - Detection Core:
  3. n8n (workflow engine)
  4. heuristics-service (Branch A - fast pattern matching)
  5. semantic-service (Branch B - embedding analysis)
  6. prompt-guard-api (Branch C - LLM validation, optional)

Phase 3 - PII Services:
  7. presidio-pii-api (dual-language PII)
  8. language-detector (hybrid detection)

Phase 4 - Web Interface:
  9. web-ui-backend (Express API)
  10. web-ui-frontend (React app)
  11. proxy (Caddy reverse proxy)
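
The four phases above could be driven by a small loop around `docker-compose up -d`. The following is a sketch under assumptions: `start_phase`, `start_all`, and the `DRY_RUN` switch are hypothetical and only illustrate the ordering; install.sh may start services differently.

```bash
#!/usr/bin/env bash
# Hypothetical phased startup. Set DRY_RUN=1 to print the commands
# instead of running them.

PHASE_1="clickhouse grafana"
PHASE_2="n8n heuristics-service semantic-service prompt-guard-api"
PHASE_3="presidio-pii-api language-detector"
PHASE_4="web-ui-backend web-ui-frontend proxy"

# start_phase LABEL SERVICE...
start_phase() {
  local label="$1"; shift
  echo "==> $label: $*"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "docker-compose up -d $*"
  else
    docker-compose up -d "$@"
  fi
}

# Start the phases in order and stop at the first phase that fails.
start_all() {
  # Word splitting of the phase lists is intentional here.
  start_phase "Phase 1 - Data Layer" $PHASE_1 &&
  start_phase "Phase 2 - Detection Core" $PHASE_2 &&
  start_phase "Phase 3 - PII Services" $PHASE_3 &&
  start_phase "Phase 4 - Web Interface" $PHASE_4
}
```

Running `DRY_RUN=1 start_all` prints the four `docker-compose up -d` commands in phase order without touching Docker, which is a cheap way to review the ordering.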

4. Health Checks (v2.0.0)

bash
# Core services
for service in clickhouse grafana n8n web-ui-backend; do
  wait_for_health "$service" 120s || fail
done

# 3-Branch detection services (v2.0.0)
wait_for_health heuristics-service 60s || warn "Branch A degraded"
wait_for_health semantic-service 90s || warn "Branch B degraded"
wait_for_health prompt-guard-api 120s || warn "Branch C degraded"

# PII services
wait_for_health presidio-pii-api 90s || warn "PII detection degraded"
wait_for_health language-detector 30s || warn "Language detection degraded"
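
The snippet above calls `wait_for_health`, `warn`, and `fail` without showing them. A minimal sketch of how they might look follows. It assumes each container is named `vigil-<service>` (as in the `docker logs` commands in this document) and that the compose file defines a Docker healthcheck for every service; `wait_until` and `container_healthy` are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical helpers assumed by the health-check snippet.

warn() { echo "WARN: $*" >&2; }
fail() { echo "FATAL: $*" >&2; exit 1; }

# wait_until SECONDS CMD...
# Re-runs CMD once per second until it succeeds or SECONDS have elapsed.
# Returns 0 on success, 1 on timeout.
wait_until() {
  local timeout="$1"; shift
  local deadline=$((SECONDS + timeout))
  until "$@"; do
    [ "$SECONDS" -ge "$deadline" ] && return 1
    sleep 1
  done
}

# Succeeds when Docker reports the container's healthcheck as "healthy".
# Assumption: the container is named vigil-<service>.
container_healthy() {
  [ "$(docker inspect --format '{{.State.Health.Status}}' "vigil-$1" 2>/dev/null)" = "healthy" ]
}

# wait_for_health SERVICE TIMEOUT, where TIMEOUT looks like "120s".
wait_for_health() {
  wait_until "${2%s}" container_healthy "$1"
}
```

Keeping the polling loop (`wait_until`) separate from the probe (`container_healthy`) makes it easy to swap in an HTTP probe such as `curl -fs http://localhost:5005/health` for services that expose one.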

5. Idempotency Lock

bash
# Write the lock in one step; overwriting avoids duplicate entries on re-runs
{
  echo "INSTALL_DATE=$(date)"
  echo "VERSION=2.0.0"
  echo "SERVICES=11"
} > .install-state.lock
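
Because the lock file stores plain KEY=value lines, a later run can read it back to decide whether it is looking at a fresh install, a re-run of the same version, or an upgrade. The sketch below illustrates that idea only; `lock_value` and `install_mode` are hypothetical names, and how install.sh actually reacts to an existing lock is not specified here.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for reading the lock file; not taken from install.sh.

# lock_value KEY [LOCK_FILE]
# Prints the value of KEY, or returns 1 if the lock file does not exist.
lock_value() {
  local key="$1" lock_file="${2:-.install-state.lock}"
  [ -f "$lock_file" ] || return 1
  # Take the last match in case the file was appended to more than once.
  sed -n "s/^${key}=//p" "$lock_file" | tail -n 1
}

# install_mode TARGET_VERSION [LOCK_FILE]
# Prints "fresh" when no lock exists, "current" when the recorded VERSION
# equals TARGET_VERSION, and "upgrade" otherwise.
install_mode() {
  local target="$1" lock_file="${2:-.install-state.lock}" recorded
  recorded=$(lock_value VERSION "$lock_file") || { echo fresh; return; }
  if [ "$recorded" = "$target" ]; then
    echo current
  else
    echo upgrade
  fi
}
```

A script could branch on the result with `case "$(install_mode 2.0.0)" in fresh) ... ;; current) ... ;; upgrade) ... ;; esac`.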

Common Tasks

Task 1: Fresh Installation

bash
./install.sh

# Prompts:
# 1. Generate secrets? [Y/n]
# 2. Set admin password (or auto-generate)
# 3. Delete existing vigil_data? [y/N]
# 4. Download Llama model? [Y/n] (for Branch C)

Task 2: Troubleshoot Failed Installation

bash
# Check state
cat .install-state.lock

# View logs
docker-compose logs --tail=100

# Check 3-branch services specifically (v2.0.0)
docker logs vigil-heuristics-service --tail 50
docker logs vigil-semantic-service --tail 50
docker logs vigil-prompt-guard-api --tail 50

# Retry specific service
docker-compose up -d heuristics-service
docker logs vigil-heuristics-service

# Clean slate
rm -rf .install-state.lock .env vigil_data
./install.sh

Task 3: Validate Environment

bash
./scripts/validate-env.sh

# Checks:
# - All required env vars present
# - Passwords meet requirements (min 8 chars)
# - Ports not in use (including 5005, 5006 for branches)
# - Docker network exists (vigil-net)
# - 11 services defined in docker-compose.yml
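
Two of the checks listed above (required variables present, minimum length) can be approximated in a few lines. This is a hypothetical re-implementation for illustration; the real scripts/validate-env.sh may check different variables or rules. The variable list is taken from the secret generation step, and the 8-character minimum is applied to all of them as an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of two validate-env checks; the real script may differ.

REQUIRED_VARS="CLICKHOUSE_PASSWORD GF_SECURITY_ADMIN_PASSWORD SESSION_SECRET JWT_SECRET WEB_UI_ADMIN_PASSWORD"
MIN_LENGTH=8

# validate_env_file [ENV_FILE]
# Prints one line per problem and returns 1 if any were found.
validate_env_file() {
  local env_file="${1:-.env}" name value status=0
  for name in $REQUIRED_VARS; do
    # Use the last definition if a variable appears more than once.
    value=$(sed -n "s/^${name}=//p" "$env_file" 2>/dev/null | tail -n 1)
    if [ -z "$value" ]; then
      echo "missing: $name"
      status=1
    elif [ "${#value}" -lt "$MIN_LENGTH" ]; then
      echo "too short (<$MIN_LENGTH chars): $name"
      status=1
    fi
  done
  return "$status"
}
```

Calling `validate_env_file .env || exit 1` before starting containers surfaces a missing or weak secret early instead of as a service crash later.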

Task 4: Migrate Volumes (v1.x → v2.0.0)

bash
# Backup old data
docker run --rm -v vigil_clickhouse_data:/data -v "$(pwd)":/backup alpine \
  tar czf /backup/clickhouse-v1.x-$(date +%Y%m%d).tar.gz /data

# Run v2.0.0 migration SQL (adds branch columns)
# -i is required so docker exec forwards the redirected file on stdin
docker exec -i vigil-clickhouse clickhouse-client < services/monitoring/sql/migrations/v2.0.0.sql

# Verify migration (branch columns added)
docker exec vigil-clickhouse clickhouse-client -q "
  DESCRIBE n8n_logs.events_processed
" | grep -E "branch_[abc]_score|arbiter_decision"

# Expected output:
# branch_a_score    Float32
# branch_b_score    Float32
# branch_c_score    Float32
# arbiter_decision  String
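
The `grep` above prints whichever columns exist, so a partially applied migration still produces output. A stricter check names each missing column and returns a non-zero status. The sketch below assumes the tab-separated `DESCRIBE` output that clickhouse-client prints by default; `missing_branch_columns` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical check: reads DESCRIBE output on stdin and reports any of the
# four expected v2.0.0 columns that are missing.

missing_branch_columns() {
  local described col missing=0
  described=$(cat)
  for col in branch_a_score branch_b_score branch_c_score arbiter_decision; do
    # A column matches when its name starts a line and is followed by whitespace.
    if ! printf '%s\n' "$described" | grep -qE "^${col}[[:space:]]"; then
      echo "missing column: $col"
      missing=1
    fi
  done
  return "$missing"
}
```

It can be fed directly from the verification query, for example `docker exec vigil-clickhouse clickhouse-client -q "DESCRIBE n8n_logs.events_processed" | missing_branch_columns`.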

Task 5: Verify 3-Branch Services (v2.0.0)

bash
#!/bin/bash
# scripts/verify-branches.sh

echo "🔍 Verifying 3-Branch Detection Services..."

# Branch A: Heuristics
BRANCH_A=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:5005/health)
if [ "$BRANCH_A" == "200" ]; then
  echo "✅ Branch A (Heuristics): Healthy"
else
  echo "❌ Branch A (Heuristics): Down (HTTP $BRANCH_A)"
fi

# Branch B: Semantic
BRANCH_B=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:5006/health)
if [ "$BRANCH_B" == "200" ]; then
  echo "✅ Branch B (Semantic): Healthy"
else
  echo "❌ Branch B (Semantic): Down (HTTP $BRANCH_B)"
fi

# Branch C: LLM Guard
BRANCH_C=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:8000/health)
if [ "$BRANCH_C" == "200" ]; then
  echo "✅ Branch C (LLM Guard): Healthy"
else
  echo "⚠️  Branch C (LLM Guard): Down (HTTP $BRANCH_C) - Optional"
fi

echo ""
echo "3-Branch Status: $([ "$BRANCH_A" == "200" ] && [ "$BRANCH_B" == "200" ] && echo "OPERATIONAL" || echo "DEGRADED")"

Troubleshooting

Issue: Port already in use

bash
# Check all v2.0.0 ports
for port in 80 5678 8123 3001 8787 5001 5002 5005 5006 8000; do
  lsof -i :$port && echo "Port $port in use"
done

# Kill specific process
kill -9 $(lsof -t -i:5005)

Issue: Branch service won't start

bash
# Check heuristics-service
docker logs vigil-heuristics-service --tail 100
# Common issue: missing patterns directory
# Fix: docker-compose build heuristics-service

# Check semantic-service
docker logs vigil-semantic-service --tail 100
# Common issue: model download failed
# Fix: docker exec vigil-semantic-service python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('all-MiniLM-L6-v2')"

Issue: ClickHouse won't start

bash
# Check volume permissions
ls -la vigil_data/clickhouse/

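# Reset the ClickHouse volume only (this deletes all stored ClickHouse data).
# Plain "down" keeps the other named volumes; "down -v" would remove them all.
docker-compose down
docker volume rm vigil_clickhouse_data
./install.sh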

Issue: Secrets not loaded

bash
# Verify .env file
grep -E "^(CLICKHOUSE|JWT|SESSION)_" .env

# Reload
docker-compose down
docker-compose up -d

Issue: Semantic service model download fails

bash
# Pre-download model (run before install)
docker run --rm -v vigil_semantic_models:/models python:3.11-slim bash -c "
  pip install sentence-transformers &&
  python -c \"from sentence_transformers import SentenceTransformer; SentenceTransformer('all-MiniLM-L6-v2', cache_folder='/models')\"
"

# Restart semantic service
docker-compose restart semantic-service

Port Reference (v2.0.0)

Port   Service              Description
80     proxy                Caddy reverse proxy (main entry)
3001   grafana              Monitoring dashboard
5001   presidio-pii-api     Dual-language PII detection
5002   language-detector    Hybrid language detection
5005   heuristics-service   Branch A (30% weight)
5006   semantic-service     Branch B (35% weight)
5678   n8n                  Workflow engine
8000   prompt-guard-api     Branch C (35% weight)
8123   clickhouse           Analytics database
8787   web-ui-backend       Configuration API

Quick Reference

bash
# Fresh install
./install.sh

# Status check (all 11 services)
./scripts/status.sh

# Verify 3-branch detection (v2.0.0)
./scripts/verify-branches.sh

# View logs
./scripts/logs.sh

# Restart
./scripts/restart.sh

# Uninstall
docker-compose down -v
rm -rf vigil_data .env .install-state.lock

Integration Points

With docker-vigil-orchestration:

yaml
when: Service won't start
action:
  1. Check vigil-net network connectivity
  2. Verify service dependencies
  3. Check port conflicts
  4. Review Docker resource limits

With clickhouse-grafana-monitoring:

yaml
when: Migration to v2.0.0
action:
  1. Run SQL migration script
  2. Verify branch columns exist
  3. Test ClickHouse queries
  4. Update Grafana dashboards

Last Updated: 2025-12-09
Install Script: 2000+ lines bash
Services: 11 containers (v2.0.0)
3-Branch Ports: 5005 (Heuristics), 5006 (Semantic), 8000 (LLM Guard)

Version History

  • v2.0.0 (Current): 11 services, 3-branch detection startup, migration scripts
  • v1.6.11: 9 services, sequential detection
