swarm-local-e2e
Guide for running local E2E tests with API server, Docker lead/worker containers, task creation, log verification, UI dashboard, and cleanup
Install this agent skill to your Project
npx add-skill https://github.com/desplega-ai/agent-swarm/tree/main/.claude/skills/swarm-local-e2e
Local E2E Testing Guide
Run full end-to-end tests of the agent swarm locally with a real API server and Docker containers.
When to Use This Skill
This skill should be invoked in two modes:
- User-requested QA: The user asks you to run E2E tests, verify a feature, or QA a specific flow. Follow the steps below targeting what they asked for.
- Automated change verification: After implementing changes that touch the API, runner, polling, task lifecycle, session logs, Docker entrypoint, or worker/lead behavior, use this skill proactively to verify the changes work end-to-end. Determine what's testable based on the diff:
  - Task lifecycle changes (poll, runner, store-progress): Create assigned + pool tasks, verify they complete and have correct logs
  - Session log changes: Run two sequential tasks on the same agent, verify log isolation (unique sessionIds, no cross-contamination)
  - Docker / entrypoint changes: Build image, start containers, verify boot logs and registration
  - UI changes: Start the dashboard, use agent-browser/qa-use to verify rendering
  - API endpoint changes: Call the endpoint directly and verify the response
You do not need to run every step — pick the subset relevant to the changes being tested.
Prerequisites
- OrbStack or Docker Desktop running (open -a OrbStack if needed)
- .env with API_KEY and PORT configured
- .env.docker-lead with lead config (AGENT_ID, CLAUDE_CODE_OAUTH_TOKEN, MCP_BASE_URL)
- .env.docker with worker config (AGENT_ID, CLAUDE_CODE_OAUTH_TOKEN or OPENROUTER_API_KEY, MCP_BASE_URL)
Step 1: Determine Your Port
Check .env for the configured port — do not assume 3013:
grep ^PORT= .env
Use this value as $PORT throughout (for example, run export PORT=<value> so the shell commands below expand it). Each worktree may use a different port, so always verify the value in that worktree's .env.
Also verify the Docker env files match:
grep MCP_BASE_URL .env.docker-lead .env.docker
# Both should point to http://host.docker.internal:$PORT
If they don't match, update them before starting containers.
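The same check can be scripted. The following is a minimal sketch, assuming the env files are plain KEY=VALUE text; parseEnv and mcpUrlMatchesPort are hypothetical helper names, not part of agent-swarm.

```javascript
// Sketch only: parseEnv and mcpUrlMatchesPort are hypothetical helpers.

// Parse KEY=VALUE lines from dotenv-style text, skipping blanks and comments.
function parseEnv(text) {
  const vars = {};
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (!line || line.startsWith('#')) continue;
    const eq = line.indexOf('=');
    if (eq === -1) continue;
    const key = line.slice(0, eq).trim();
    // Strip one pair of matching surrounding quotes, if present.
    const value = line.slice(eq + 1).trim().replace(/^(['"])(.*)\1$/, '$2');
    vars[key] = value;
  }
  return vars;
}

// True when MCP_BASE_URL in the Docker env file targets host.docker.internal on the API port.
function mcpUrlMatchesPort(apiEnvText, dockerEnvText) {
  const port = parseEnv(apiEnvText).PORT;
  const url = parseEnv(dockerEnvText).MCP_BASE_URL;
  if (!port || !url) return false;
  return url.replace(/\/+$/, '') === `http://host.docker.internal:${port}`;
}

// Example (run from the repo root):
// const fs = require('node:fs');
// console.log(mcpUrlMatchesPort(fs.readFileSync('.env', 'utf8'), fs.readFileSync('.env.docker', 'utf8')));
```

Run the check once per Docker env file (.env.docker-lead and .env.docker); a false result means the file needs updating before the containers start.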
Step 2: Clean DB + Start API Server
# Kill any existing API process on your port
lsof -ti :$PORT | xargs kill 2>/dev/null
# Clean DB for fresh state
rm -f agent-swarm-db.sqlite agent-swarm-db.sqlite-wal agent-swarm-db.sqlite-shm
# Start API server
bun run start:http &
# Wait ~3s for startup, confirm "MCP HTTP server running on http://localhost:$PORT/mcp"
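Instead of a fixed sleep, startup can be polled. The following is a rough sketch under the assumption that any HTTP response from the server means it is listening; waitForServer is a hypothetical helper, and the timeout and interval defaults are arbitrary.

```javascript
// Sketch only: waitForServer is a hypothetical helper, not part of agent-swarm.
// Resolves true once the URL answers with any HTTP response, false if the deadline passes.
async function waitForServer(url, { fetchImpl = fetch, timeoutMs = 10000, intervalMs = 500 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    try {
      // Any response, even a 4xx, means the port is accepting connections.
      await fetchImpl(url);
      return true;
    } catch {
      // The request failed (for example, connection refused), so wait and retry.
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  return false;
}

// Example: await waitForServer('http://localhost:3013/mcp'), replacing 3013 with your port.
```

This only confirms that something is listening on the port; the startup message in the server output is still the authoritative signal.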
Step 3: Build Docker Image
bun run docker:build:worker
This builds agent-swarm-worker:latest from the current code. Rebuild after every code change.
Step 4: Start Lead Container
Use a unique container name to avoid conflicts with other worktrees (e.g. include branch name or feature):
docker run --rm -d \
--name e2e-lead-$(git branch --show-current | tr '/' '-') \
--env-file .env.docker-lead \
-e AGENT_ROLE=lead \
-e MAX_CONCURRENT_TASKS=1 \
-p 3201:3000 \
agent-swarm-worker:latest
Wait ~15s, then verify:
docker logs e2e-lead-$(git branch --show-current | tr '/' '-') 2>&1 | tail -5
# Should see: "[lead] Polling for triggers (0/1 active)..."
If port 3201 is taken by another worktree, pick a different host port (e.g. -p 3211:3000).
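The naming scheme above exists because branch names often contain "/", which Docker rejects in container names. A small sketch of the same transformation follows; containerName and isValidDockerName are hypothetical helpers, and the pattern reflects the character set Docker reports in its "Invalid container name" error.

```javascript
// Sketch only: containerName and isValidDockerName are hypothetical helpers.

// Mirrors e2e-<role>-$(git branch --show-current | tr '/' '-').
function containerName(role, branch) {
  return `e2e-${role}-${branch.replace(/\//g, '-')}`;
}

// Docker container names must match [a-zA-Z0-9][a-zA-Z0-9_.-]+.
function isValidDockerName(name) {
  return /^[a-zA-Z0-9][a-zA-Z0-9_.-]+$/.test(name);
}

console.log(containerName('lead', 'feature/session-logs')); // e2e-lead-feature-session-logs
```

Branch names with other characters outside that set would need additional replacement; the sketch handles only "/", as the shell command does.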
Step 5: Start Worker Container
docker run --rm -d \
--name e2e-worker-$(git branch --show-current | tr '/' '-') \
--env-file .env.docker \
-e MAX_CONCURRENT_TASKS=1 \
-p 3203:3000 \
agent-swarm-worker:latest
Wait ~15s, then verify:
docker logs e2e-worker-$(git branch --show-current | tr '/' '-') 2>&1 | tail -5
# Should see: "[worker] Polling for triggers (0/1 active)..."
Step 6: Verify Registration
Use context-mode execute (not curl directly due to hook restrictions):
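// $API_KEY and $PORT are placeholders: substitute the real values from .env, since JavaScript strings do not expand shell variables.
const headers = { 'Authorization': 'Bearer $API_KEY', 'Content-Type': 'application/json' };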
const agents = await (await fetch('http://localhost:$PORT/api/agents', { headers })).json();
for (const a of agents.agents) {
console.log(`${a.name} | isLead: ${a.isLead} | status: ${a.status} | id: ${a.id}`);
}
Should show both lead and worker registered as idle. Save the agent IDs for task creation.
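The registration check can be turned into a pass/fail summary that also extracts the IDs. This is a sketch only: summarizeRegistration is a hypothetical helper, and the response shape ({ agents: [{ id, name, isLead, status }] }) is inferred from the snippet above rather than from API documentation.

```javascript
// Sketch only: summarizeRegistration is a hypothetical helper.
// Assumes the /api/agents response looks like { agents: [{ id, name, isLead, status }] }.
function summarizeRegistration(response) {
  const agents = response.agents ?? [];
  const lead = agents.find((a) => a.isLead);
  const worker = agents.find((a) => !a.isLead);
  const problems = [];
  if (!lead) problems.push('no lead registered');
  if (!worker) problems.push('no worker registered');
  for (const a of agents) {
    // The guide expects freshly started agents to be idle.
    if (a.status !== 'idle') problems.push(`${a.name} is ${a.status}, expected idle`);
  }
  return { leadId: lead?.id, workerId: worker?.id, ok: problems.length === 0, problems };
}

// Example: const { leadId, workerId, ok, problems } = summarizeRegistration(agents);
```

The returned leadId is the value to pass as agentId when creating an assigned task.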
Step 7: Create Tasks
Assigned task (picked up by lead)
const t = await (await fetch('http://localhost:$PORT/api/tasks', {
method: 'POST', headers,
body: JSON.stringify({ task: 'Say hello. Call store-progress with status completed.', agentId: LEAD_ID })
})).json();
console.log('Task:', t.id, '| status:', t.status);
Important: Use agentId (not assignedTo) to assign tasks. Wrong param silently creates an unassigned task.
Pool task (auto-claimed by worker)
const t = await (await fetch('http://localhost:$PORT/api/tasks', {
method: 'POST', headers,
body: JSON.stringify({ task: 'Say hello. Call store-progress with status completed.' })
})).json();
console.log('Pool task:', t.id, '| status:', t.status);
Workers auto-claim unassigned tasks at poll time. Leads do not auto-claim pool tasks.
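Because the wrong parameter fails silently, it can help to build the request body through a small guard. The following is a minimal sketch; buildTaskBody is a hypothetical helper, and it covers only the task and agentId fields shown above.

```javascript
// Sketch only: buildTaskBody is a hypothetical helper, not part of agent-swarm.
// Builds the JSON body for POST /api/tasks and rejects the assignedTo mistake early.
function buildTaskBody(task, options = {}) {
  if ('assignedTo' in options) {
    throw new Error('Use agentId, not assignedTo; the task would be created unassigned.');
  }
  const body = { task };
  // Omit agentId to create a pool task that a worker can auto-claim.
  if (options.agentId) body.agentId = options.agentId;
  return JSON.stringify(body);
}

// Assigned task: buildTaskBody('Say hello. Call store-progress with status completed.', { agentId: LEAD_ID })
// Pool task:     buildTaskBody('Say hello. Call store-progress with status completed.')
```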
Step 8: Monitor Progress
# Follow lead logs, starting from the last 20 lines (use your container name)
docker logs -f --tail 20 e2e-lead-$(git branch --show-current | tr '/' '-') 2>&1
# Follow worker logs (run in a second terminal; -f blocks until interrupted)
docker logs -f --tail 20 e2e-worker-$(git branch --show-current | tr '/' '-') 2>&1
Poll task status:
const t = await (await fetch('http://localhost:$PORT/api/tasks/<task-id>', { headers })).json();
console.log(t.status); // pending → in_progress → completed/failed
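Repeating the status check by hand can be replaced with a polling loop. This is a sketch only: isTerminal and waitForTask are hypothetical helpers, the timeout and interval defaults are arbitrary, and only completed and failed are treated as terminal because those are the final statuses named above.

```javascript
// Sketch only: isTerminal and waitForTask are hypothetical helpers.
const TERMINAL_STATUSES = new Set(['completed', 'failed']);

function isTerminal(status) {
  return TERMINAL_STATUSES.has(status);
}

// Polls GET /api/tasks/<id> until the task reaches a terminal status or the timeout expires.
async function waitForTask(baseUrl, taskId, headers, { fetchImpl = fetch, timeoutMs = 120000, intervalMs = 2000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const res = await fetchImpl(`${baseUrl}/api/tasks/${taskId}`, { headers });
    const task = await res.json();
    if (isTerminal(task.status)) return task;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Task ${taskId} did not finish within ${timeoutMs} ms`);
}

// Example: const done = await waitForTask('http://localhost:3013', t.id, headers);
```

If the API exposes other final statuses (for example after cancellation), add them to TERMINAL_STATUSES; the sketch does not assume any.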
Step 9: Verify Session Logs
const logs = await (await fetch('http://localhost:$PORT/api/tasks/<task-id>/session-logs', { headers })).json();
console.log('Log count:', logs.logs.length);
// Should be > 0 for completed tasks
For log isolation verification (multiple sequential tasks from same agent):
const [l1, l2] = await Promise.all([
fetch('http://localhost:$PORT/api/tasks/<task1>/session-logs', { headers }).then(r => r.json()),
fetch('http://localhost:$PORT/api/tasks/<task2>/session-logs', { headers }).then(r => r.json()),
]);
const s1 = [...new Set(l1.logs.map(l => l.sessionId))];
const s2 = [...new Set(l2.logs.map(l => l.sessionId))];
console.log('Unique sessionIds:', s1[0] !== s2[0]); // Should be true
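The comparison above looks only at the first session ID of each task, and it reports false when both log lists are empty. A stricter variant, sketched below, requires each task to have at least one session ID and the two sets to share none. sessionIdsOf and logsAreIsolated are hypothetical helpers that assume each log entry carries a sessionId field, as in the snippet above.

```javascript
// Sketch only: sessionIdsOf and logsAreIsolated are hypothetical helpers.

// Collect the distinct sessionIds from a session-logs response ({ logs: [...] }).
function sessionIdsOf(response) {
  return new Set((response.logs ?? []).map((entry) => entry.sessionId));
}

// Isolated means: both tasks have at least one session ID and they share none.
function logsAreIsolated(responseA, responseB) {
  const a = sessionIdsOf(responseA);
  const b = sessionIdsOf(responseB);
  if (a.size === 0 || b.size === 0) return false;
  return [...a].every((id) => !b.has(id));
}

// Example: console.log('Isolated:', logsAreIsolated(l1, l2)); // Should be true
```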
Step 10: Test the Dashboard UI
Start the dashboard to visually verify tasks, logs, and agent status:
cd new-ui && pnpm run dev &
# Defaults to port from APP_URL in .env (check with: grep APP_URL ../.env)
If the UI port is taken by another worktree, start on an alternate:
cd new-ui && pnpm run dev --port 5276
The UI connects to the API via VITE_API_URL. Check new-ui/.env for its value; if it is unset, the UI defaults to http://localhost:$PORT.
Visual verification with agent-browser / qa-use
Use agent-browser or qa-use to automate UI checks:
# Quick visual gut-check with agent-browser
agent-browser --url http://localhost:5175 snapshot
# Or use qa-use to verify specific flows
qa-use explore http://localhost:5175
Things to verify in the UI:
- Agents page: Lead and worker both show as registered with correct status
- Tasks page: Tasks appear with correct status, assigned agent, and timestamps
- Task detail → Logs tab: Session logs render in the conversation viewer (not "No session data available")
- Task detail → Outcome tab: Completed tasks show output
- Costs: Session costs appear for completed tasks
Step 11: Cleanup
# Stop containers (use your branch-specific names)
docker stop e2e-lead-$(git branch --show-current | tr '/' '-') e2e-worker-$(git branch --show-current | tr '/' '-') 2>/dev/null
# Stop API server
lsof -ti :$PORT | xargs kill 2>/dev/null
# Stop UI dev server (if started; substitute your port if you used an alternate to 5175)
lsof -ti :5175 | xargs kill 2>/dev/null
Troubleshooting
Docker daemon not running
ERROR: Cannot connect to the Docker daemon
Fix: open -a OrbStack and wait ~5s.
Container name conflict
docker: Error response from daemon: Conflict. The container name "..." is already in use
Another worktree has a container with the same name. Either stop it (docker stop <name>) or use a different name suffix.
Lead not picking up tasks
- Verify task was created with agentId (not assignedTo); the wrong param silently creates an unassigned task
- Check task status isn't already in_progress (e.g. from a manual poll call that consumed the trigger)
- Restart container if stuck: docker restart <container-name>
Worker not picking up pool tasks
- Workers auto-claim via poll. Leads do not claim pool tasks.
- Check worker has capacity: docker logs <container> 2>&1 | grep "capacity"
- If "At capacity", a previous task is still running. Wait or restart.
Poll returns 404
- Poll endpoint is GET /api/poll (not POST)
- Requires X-Agent-ID header with a valid agent UUID
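A correctly shaped poll request might look like the sketch below. buildPollRequest is a hypothetical helper, and sending the Authorization header alongside X-Agent-ID is an assumption based on the other API calls in this guide. A manual poll can consume a pending trigger, so use it for debugging only.

```javascript
// Sketch only: buildPollRequest is a hypothetical helper, not part of agent-swarm.
const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Returns the URL and fetch options for GET /api/poll with the X-Agent-ID header.
function buildPollRequest(baseUrl, agentId, apiKey) {
  if (!UUID_PATTERN.test(agentId)) {
    throw new Error(`X-Agent-ID must be an agent UUID, got "${agentId}"`);
  }
  return {
    url: `${baseUrl}/api/poll`,
    options: {
      method: 'GET', // The endpoint is GET, not POST.
      // Assumption: the Bearer token is required here as for the other endpoints.
      headers: { Authorization: `Bearer ${apiKey}`, 'X-Agent-ID': agentId },
    },
  };
}

// Example: const { url, options } = buildPollRequest('http://localhost:3013', WORKER_ID, API_KEY);
//          const res = await fetch(url, options);
```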
Port conflicts (worktrees)
lsof -i :3013 # Check what's using the port
If another worktree is running, set a different PORT in .env and update MCP_BASE_URL in .env.docker* to http://host.docker.internal:<new-port>.
Session logs show 0 entries
- Task must have actually run (status completed or failed, not just in_progress)
- Check claudeSessionId is set on the task: GET /api/tasks/<id> should show it
- If logs were stored under wrong taskId, check the session_logs table directly
Task cancellation doesn't stop Claude
Direct API cancellation (POST /api/tasks/<id>/cancel) updates the DB but doesn't kill the Claude process inside Docker. Use docker restart <container> to force-stop.
Keep tasks trivial
Use simple tasks like "Say hello" for E2E tests. Complex tasks waste time and API credits.
UI shows stale data
The dashboard auto-polls every 5 seconds. If data looks stale, hard-refresh (Cmd+Shift+R) or check VITE_API_URL points to the correct API port.