debug
Investigate stuck runs and execution failures by tracing Symphony and Codex logs with issue/session identifiers; use when runs stall, retry repeatedly, or fail unexpectedly.
Debug
Goals
- Find why a run is stuck, retrying, or failing.
- Correlate Linear issue identity to a Codex session quickly.
- Read the right logs in the right order to isolate root cause.
Log Sources
- Primary runtime log: `log/symphony.log`
  - The default path (`log/symphony.log`) comes from `SymphonyElixir.LogFile`.
  - Includes orchestrator, agent runner, and Codex app-server lifecycle logs.
- Rotated runtime logs: `log/symphony.log*`
  - Check these when the relevant run is older.
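Before opening files one by one, it can help to list which of the current and rotated logs mention a ticket at all. This is a minimal sketch, assuming log lines carry `issue_identifier=<KEY>` as described under Correlation Keys. `files_for_ticket` is a hypothetical helper, not part of Symphony, and it uses GNU/BSD `grep` so it also runs where `rg` is not installed.

```shell
# files_for_ticket KEY FILE...
# Hypothetical helper (not part of Symphony): print each log file that
# mentions issue_identifier=KEY. The body runs in a subshell ( ... ) so its
# variables do not leak into the caller's shell.
files_for_ticket() (
  key=$1
  shift
  # -l: print matching file names only; -F: fixed string;
  # -w: whole-word match, so MT-625 does not also match MT-6250.
  # 2>/dev/null hides errors for files that do not exist.
  grep -lwF "issue_identifier=$key" "$@" 2>/dev/null
)
```

Usage: `files_for_ticket MT-625 log/symphony.log*`. An empty result across all rotated files is a stronger signal that the data is genuinely missing than a miss in `log/symphony.log` alone.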
Correlation Keys
- `issue_identifier`: human ticket key (example: `MT-625`)
- `issue_id`: Linear UUID (stable internal ID)
- `session_id`: Codex thread-turn pair (`<thread_id>-<turn_id>`)
`elixir/docs/logging.md` requires these fields for issue/session lifecycle logs. Use
them as your join keys during debugging.
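Because these three keys are the join columns, a tiny extractor makes it easy to pull them off any matching line. This is a sketch that assumes keys are written as `key=value` tokens ending at a space or `;` (the same assumption behind the `session_id=[^ ;]+` search in the commands); `extract_keys` is a hypothetical name.

```shell
# extract_keys LINE
# Hypothetical helper: print the correlation keys found on one log line,
# one key=value pair per output line, in the order they appear.
# Assumes each value ends at a space or ';' (same as the session_id=[^ ;]+ search).
extract_keys() {
  printf '%s\n' "$1" |
    grep -oE '(issue_identifier|issue_id|session_id)=[^ ;]+'
}
```

For example, `extract_keys "$(grep -h 'issue_identifier=MT-625' log/symphony.log* | tail -n 1)"` prints the keys from the last matching line, ready to paste into the next search.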
Quick Triage (Stuck Run)
- Confirm scheduler/worker symptoms for the ticket.
- Find recent lines for the ticket (search by `issue_identifier` first).
- Extract `session_id` from the matching lines.
- Trace that `session_id` across start, stream, completion/failure, and stall-handling logs.
- Decide the class of failure: timeout/stall, app-server startup failure, turn failure, or orchestrator retry loop.
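The second and third triage steps can be combined into one pipeline that lists a ticket's sessions in the order they first appear, which is usually more useful for a timeline than alphabetical `sort -u` output. This is only a sketch: `sessions_for_ticket` is hypothetical, and "first seen" follows the order of the files you pass, so pass them oldest first (a shell glob such as `log/symphony.log*` sorts names lexically, which may not match rotation age).

```shell
# sessions_for_ticket KEY FILE...
# Hypothetical helper: print the distinct session_id values logged on lines
# for ticket KEY, in first-seen order. Pass FILEs oldest first.
sessions_for_ticket() (
  key=$1
  shift
  # -h: omit file names; -w/-F: exact whole-word match on the ticket key
  grep -hwF "issue_identifier=$key" "$@" |
    grep -oE 'session_id=[^ ;]+' |
    awk '!seen[$0]++'   # keep only the first occurrence of each session_id
)
```

The last line of output is the most recent session, which is normally the one to trace end-to-end.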
Commands
```shell
# 1) Narrow by ticket key (fastest entry point)
rg -n "issue_identifier=MT-625" log/symphony.log*

# 2) If needed, narrow by Linear UUID
rg -n "issue_id=<linear-uuid>" log/symphony.log*

# 3) Pull session IDs seen for that ticket
#    (--no-filename drops the path prefix so sort -u dedupes across rotated files)
rg --no-filename "issue_identifier=MT-625" log/symphony.log* | rg -o "session_id=[^ ;]+" | sort -u

# 4) Trace one session end-to-end
rg -n "session_id=<thread>-<turn>" log/symphony.log*

# 5) Focus on stuck/retry signals
rg -n "Issue stalled|scheduling retry|turn_timeout|turn_failed|Codex session failed|Codex session ended with error" log/symphony.log*
```
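To tell a retry loop on one ticket apart from a problem that spans many tickets, the signal search from command 5 can be rolled up into per-ticket counts. This sketch assumes signal lines include `issue_identifier=`; lines without it are counted under `(none)`. `signal_summary` and `SIGNALS` are hypothetical names, and the helper uses `grep`/`awk` rather than `rg` for portability.

```shell
# signal_summary FILE...
# Hypothetical helper: count stuck/retry signal lines per issue_identifier,
# highest count first. Uses the same signal pattern as command 5.
SIGNALS='Issue stalled|scheduling retry|turn_timeout|turn_failed|Codex session failed|Codex session ended with error'

signal_summary() {
  grep -hE "$SIGNALS" "$@" |
    awk '{
      id = "(none)"                       # lines without a ticket key
      for (i = 1; i <= NF; i++)
        if ($i ~ /^issue_identifier=/) {
          id = substr($i, 18)             # drop the 17-char issue_identifier= prefix
          sub(/;$/, "", id)               # drop a trailing ; separator
        }
      count[id]++
    }
    END { for (id in count) print count[id], id }' |
    LC_ALL=C sort -k1,1nr -k2,2           # locale-independent ordering
}
```

A single ticket with a large count points toward a stall or retry loop on that ticket; many tickets with similar counts suggest the cause sits outside any one issue.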
Investigation Flow
- Locate the ticket slice:
  - Search by `issue_identifier=<KEY>`.
  - If noise is high, add `issue_id=<UUID>`.
- Establish the timeline:
  - Identify the first `Codex session started ... session_id=...` line.
  - Follow with `Codex session completed`, `ended with error`, or worker exit lines.
- Classify the problem:
  - Stall loop: `Issue stalled ... restarting with backoff`.
  - App-server startup: `Codex session failed ...`.
  - Turn execution failure: `turn_failed`, `turn_cancelled`, `turn_timeout`, or `ended with error`.
  - Worker crash: `Agent task exited ... reason=...`.
- Validate scope:
  - Check whether failures are isolated to one issue/session or repeating across multiple tickets.
- Capture evidence:
  - Save key log lines with timestamps, `issue_identifier`, `issue_id`, and `session_id`.
  - Record the probable root cause and the exact failing stage.
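The classification step can be expressed as a small `case` over the message fragments quoted in the flow. This is a sketch: `classify_line` and its class names are hypothetical, and the exact message wording may differ between Symphony versions, so adjust the patterns to what your log actually contains. Patterns are checked top to bottom, so a line matching several fragments gets the first class.

```shell
# classify_line LINE
# Hypothetical helper: map one log line to a failure class, or "other".
# Fragments come from the "Classify the problem" list; wording may vary by version.
classify_line() {
  case $1 in
    *"Issue stalled"*"restarting with backoff"*)
      echo stall_loop ;;
    *"Codex session failed"*)
      echo app_server_startup ;;
    *turn_failed* | *turn_cancelled* | *turn_timeout* | *"ended with error"*)
      echo turn_failure ;;
    *"Agent task exited"*"reason="*)
      echo worker_crash ;;
    *)
      echo other ;;
  esac
}
```

Feeding a ticket slice through it line by line (`while IFS= read -r l; do classify_line "$l"; done`) and counting the results gives a quick first guess at the dominant failure class.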
Reading Codex Session Logs
In Symphony, Codex session diagnostics are emitted into `log/symphony.log` and
keyed by `session_id`. Read them as a lifecycle:
- `Codex session started ... session_id=...`
- Session stream/lifecycle events for the same `session_id`
- Terminal event: `Codex session completed ...`, `Codex session ended with error ...`, or `Issue stalled ... restarting with backoff`
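A quick way to read one session as a lifecycle is to check whether it started and which terminal event, if any, it reached. This is a minimal sketch, assuming the start, completion, error, and stall messages all carry the session's `session_id=` on the same line; `session_status` and its output labels are hypothetical. A `started_no_terminal` result means the session is either still running or stuck, which is the case worth checking against stall and retry signals.

```shell
# session_status SESSION_ID FILE...
# Hypothetical helper: print completed | ended_with_error | stalled |
# started_no_terminal | not_started for one session. Pass FILEs oldest
# first so the last terminal event seen wins.
session_status() (
  sid=$1
  shift
  grep -hwF "session_id=$sid" "$@" |
    awk '
      /Codex session started/                  { started = 1 }
      /Codex session completed/                { last = "completed" }
      /Codex session ended with error/         { last = "ended_with_error" }
      /Issue stalled.*restarting with backoff/ { last = "stalled" }
      END {
        if (last != "")    print last
        else if (started)  print "started_no_terminal"
        else               print "not_started"
      }'
)
```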
When investigating one specific session, keep the trace narrow:
- Capture one `session_id` for the ticket.
- Build a timestamped slice for only that session:

  ```shell
  rg -n "session_id=<thread>-<turn>" log/symphony.log*
  ```

- Mark the exact failing stage:
  - Startup failure before stream events (`Codex session failed ...`).
  - Turn/runtime failure after stream events (`turn_*` / `ended with error`).
  - Stall recovery (`Issue stalled ... restarting with backoff`).
- Pair findings with `issue_identifier` and `issue_id` from nearby lines to confirm you are not mixing concurrent retries.
Always pair session findings with `issue_identifier`/`issue_id` to avoid mixing
concurrent runs.
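One way to apply that rule mechanically is to list every issue key logged alongside a session. This sketch (`issues_for_session` is a hypothetical helper) assumes the keys share a line with `session_id=`; if more than one `issue_identifier` comes back, narrow the slice further before drawing conclusions.

```shell
# issues_for_session SESSION_ID FILE...
# Hypothetical helper: list the distinct issue_identifier / issue_id values
# logged on the same lines as one session_id.
issues_for_session() (
  sid=$1
  shift
  # -w keeps session t1-u1 from also matching t1-u10
  grep -hwF "session_id=$sid" "$@" |
    grep -oE '(issue_identifier|issue_id)=[^ ;]+' |
    LC_ALL=C sort -u
)
```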
Notes
- Prefer `rg` over `grep` for speed on large logs.
- Check rotated logs (`log/symphony.log*`) before concluding data is missing.
- If required context fields are missing in new log statements, align with `elixir/docs/logging.md` conventions.
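To find log statements that need those context fields, a rough lint can flag Codex session lines missing any of the three keys. This is only a sketch: `missing_context` is hypothetical and matches only the `Codex session` message prefix used in this skill. Some messages, such as a startup failure before a session exists, may legitimately lack `session_id`, so treat the output as candidates to review rather than violations.

```shell
# missing_context FILE...
# Hypothetical lint: print "Codex session" lines lacking issue_identifier=,
# issue_id=, or session_id=. (issue_identifier= does not contain issue_id=,
# so the three checks are independent.)
missing_context() {
  grep -hF 'Codex session' "$@" |
    awk '!/issue_identifier=/ || !/issue_id=/ || !/session_id=/'
}
```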