Agent skill
observatory
Self-improving flywheel that analyzes agent traces, surfaces improvement signals, and proposes targeted system enhancements.
Install this agent skill to your Project
npx add-skill https://github.com/juanandresgs/claude-ctrl/tree/main/skills/observatory
SKILL.md
/observatory — Self-Improvement Flywheel
Analyzes agent traces to surface recurring failure patterns, inefficiencies, and improvement opportunities. Proposes one concrete improvement at a time for user approval. Accepted improvements are tracked — rejected and deferred items enter a reassessment backlog.
Why it exists
Without systematic analysis, the system cannot learn from its own operation. The observatory makes agent failures and partial successes visible and actionable. Each accepted improvement makes traces richer, enabling better future analysis.
Subcommands
| Command | Description |
|---|---|
| /observatory or /observatory run | Full cycle: analyze → suggest → report → approve/defer/reject |
| /observatory report | Generate full assessment report (all signals, batches, backlog) |
| /observatory status | Show current state (pending, implemented count, acceptance rate) |
| /observatory history | Show recent action log from history.jsonl |
| /observatory analyze-only | Run analysis only, no suggestion |
| /observatory backlog | Show deferred items with reassessment status |
| /observatory batch <label> | Approve an entire batch of related signals |
Process
Step 1: Run the analysis
bash ~/.claude/skills/observatory/scripts/converge.sh [subcommand]
The script:
- Reads agent trace summaries from traces/*/summary.md
- Identifies recurring patterns: silent returns, partial completions, repeated errors
- Ranks signals by impact × feasibility into a comparison matrix
- Groups related signals into labeled batches
- Proposes the highest-priority signal as a concrete improvement
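The ranking and batching steps listed above can be illustrated with a minimal Python sketch. It is not the logic of converge.sh: the field names (`impact`, `feasibility`, `batch`), the numeric scales, and the signal IDs are all hypothetical, chosen only to show how an impact × feasibility score orders signals and how a batch label groups them.

```python
from collections import defaultdict

def rank_signals(signals):
    """Order signals by impact * feasibility, highest score first."""
    return sorted(signals, key=lambda s: s["impact"] * s["feasibility"], reverse=True)

def group_batches(signals):
    """Map each batch label to the IDs of the signals that carry it."""
    batches = defaultdict(list)
    for s in signals:
        batches[s["batch"]].append(s["id"])
    return dict(batches)

# Hypothetical signals; real IDs and scores come from the analysis run.
signals = [
    {"id": "SIG-a", "impact": 3, "feasibility": 2, "batch": "tracing"},
    {"id": "SIG-b", "impact": 5, "feasibility": 4, "batch": "errors"},
    {"id": "SIG-c", "impact": 4, "feasibility": 1, "batch": "tracing"},
]

ranked = rank_signals(signals)
batches = group_batches(signals)
top_signal = ranked[0]  # the one signal that would be proposed first
```

Under these assumed scores, `SIG-b` (5 × 4 = 20) is proposed first, ahead of `SIG-a` (6) and `SIG-c` (4).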
Step 2: Present the suggestion
Show the user:
- Signal ID and description
- Evidence (which traces triggered it, how often)
- Proposed improvement (specific file edit or process change)
- Expected impact
Step 3: Handle the user's decision
| Decision | Action |
|---|---|
| Accept | Record in state.json implemented[], open GitHub issue, implement if simple |
| Reject | Record in state.json rejected[] with reason |
| Defer | Record in state.json deferred[]; the item resurfaces after 10 or more new traces |
| Batch approve | Accept all signals in a labeled batch at once |
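A rough sketch of how the decision table might map onto updates to the state.json structure, written in Python for illustration. Only the `implemented[]` entry shape appears in the skill's documentation. The shapes of `rejected[]` and `deferred[]` entries, the `trace_count_at_deferral` field, the clearing of the `pending_*` keys, and the function names are assumptions. Opening a GitHub issue is left out.

```python
from datetime import datetime, timezone

REASSESS_AFTER = 10  # "10+ new traces" from the decision table

def apply_decision(state, decision, suggestion, trace_count, reason=None):
    """Record one decision in the in-memory state dict and clear the pending slot."""
    now = datetime.now(timezone.utc).isoformat()
    entry = {"sug_id": suggestion["sug_id"], "signal_id": suggestion["signal_id"]}
    if decision == "accept":
        state["implemented"].append({**entry, "implemented_at": now})
    elif decision == "reject":
        # Assumed entry shape: the docs only say a reason is recorded.
        state["rejected"].append({**entry, "reason": reason, "rejected_at": now})
    elif decision == "defer":
        # Assumed: remember the trace count so reassessment can be timed.
        state["deferred"].append(
            {**entry, "deferred_at": now, "trace_count_at_deferral": trace_count}
        )
    else:
        raise ValueError(f"unknown decision: {decision!r}")
    for key in ("pending_suggestion", "pending_title", "pending_priority"):
        state[key] = None
    return state

def due_for_reassessment(entry, trace_count):
    """True once at least REASSESS_AFTER traces have arrived since deferral."""
    return trace_count - entry["trace_count_at_deferral"] >= REASSESS_AFTER

# Hypothetical pending suggestion, deferred when 40 traces exist.
state = {
    "version": 3, "last_analysis_at": None,
    "pending_suggestion": "SUG-002", "pending_title": "Example", "pending_priority": "high",
    "implemented": [], "rejected": [], "deferred": [],
}
apply_decision(state, "defer", {"sug_id": "SUG-002", "signal_id": "SIG-example"}, trace_count=40)
```

A batch approval could then be expressed as a loop that calls `apply_decision(..., "accept", ...)` for each signal in the labeled batch.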
State Files
observatory/state.json (v3):
{
"version": 3,
"last_analysis_at": null,
"pending_suggestion": null,
"pending_title": null,
"pending_priority": null,
"implemented": [{"sug_id": "SUG-001", "signal_id": "SIG-...", "implemented_at": "..."}],
"rejected": [],
"deferred": []
}
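As an illustration of how the values shown by /observatory status could be derived from this structure, here is a small Python sketch. The skill does not define how the acceptance rate is calculated. The formula used here (implemented divided by implemented plus rejected, with deferred items excluded) is an assumption, as are the function name and the `SUG-003` placeholder.

```python
def summarize_status(state):
    """Derive a status summary from a state.json-shaped dict."""
    accepted = len(state["implemented"])
    rejected = len(state["rejected"])
    decided = accepted + rejected
    return {
        "pending": state["pending_suggestion"],
        "implemented_count": accepted,
        # Assumed formula; None when nothing has been decided yet.
        "acceptance_rate": accepted / decided if decided else None,
    }

# Hypothetical state with three accepted and one rejected suggestion.
example_state = {
    "version": 3,
    "pending_suggestion": "SUG-003",
    "implemented": [{"sug_id": "SUG-000"}, {"sug_id": "SUG-001"}, {"sug_id": "SUG-002"}],
    "rejected": [{"sug_id": "SUG-004"}],
    "deferred": [],
}
status = summarize_status(example_state)
```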
observatory/history.jsonl: One JSON entry per action (accepted/rejected/deferred), with timestamp and suggestion details.
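The one-entry-per-line layout described for history.jsonl can be sketched in Python as follows. The record keys (`timestamp`, `action`, plus the suggestion fields) and the helper names are hypothetical; the documentation only states that each entry carries a timestamp and suggestion details.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def append_history(path, action, suggestion):
    """Append one JSON object as a single line to the history file."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,  # "accepted", "rejected", or "deferred"
        **suggestion,
    }
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record

def recent_history(path, limit=10):
    """Return the last `limit` entries, oldest first, skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [json.loads(line) for line in lines if line.strip()][-limit:]
```

Appending a whole line per action keeps the file readable with line-oriented tools and means earlier entries are never rewritten.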
observatory/analysis-cache.json: Full analysis output from the last run. Includes signal list, comparison matrix, batch assignments, assessment report.
observatory/analysis-cache.prev.json: Previous run's cache (for comparison).
observatory/comparison-matrix.json: Signal ranking by impact × feasibility.
observatory/suggestions/: Per-batch assessment files.