regression-search

Search phone-call history for when a feature regressed (find-regression.py) and drill into a single call to see what went wrong (diagnose-call.py). Skips reading 100+ transcripts by hand.

Stars 114

Forks 23

Install this agent skill to your Project

npx add-skill https://github.com/sonichi/sutando/tree/main/skills/regression-search

SKILL.md

Regression Search

Two scripts for hunting down bad calls without reading every transcript:

find-regression.py — search results/calls/calls.jsonl for calls touching a feature, classify each as working/broken, print a sorted timeline.
diagnose-call.py — drill into a single call by SID, report refusals/errors/silences/repeated requests, optionally show metrics from data/call-metrics.jsonl.
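As a rough illustration of the search step, the sketch below filters JSON Lines records by keyword and date and sorts them into a timeline. The record fields `sid`, `started_at`, and `transcript` are hypothetical; the real schema of results/calls/calls.jsonl is not documented here, so treat this as a sketch under those assumptions and not as the script's actual logic.

```python
import json
from datetime import date
from typing import Iterable, Iterator, List, Optional


def matching_calls(lines: Iterable[str], keyword: str,
                   since: Optional[date] = None) -> Iterator[dict]:
    """Yield call records whose transcript mentions the keyword.

    Assumes one JSON object per line with hypothetical fields
    "started_at" (ISO 8601 string) and "transcript" (plain text).
    """
    kw = keyword.lower()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the file
        call = json.loads(line)
        day = date.fromisoformat(call["started_at"][:10])
        if since is not None and day < since:
            continue
        if kw in call["transcript"].lower():
            yield call


def timeline(lines: Iterable[str], keyword: str,
             since: Optional[date] = None) -> List[dict]:
    """Return matching calls sorted oldest first.

    Sorting the raw strings works only if every timestamp uses the
    same ISO 8601 layout and time zone (an assumption of this sketch).
    """
    return sorted(matching_calls(lines, keyword, since),
                  key=lambda c: c["started_at"])


# Made-up sample records standing in for the real file.
SAMPLE = [
    '{"sid": "call-c", "started_at": "2026-04-03T12:00:00", "transcript": "user: stop recording"}',
    '{"sid": "call-a", "started_at": "2026-03-30T10:00:00", "transcript": "user: record this"}',
    '{"sid": "call-b", "started_at": "2026-04-02T09:00:00", "transcript": "user: play a song"}',
]

for call in timeline(SAMPLE, "record"):
    print(call["started_at"], call["sid"])
```

To run it against a real file, pass an open file handle in place of `SAMPLE`, since a file object also iterates line by line.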

Closes #188.

When to use

"When did the X feature stop working?" — pass the feature keyword.
"Has feature Y improved?" — see the broken/working trend over time.
Before shipping a fix — sanity check that the regression is reproducible.

Usage

bash

python3 skills/regression-search/scripts/find-regression.py "record"
python3 skills/regression-search/scripts/find-regression.py "summon" --since 2026-04-01
python3 skills/regression-search/scripts/find-regression.py "play" --json

Flags:

--since YYYY-MM-DD — only show calls on/after this date
--json — machine-readable output
--show-snippet — print a one-line transcript snippet for each call
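For readers wiring up something similar, the flags above could be declared with `argparse` roughly as follows. This is a sketch of one possible interface, not the script's actual source; the `as_json` destination name and the help strings are assumptions.

```python
import argparse
from datetime import date


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="find-regression.py",
        description="Search call history for calls touching a feature.",
    )
    parser.add_argument("query", help="feature keyword, e.g. 'record'")
    # date.fromisoformat rejects anything that is not YYYY-MM-DD,
    # and argparse turns the resulting ValueError into a usage error.
    parser.add_argument("--since", type=date.fromisoformat, default=None,
                        metavar="YYYY-MM-DD",
                        help="only show calls on/after this date")
    # dest is renamed (assumption) so the attribute does not shadow
    # the json module in code that imports it.
    parser.add_argument("--json", action="store_true", dest="as_json",
                        help="machine-readable output")
    parser.add_argument("--show-snippet", action="store_true",
                        help="print a one-line transcript snippet per call")
    return parser


args = build_parser().parse_args(["summon", "--since", "2026-04-01"])
print(args.query, args.since, args.as_json, args.show_snippet)
```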

Heuristics

A call is classified as broken for a query if any of the following is true:

Sutando refuses ("I can't", "I'm not able", "I'm unable", "sorry I cannot")
Sutando reports an error ("error", "failed", "didn't work", "something went wrong")
The user repeats the same request 2+ times in a row (Sutando didn't respond usefully)
Sutando says "(Silence)" after the user mentions the feature

A call that is not flagged broken counts as working if Sutando's response includes the feature keyword.

These are intentionally crude — the goal is "good enough to find the regression window without reading 163 transcripts." Tune as you find false positives.
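The rules above can be sketched as a small classifier. The transcript shape (a list of `(speaker, text)` pairs with speakers `"user"` and `"sutando"`), the `"unmatched"` label for calls that are neither broken nor working, and the reading of "2+ times in a row" as two consecutive identical user turns are all assumptions of this sketch, not details confirmed by the script.

```python
from typing import List, Optional, Tuple

# (speaker, text); speaker is "user" or "sutando" (assumed representation)
Turn = Tuple[str, str]

REFUSAL_PHRASES = ("i can't", "i'm not able", "i'm unable", "sorry i cannot")
ERROR_PHRASES = ("error", "failed", "didn't work", "something went wrong")


def _norm(text: str) -> str:
    """Lowercase, trim, and straighten curly apostrophes before matching."""
    return text.lower().replace("\u2019", "'").strip()


def classify_call(turns: List[Turn], keyword: str) -> str:
    """Return "broken", "working", or "unmatched" for one call."""
    kw = keyword.lower()
    user_mentioned = False      # has the user brought up the feature yet?
    reply_has_keyword = False   # did any Sutando turn mention the feature?
    last_user_text: Optional[str] = None

    for speaker, raw in turns:
        text = _norm(raw)
        if speaker == "user":
            # Same request in consecutive user turns (Sutando turns in
            # between are ignored) counts as a repeat.
            if text and text == last_user_text:
                return "broken"
            last_user_text = text
            if kw in text:
                user_mentioned = True
        else:
            # Crude substring checks: "error" also matches "terror".
            if any(p in text for p in REFUSAL_PHRASES + ERROR_PHRASES):
                return "broken"
            if user_mentioned and text == "(silence)":
                return "broken"
            if kw in text:
                reply_has_keyword = True

    return "working" if reply_has_keyword else "unmatched"


print(classify_call([("user", "Start recording"),
                     ("sutando", "Okay, recording started")], "record"))
```

Returning on the first broken signal keeps the sketch short; a real tool would more likely collect every signal so the reasons can be reported.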

Limitations

Keyword matching only. "recording doesn't stop" vs "recording won't start" both match record. The issue calls this out as future work.
No semantic understanding. A call where Sutando talks about recording but the user wanted something else still matches.
Doesn't correlate with git commits — manual step for now.

diagnose-call.py

bash

python3 skills/regression-search/scripts/diagnose-call.py de1f04733fc2
python3 skills/regression-search/scripts/diagnose-call.py CA701fc4129779... --metrics
python3 skills/regression-search/scripts/diagnose-call.py de1f04733fc2 --json

Accepts a full SID or just the last 12 characters. Reports turn counts, refusals, errors, silences, repeated user requests, and how the call ended (normal, abrupt user end, or Sutando silence). With --metrics, it also pulls a per-event tool-call timeline from data/call-metrics.jsonl (requires PR #223). Exits with code 1 if any issues are found and 0 if the call is clean, which is useful for CI.

Typical workflow: run find-regression.py to surface broken candidates, then diagnose-call.py <sid> to drill into the worst one.
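The SID lookup and exit-code convention described above might look like the following. Both function names are hypothetical, the sample SIDs are made up for illustration, and returning `None` on an ambiguous suffix is a design choice of this sketch, not documented behaviour.

```python
from typing import Iterable, Optional

SUFFIX_LEN = 12  # the documented short form: last 12 characters of the SID


def resolve_sid(user_input: str, known_sids: Iterable[str]) -> Optional[str]:
    """Map a full SID or a 12-character suffix to exactly one known SID.

    Returns None when nothing matches or when the suffix is ambiguous.
    """
    matches = [
        sid for sid in known_sids
        if sid == user_input
        or (len(user_input) == SUFFIX_LEN and sid.endswith(user_input))
    ]
    return matches[0] if len(matches) == 1 else None


def exit_code(issue_count: int) -> int:
    """1 if any issues were found, 0 if the call is clean."""
    return 1 if issue_count > 0 else 0


# Made-up SIDs for illustration only.
KNOWN = [
    "CA" + "0" * 20 + "de1f04733fc2",
    "CA" + "1" * 20 + "aaaabbbbcccc",
]

print(resolve_sid("de1f04733fc2", KNOWN))
print(exit_code(3), exit_code(0))
```

In a CI job, a wrapper would typically end with `sys.exit(exit_code(n))` so that a non-clean call fails the step.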

Future work

Auto-correlate regression windows with git log
Smarter NLP-based query matching (query: "recording doesn't stop" vs "recording won't start")
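One possible shape for the git correlation, offered only as a sketch of the idea: take the span between the last working call and the first broken call after it, then list the commits in that span. The timeline format (date string plus status) and both function names are assumptions; `git log --since/--until/--oneline` are standard git options.

```python
import subprocess
from typing import List, Optional, Sequence, Tuple


def regression_window(
    timeline: Sequence[Tuple[str, str]]
) -> Optional[Tuple[str, str]]:
    """Return (last working date, first broken date after it).

    `timeline` is assumed to be sorted oldest first, with entries of
    the form ("YYYY-MM-DD", "working" | "broken").
    """
    last_working = None
    for day, status in timeline:
        if status == "working":
            last_working = day
        elif status == "broken" and last_working is not None:
            return last_working, day
    return None


def git_log_command(start: str, end: str) -> List[str]:
    """Build a git log command covering the whole window.

    Explicit times are added because git fills in a bare date with the
    current time of day, which would clip the window.
    """
    return ["git", "log", "--oneline",
            f"--since={start} 00:00:00", f"--until={end} 23:59:59"]


def commits_in_window(start: str, end: str) -> str:
    """Run git in the current repository and return its output."""
    result = subprocess.run(git_log_command(start, end),
                            check=True, capture_output=True, text=True)
    return result.stdout


window = regression_window([
    ("2026-04-01", "working"),
    ("2026-04-03", "working"),
    ("2026-04-05", "broken"),
])
if window is not None:
    print(" ".join(git_log_command(*window)))
```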

Maintainer

sonichi Core maintainer

Source details

Full Name: sonichi/sutando
Branch: main
Path in repo: skills/regression-search
License: MIT License
Topics: claude automation self-hosted ai-agent gemini multi-agent macos open-source voice-assistant personal-ai voice-agent self-improving
