Agent skill
ai-debug
Diagnose why an AI feature is underperforming, hallucinating, or behaving inconsistently. Uses 4D audit to work backwards from symptoms to root cause.
Install this agent skill to your Project
npx add-skill https://github.com/breethomas/bette-think/tree/main/plugins/bette-think/skills/ai-debug
SKILL.md
AI Debug
Figure out why an existing AI feature is broken.
Works with:
- Linear MCP - Pull issue/bug details
- Manual - Describe the symptoms
Entry Point
When this skill is invoked, start with:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
AI DEBUG
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
When AI fails, teams blame the model.
But 90% of failures are context failures.
What's going wrong?
1. Provide a Linear issue ID
2. Describe the symptoms
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Usage
/ai-debug # Describe symptoms manually
/ai-debug LIN-123 # Start from Linear bug/issue
What It Does
Works backwards from symptoms to root cause using the 4D audit:
| Symptom | Likely Root Cause | Focus Area |
|---|---|---|
| Hallucinations | Missing domain context, no grounding | D2, D4 |
| Inconsistency | Vague job definition, missing rules | D1, D4 |
| Generic outputs | Missing user/environment context | D2 |
| Wrong tone/format | Missing constraints, no examples | D1, D4 |
| Slow responses | Too much context, bad discovery | D2, D3 |
| High costs | Dumping everything in prompt | D2, D3 |
| Demo vs prod mismatch | Discovery strategy broken | D3, D4 |
Key insight: When AI fails, teams blame the model. But 90% of failures are context failures.
The 4D Audit
D1: Was the Job Defined?
- Can you articulate exactly what the model should produce?
- Is there a written spec for inputs, outputs, constraints?
- Do engineers and PMs agree on what "good" looks like?
D2: Is Context Right?
- What context is the model actually receiving?
- Walk through the 6 layers: Intent, User, Domain, Rules, Environment, Exposition
- Is context structured or dumped as raw text?
- Is there too much context (token bloat)?
D3: Is Context Fetched Reliably?
- How is each piece of context being fetched at runtime?
- What happens when a data source is unavailable?
- Is there visibility into what context is used per request?
D4: Are Failures Being Caught?
- Are there pre-checks before calling the model?
- Are there post-checks validating output?
- What's the fallback UX when things break?
- Is there a feedback loop capturing failures?
Output
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
CONTEXT AUDIT COMPLETE
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Feature: [Name]
Symptoms: [What was reported]
D1 Demand: [CLEAR / GAP / CRITICAL]
D2 Data: [CLEAR / GAP / CRITICAL]
D3 Discovery: [CLEAR / GAP / CRITICAL]
D4 Defense: [CLEAR / GAP / CRITICAL]
Primary Issue: [Root cause summary]
RECOMMENDED FIXES (prioritized):
1. [Highest impact fix]
2. [Second fix]
3. [Third fix]
Quick Win: [Smallest change that would help]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Workflow
- Collect symptoms (what's going wrong)
- Map symptoms to likely causes using the table above
- Audit each D dimension with diagnostic questions
- Identify root cause and prioritize fixes
- Offer to add findings to Linear or export
Questions to ask at each step:
- "What specific behavior are you seeing?"
- "What should it be doing instead?"
- "When did this start happening?"
- "Does it happen every time or intermittently?"
Framework: 4D Context Canvas (Aakash Gupta & Miqdad Jaffer) Best for: Debugging hallucinations, inconsistency, performance issues in AI features
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
four-fits
Find which fit is broken before you burn cash scaling. Brian Balfour's framework for validating sustainable growth readiness.
project-health
Deep-dive health check on a single Linear project. Produces assessment with 7 dimensions - On Track / At Risk / Stalled.
prompt-engineering
Expert prompt optimization system for building production-ready AI features. Use when users request help improving prompts, want to create system prompts, need prompt review/critique, ask for prompt optimization strategies, want to analyze prompt effectiveness, mention prompt engineering best practices, request prompt templates, or need guidance on structuring AI instructions. Also use when users provide prompts and want suggestions for improvement.
strategy-session
Your product soundboard. Work through product decisions conversationally - Claude gathers context, challenges assumptions, captures decisions, and creates Linear issues.
shape-up
Shape work using the Shape Up methodology (Ryan Singer, Basecamp). Walk through the 4-step shaping process to create pitches ready for betting. Distinguishes between established product mode (fixed time, variable scope) and new product mode (looser constraints). Use when planning cycle work, writing pitches, or coaching PMs on shaping.
competitive-research
Systematic competitive intelligence with parallel agent analysis. Analyzes competitors thoroughly and synthesizes into actionable insights.
Didn't find tool you were looking for?