Skill: domain-advisor

When to use

Use this skill when:

The user message starts with <advise>, or
The user asks "What should we try next?" or similar planning questions.

Initialization

Read .codex/skills/registry.json to determine:
- domain: research | unsloth | cuda
- paths.reports: where to find experiment/benchmark reports
- paths.experiment_log: path to experiment log
- paths.troubleshooting: path to troubleshooting guide
Adapt behavior based on domain (see Domain-Specific Behavior below).

Inputs

User goal (including constraints like run budget, hardware, time).
Optional references to files (configs, reports, logs) mentioned in the goal.

Behavior

Understand the goal
- Parse the research/development goal and constraints.
- If unclear, ask a concise clarifying question.
Gather context
- Scan .codex/skills/ for relevant result skills matching the task.
- Read relevant reports from the paths.reports directory.
- Skim recent entries in paths.experiment_log.
- If errors are mentioned, check paths.troubleshooting for known patterns.
Propose a plan
- Design 2–5 concrete experiments/tasks, each with:
  - Clear objective
  - Key configuration/parameters
  - Expected outcome or hypothesis
  - Any relevant variation to test
- Respect constraints (budget, hardware, time).
- Reuse defaults from existing skills instead of inventing new ones.
Output format
- Start with a short natural-language summary.
- Provide a markdown table:
  
  id description key_differences notes
- Optionally propose file paths for configs or reports to create.
- If errors were mentioned, explain how this plan avoids known failure patterns.
Logging
- When useful, append a short entry to paths.experiment_log summarizing the proposed plan (only with user approval).

Domain-Specific Behavior

Research Domain

When domain: research:

Focus areas:

Hyperparameter sweeps (learning rate, batch size, epochs)
Model architecture variations (layers, dimensions, attention heads)
Dataset mixtures and sampling strategies
Training dynamics (warmup, schedulers, checkpointing)

Context to gather:

training_reports/*.md - past training runs
Model configs and their performance
Loss curves and convergence patterns

Output emphasis:

Parameter sweep tables with specific values
Ablation study designs
Baseline comparisons

Unsloth Domain

When domain: unsloth:

Focus areas:

LoRA rank and alpha selection
Quantization settings (4-bit, 8-bit, nf4)
Gradient checkpointing configuration
Fine-tuning hyperparameters for specific model families
Memory optimization strategies

Context to gather:

training_reports/*.md - past fine-tuning runs
Model-specific Unsloth configurations
Memory usage patterns

Output emphasis:

LoRA configuration recommendations
Memory/speed tradeoffs
Model-specific settings (Llama, Mistral, Qwen, etc.)

CUDA Domain

When domain: cuda:

Focus areas:

Tiling strategies and block sizes
Shared memory usage patterns
Warp-level primitives
Memory coalescing optimization
Triton autotuning configurations

Context to gather:

benchmark_results/*.md - past kernel benchmarks
Profiling data (nsight, ncu reports)
Bandwidth and occupancy metrics

Output emphasis:

Kernel configuration parameters
Expected speedup estimates
Memory access pattern recommendations
Profiling metrics to track

Example Output

Research Example

markdown

## Experiment Plan: Attention Head Ablation

Based on previous runs in `training_reports/baseline-2025-01.md`, the 8-head
configuration achieved 92% accuracy. Testing whether fewer heads can match
this with lower compute.

| id | description | key_differences | notes |
|----|-------------|-----------------|-------|
| A1 | 4-head attention | heads=4 vs baseline 8 | Test if 4 heads sufficient |
| A2 | 6-head attention | heads=6 | Middle ground |
| A3 | 4-head + wider FFN | heads=4, ffn_dim=4096 | Compensate with FFN |

All runs use: lr=1e-4, batch_size=32, epochs=10 (from baseline config).

Unsloth Example

markdown

## Fine-tuning Plan: Llama-3 8B with Unsloth

Based on `training_reports/llama3-lora-v1.md`, rank=16 showed good results
but OOM'd at batch_size=4. Testing memory-efficient configurations.

| id | description | key_differences | notes |
|----|-------------|-----------------|-------|
| U1 | rank=8 + grad_ckpt | Lower rank, enable checkpointing | Memory baseline |
| U2 | rank=16 + 4bit | Full rank with 4-bit quantization | Quality vs memory |
| U3 | rank=32 + offload | Higher rank with CPU offload | Max quality attempt |

All runs use: alpha=32, dropout=0.05, target_modules=["q_proj", "v_proj"]

CUDA Example

markdown

## Kernel Optimization Plan: Softmax

Based on `benchmark_results/softmax-v1.md`, current implementation achieves
80% of theoretical bandwidth. Testing tiling strategies.

| id | description | key_differences | notes |
|----|-------------|-----------------|-------|
| K1 | 2D tiling | BLOCK_M=64, BLOCK_N=64 | Better L2 reuse |
| K2 | Warp reduction | Use warp shuffles | Reduce shared mem |
| K3 | Online softmax | Single-pass algorithm | Fused with attention |

Profile with: `ncu --set full` to capture memory metrics.

Search AI Tools

domain-advisor

Install this agent skill to your Project

Metadata

SKILL.md

Skill: domain-advisor

When to use

Initialization

Inputs

Behavior

Domain-Specific Behavior

Research Domain

Unsloth Domain

CUDA Domain

Example Output

Research Example

Unsloth Example

CUDA Example