Agent skill
eval
Evaluate agent quality across three modes — without BK, BK grep-only, and BK full
Install this agent skill to your Project
npx add-skill https://github.com/blueraai/bluera-knowledge/tree/main/skills/eval
SKILL.md
Agent Quality Evaluation
Compare how well Claude answers library questions across three access levels:
- Without BK — web search + training knowledge only
- BK Grep — Grep/Read/Glob on cloned repos, no vector search
- BK Full — vector search + get_full_context + Grep/Read
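As a rough illustration, the three access levels can be written down as sets of allowed tools. The tool names are copied from the list above; the dictionary keys and the `added_by` helper are hypothetical labels introduced only for this sketch, not part of the skill.

```python
# Hypothetical labels for the three modes; tool names come from the list above.
MODES = {
    "without_bk": {"web search", "training knowledge"},
    "bk_grep": {"Grep", "Read", "Glob"},
    "bk_full": {"vector search", "get_full_context", "Grep", "Read"},
}


def added_by(mode: str, baseline: str) -> set[str]:
    """Return the tools available in `mode` that `baseline` does not have."""
    return MODES[mode] - MODES[baseline]
```

Comparing `bk_full` against `bk_grep` this way isolates the two capabilities the third mode adds: vector search and `get_full_context`.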
Arguments
Parse $ARGUMENTS:
- No arguments: Show usage help
- Quoted string: Run eval for that single question
- --predefined: Run all predefined queries
- --predefined N: Run predefined query #N only
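The four argument cases above could be dispatched roughly as follows. This is a minimal sketch, not the skill's actual parser: the function name `parse_eval_args`, the returned dictionary shape, and the reading of N as an integer index are all assumptions made for illustration.

```python
import shlex


def parse_eval_args(raw: str) -> dict:
    """Map the raw $ARGUMENTS string to one of the four documented cases."""
    tokens = shlex.split(raw)  # honours shell-style quoting
    if not tokens:
        return {"action": "help"}
    if tokens[0] == "--predefined":
        if len(tokens) == 1:
            return {"action": "predefined_all"}
        # Assumption: N is an integer index into the predefined query list.
        return {"action": "predefined_one", "index": int(tokens[1])}
    # Anything else is treated as a single free-form question.
    return {"action": "single", "question": " ".join(tokens)}
```

For example, `parse_eval_args("--predefined 3")` selects one predefined query, while an empty string falls through to the usage help.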
Workflow
- Prerequisites: Call execute with { command: "stores" } to list stores. Abort if none.
- Resolve queries: Load from $CLAUDE_PLUGIN_ROOT/evals/agent-quality/queries/predefined.yaml or use arbitrary query.
- Load templates: Read agent prompts + judge rubric from $CLAUDE_PLUGIN_ROOT/evals/agent-quality/templates/
- Spawn 3 agents in parallel per query (replace {{QUESTION}}, {{STORES}}, {{STORE_PATHS}})
- Judge: Score all 4 criteria (1-5): Accuracy, Specificity, Completeness, Source Grounding
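The placeholder substitution and the judging step might look something like the sketch below. The three placeholders and the four criteria come from the workflow above; everything else is an assumption for illustration, including the function names, the way store names and paths are joined, and the choice to average the four scores per mode (the source only says each criterion is scored 1-5).

```python
CRITERIA = ("accuracy", "specificity", "completeness", "source_grounding")


def fill_template(template: str, question: str,
                  stores: list[str], store_paths: list[str]) -> str:
    """Replace the three documented placeholders in an agent prompt."""
    return (
        template
        .replace("{{QUESTION}}", question)
        # Assumption: store names are comma-separated, paths one per line.
        .replace("{{STORES}}", ", ".join(stores))
        .replace("{{STORE_PATHS}}", "\n".join(store_paths))
    )


def summarize(scores: dict[str, dict[str, int]]) -> dict[str, float]:
    """Average the four 1-5 criterion scores for each mode (an assumed aggregate)."""
    summary = {}
    for mode, by_criterion in scores.items():
        for name in CRITERIA:
            value = by_criterion[name]
            if not 1 <= value <= 5:
                raise ValueError(f"{mode}.{name} out of range: {value}")
        summary[mode] = sum(by_criterion[name] for name in CRITERIA) / len(CRITERIA)
    return summary
```

Validating the 1-5 range before aggregating keeps a malformed judge response from silently skewing the comparison between modes.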
Detailed procedures: references/procedures.md
Output format: references/output-format.md