
Agent skill

eval

Evaluate agent quality across three modes — without BK, BK grep-only, and BK full



Install this agent skill in your project

npx add-skill https://github.com/blueraai/bluera-knowledge/tree/main/skills/eval

SKILL.md

Agent Quality Evaluation

Compare how well Claude answers library questions across three levels of access to Bluera Knowledge (BK):

Without BK — web search + training knowledge only
BK Grep — Grep/Read/Glob on cloned repos, no vector search
BK Full — vector search + get_full_context + Grep/Read
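To make the difference between the three modes concrete, here is a minimal TypeScript sketch of a per-mode tool allowlist. It assumes Claude Code tool names (WebSearch, Grep, Read, Glob); `search` is a hypothetical name for the vector-search tool, and only `get_full_context` comes from the description above. The `MODES` table and `isToolAllowed` helper are illustrative, not part of the skill.

```typescript
// Hypothetical mode table. Tool names are assumptions for illustration,
// except get_full_context, which the skill description names.
type Mode = "without-bk" | "bk-grep" | "bk-full";

interface ModeConfig {
  label: string;
  allowedTools: readonly string[];
  usesVectorSearch: boolean;
}

const MODES: Record<Mode, ModeConfig> = {
  "without-bk": {
    label: "Without BK",
    allowedTools: ["WebSearch"], // plus training knowledge; no access to cloned repos
    usesVectorSearch: false,
  },
  "bk-grep": {
    label: "BK Grep",
    allowedTools: ["Grep", "Read", "Glob"], // file tools on cloned repos only
    usesVectorSearch: false,
  },
  "bk-full": {
    label: "BK Full",
    // "search" is a hypothetical name for the vector-search tool
    allowedTools: ["search", "get_full_context", "Grep", "Read"],
    usesVectorSearch: true,
  },
};

// Returns true if an agent running in `mode` may call `tool`.
function isToolAllowed(mode: Mode, tool: string): boolean {
  return MODES[mode].allowedTools.includes(tool);
}
```

Keeping the allowlist in one table means each mode differs only in data, so the three agents can share the same spawning code.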

Arguments

Parse $ARGUMENTS:

No arguments: Show usage help
Quoted string: Run eval for that single question
--predefined: Run all predefined queries
--predefined N: Run predefined query #N only
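The argument handling above might look like this as a minimal TypeScript sketch. `parseArguments` and the `ParsedArgs` union are hypothetical names, and the sketch assumes that any input that is not empty, not `--predefined [N]`, and not a quoted string falls back to usage help; the skill itself does not specify that case.

```typescript
// Hypothetical result shape for the $ARGUMENTS dispatch described above.
type ParsedArgs =
  | { kind: "usage" }
  | { kind: "single"; question: string }
  | { kind: "predefined"; index?: number };

function parseArguments(raw: string): ParsedArgs {
  const args = raw.trim();

  // No arguments: show usage help
  if (args === "") return { kind: "usage" };

  // --predefined, optionally followed by a query number N
  const predefined = /^--predefined(?:\s+(\d+))?$/.exec(args);
  if (predefined) {
    return predefined[1] === undefined
      ? { kind: "predefined" }
      : { kind: "predefined", index: Number(predefined[1]) };
  }

  // A quoted string is a single question; strip the matching outer quotes
  const quoted = /^(["'])([\s\S]+)\1$/.exec(args);
  if (quoted) return { kind: "single", question: quoted[2] };

  // Assumption: anything else is unrecognized, so fall back to usage help
  return { kind: "usage" };
}
```

Returning a discriminated union keeps each branch (usage, single question, all or one predefined query) explicit for the caller.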

Workflow

Prerequisites: Call execute with { command: "stores" } to list stores. Abort if none.
Resolve queries: Load predefined queries from $CLAUDE_PLUGIN_ROOT/evals/agent-quality/queries/predefined.yaml, or use the single question passed as an argument.
Load templates: Read agent prompts + judge rubric from $CLAUDE_PLUGIN_ROOT/evals/agent-quality/templates/
Spawn 3 agents in parallel per query, one per mode (replace {{QUESTION}}, {{STORES}}, {{STORE_PATHS}} in each prompt template)
Judge: Score each agent's answer on all 4 criteria (1-5): Accuracy, Specificity, Completeness, Source Grounding
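The template and spawn steps above can be sketched as placeholder substitution followed by a concurrent fan-out. This TypeScript sketch assumes one prompt template per mode and `{{NAME}}` placeholder syntax as listed; `runAgent` is a hypothetical callback standing in for however the host actually launches a subagent.

```typescript
type Mode = "without-bk" | "bk-grep" | "bk-full";

interface TemplateVars {
  QUESTION: string;
  STORES: string;
  STORE_PATHS: string;
}

// Replace every {{NAME}} placeholder; unknown placeholders are left untouched.
function fillTemplate(template: string, vars: TemplateVars): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match: string, name: string) =>
    Object.prototype.hasOwnProperty.call(vars, name)
      ? vars[name as keyof TemplateVars]
      : match,
  );
}

// Hypothetical fan-out: runAgent is supplied by the caller.
async function runAllModes(
  templates: Record<Mode, string>,
  vars: TemplateVars,
  runAgent: (mode: Mode, prompt: string) => Promise<string>,
): Promise<Record<Mode, string>> {
  const modes = Object.keys(templates) as Mode[];
  // Start every agent before awaiting any of them, so they run concurrently.
  const answers = await Promise.all(
    modes.map((mode) => runAgent(mode, fillTemplate(templates[mode], vars))),
  );
  const result = {} as Record<Mode, string>;
  modes.forEach((mode, i) => {
    result[mode] = answers[i];
  });
  return result;
}
```

`Promise.all` preserves input order, so each answer can be matched back to its mode by index before judging.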

Detailed procedures: references/procedures.md

Output format: references/output-format.md
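The judge step scores every answer on four criteria from 1 to 5. A minimal TypeScript sketch of validating and tallying those scores follows; the `Scores` field names and the unweighted total (range 4 to 20) are assumptions, since the actual rubric lives in the templates directory and the real report layout is defined in references/output-format.md.

```typescript
// Hypothetical score shape: one integer from 1 to 5 per criterion.
const CRITERIA = ["accuracy", "specificity", "completeness", "sourceGrounding"] as const;
type Criterion = (typeof CRITERIA)[number];
type Scores = Record<Criterion, number>;

// Reject anything outside the 1-5 integer scale.
function validateScores(scores: Scores): void {
  for (const c of CRITERIA) {
    const s = scores[c];
    if (!Number.isInteger(s) || s < 1 || s > 5) {
      throw new RangeError(`${c} must be an integer from 1 to 5, got ${s}`);
    }
  }
}

// Assumed unweighted sum: 4 (all 1s) to 20 (all 5s).
function totalScore(scores: Scores): number {
  validateScores(scores);
  return CRITERIA.reduce((sum, c) => sum + scores[c], 0);
}

// Rank modes by total score, highest first.
function rankModes(results: Record<string, Scores>): Array<[string, number]> {
  return Object.entries(results)
    .map(([mode, scores]): [string, number] => [mode, totalScore(scores)])
    .sort((a, b) => b[1] - a[1]);
}
```

An unweighted sum treats all four criteria as equally important; a real rubric might weight Accuracy or Source Grounding more heavily.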

Maintainer

blueraai Core maintainer

Source details

Full Name: blueraai/bluera-knowledge
Branch: main
Path in repo: skills/eval
License: MIT License
Topics: anthropic mcp typescript ai-agents developer-tools rag semantic-search vector-database knowledge-management documentation cli-tool code-search claude-code embedding-model offline-first retrieval


Recommended Agent Skills

Related skills from blueraai/bluera-knowledge:

when-to-query: When to use BK vs Grep/Read for the current project
sync: Sync stores from the definitions config (bootstrap on a fresh clone)
ui: Launch the admin web UI to browse stores, search, and manage knowledge
stores: List all indexed library stores
index: Re-index a knowledge store
test-plugin: Run the comprehensive plugin validation test suite
