Agent skill

behavioral-evals

Guidance for creating, running, fixing, and promoting behavioral evaluations. Use when verifying agent decision logic, debugging failures, debugging prompt steering, or adding workspace regression tests.


Install this agent skill in your project

npx add-skill https://github.com/google-gemini/gemini-cli/tree/main/.gemini/skills/behavioral-evals

SKILL.md

Behavioral Evals

Overview

Behavioral evaluations (evals) are tests that validate the agent's decision-making (e.g., tool choice) rather than pure functionality. They are critical for verifying prompt changes, debugging steerability, and preventing regressions.

Note: evals/README.md is the single source of truth for core concepts, policies, running tests, and general best practices. Always refer to it.

🔄 Workflow Decision Tree

1. Does a prompt or tool change need validation?
   - No -> Use normal integration tests.
   - Yes -> Continue below.
2. Is it UI/interaction heavy?
   - Yes -> Use appEvalTest (AppRig). See creating.md.
   - No -> Use evalTest (TestRig). See creating.md.
3. Is it a new test?
   - Yes -> Set the policy to USUALLY_PASSES.
   - No -> Set the policy to ALWAYS_PASSES (locks in the regression).
4. Are you fixing a failure or promoting a test?
   - Fixing -> See fixing.md.
   - Promoting -> See promoting.md.
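As a rough illustration, the first three questions of the decision tree can be encoded as a pure function. This is a sketch under stated assumptions: `Change`, `Plan`, and `planTest` are hypothetical names invented here and are not part of gemini-cli; only the helper names (`appEvalTest`, `evalTest`), the rig names, and the policy values come from the tree itself. The fourth question is left out because it only routes you to fixing.md or promoting.md.

```typescript
// Hypothetical encoding of the decision tree; these types are illustrative,
// not part of the gemini-cli API.
type Policy = "ALWAYS_PASSES" | "USUALLY_PASSES";

interface Change {
  needsBehavioralValidation: boolean; // Does a prompt or tool change need validation?
  uiHeavy: boolean; // Is the scenario UI/interaction heavy?
  isNewTest: boolean; // Is this a brand-new eval?
}

type Plan =
  | { kind: "integration-test" }
  | {
      kind: "eval";
      helper: "appEvalTest" | "evalTest";
      rig: "AppRig" | "TestRig";
      policy: Policy;
    };

function planTest(change: Change): Plan {
  // Question 1: no behavioral validation needed, so a normal integration test suffices.
  if (!change.needsBehavioralValidation) {
    return { kind: "integration-test" };
  }
  // Question 2: UI-heavy scenarios use appEvalTest (AppRig); others use evalTest (TestRig).
  const helper = change.uiHeavy ? "appEvalTest" : "evalTest";
  const rig = change.uiHeavy ? "AppRig" : "TestRig";
  // Question 3: new tests start as USUALLY_PASSES; existing ones lock in the regression.
  const policy: Policy = change.isNewTest ? "USUALLY_PASSES" : "ALWAYS_PASSES";
  return { kind: "eval", helper, rig, policy };
}

// Usage: a new, non-UI eval for a prompt change.
console.log(planTest({ needsBehavioralValidation: true, uiHeavy: false, isNewTest: true }));
```

Keeping the mapping in one pure function makes each branch of the tree easy to check in isolation.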

📋 Quick Checklist

1. Set Up the Workspace

Seed the workspace with the necessary files using the files object to simulate a realistic scenario (e.g., a Node.js project with a package.json).

Details in creating.md
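To make the idea of a `files` object concrete, the sketch below treats it as a map from workspace-relative paths to file contents and writes it into a temporary directory. It assumes nothing about how the real rigs consume `files` (creating.md covers that); `WorkspaceFiles` and `seedWorkspace` are hypothetical names, and the code uses only Node's built-in `fs`, `os`, and `path` modules.

```typescript
import { mkdtempSync, mkdirSync, writeFileSync, readFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Hypothetical shape: workspace-relative path -> file contents.
type WorkspaceFiles = Record<string, string>;

// Writes every entry of `files` into a fresh temporary directory and returns its path.
function seedWorkspace(files: WorkspaceFiles): string {
  const root = mkdtempSync(join(tmpdir(), "eval-workspace-"));
  for (const [relativePath, contents] of Object.entries(files)) {
    const fullPath = join(root, relativePath);
    // Create parent folders (e.g. src/) before writing the file itself.
    mkdirSync(dirname(fullPath), { recursive: true });
    writeFileSync(fullPath, contents, "utf8");
  }
  return root;
}

// Usage: a minimal but realistic Node.js project.
const workspace = seedWorkspace({
  "package.json": JSON.stringify(
    { name: "demo-app", version: "1.0.0", scripts: { test: "vitest run" } },
    null,
    2,
  ),
  "src/index.ts": "export const add = (a: number, b: number) => a + b;\n",
});
console.log(readFileSync(join(workspace, "package.json"), "utf8"));
```

Because nested paths get their parent folders created first, the seeded project can mirror a real repository layout.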

2. Write Assertions

Audit the agent's decisions with rig.setBreakpoint() (AppRig only), or by verifying the indices (order) of tool calls returned by rig.readToolLogs().

Details in creating.md
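Index verification checks where a tool call appears in the log, which lets an assertion express ordering (for example, "read before edit"). The sketch below assumes each log entry exposes a `toolRequest.name` field; treat `ToolLogEntry` as an assumed shape and confirm it against the real test helpers. `firstCallIndex` and `calledInOrder` are hypothetical helpers, and the tool names in the sample log are illustrative.

```typescript
// Assumed shape of one entry from rig.readToolLogs(); verify against the real type.
interface ToolLogEntry {
  toolRequest: { name: string; args: string; success: boolean };
}

// Index of the first call to `toolName`, or -1 if it was never called.
function firstCallIndex(logs: ToolLogEntry[], toolName: string): number {
  return logs.findIndex((entry) => entry.toolRequest.name === toolName);
}

// True only if both tools were called and the first call to `first`
// precedes the first call to `second`.
function calledInOrder(logs: ToolLogEntry[], first: string, second: string): boolean {
  const i = firstCallIndex(logs, first);
  const j = firstCallIndex(logs, second);
  return i !== -1 && j !== -1 && i < j;
}

// Usage with a hand-written log: the agent read the file before editing it.
const logs: ToolLogEntry[] = [
  { toolRequest: { name: "read_file", args: '{"file_path":"package.json"}', success: true } },
  { toolRequest: { name: "replace", args: "{}", success: true } },
];
console.log(calledInOrder(logs, "read_file", "replace")); // true
```

Inside a Vitest test, the same check would read `expect(calledInOrder(logs, "read_file", "replace")).toBe(true)`, with `logs` obtained from `rig.readToolLogs()`.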

3. Verify

Run individual tests locally with Vitest, and confirm they pass consistently before relying on CI workflows.

See evals/README.md for running commands.
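One informal way to confirm stability locally is to repeat a trial several times and look at its pass rate. The helper below is a hypothetical sketch, not part of the eval tooling: `measurePassRate` counts how many runs of an async trial resolve to `true`, treating a thrown error as a failure. The actual commands for running an eval are in evals/README.md.

```typescript
// Hypothetical helper: runs an async trial `runs` times and returns the pass rate in [0, 1].
async function measurePassRate(
  trial: () => Promise<boolean>,
  runs: number,
): Promise<number> {
  if (!Number.isInteger(runs) || runs <= 0) {
    throw new RangeError("runs must be a positive integer");
  }
  let passes = 0;
  for (let i = 0; i < runs; i++) {
    try {
      if (await trial()) passes++;
    } catch {
      // A thrown error (e.g. a failed expect) counts as a failed run.
    }
  }
  return passes / runs;
}

// Usage: a deterministic stand-in trial that fails on every third attempt.
let attempt = 0;
const flakyTrial = async () => ++attempt % 3 !== 0;
measurePassRate(flakyTrial, 9).then((rate) => console.log(rate.toFixed(2)));
```

Runs are awaited one after another, so a trial that shares state (such as a seeded workspace) is never executed concurrently with itself.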

📦 Bundled Resources

Detailed procedural guides:

creating.md: Assertion strategies, Rig selection, Mock MCPs.
fixing.md: Step-by-step automated investigation, architecture diagnosis guidelines.
promoting.md: Candidate identification criteria and threshold guidelines.

Maintainer

google-gemini Core maintainer

Source details

Full Name: google-gemini/gemini-cli
Branch: main
Path in repo: .gemini/skills/behavioral-evals
License: Apache License 2.0
Topics: ai cli mcp-client mcp-server ai-agents gemini gemini-api


Recommended Agent Skills

Related skills from google-gemini/gemini-cli:

skill-creator: Guide for creating effective skills. This skill should be used when users want to create a new skill (or update an existing skill) that extends Gemini CLI's capabilities with specialized knowledge, workflows, or tool integrations.
pirate-skill: Speak like a pirate.
greeter: A friendly greeter skill.
ci: A specialized skill for Gemini CLI that provides high-performance, fail-fast monitoring of GitHub Actions workflows and automated local verification of CI failures. It handles run discovery automatically; simply provide the branch name.
pr-address-comments: Use this skill if the user asks you to help them address GitHub PR comments for their current branch of the Gemini CLI. Requires `gh` CLI tool.
review-duplication: Use this skill during code reviews to proactively investigate the codebase for duplicated functionality, reinvented wheels, or failure to reuse existing project best practices and shared utilities.
