Agent skill
standards
Quality standards and Phase 4 review/publish/verify for cli-web-* CLIs. Covers implementation review (3 parallel agents), the 75-check quality checklist, package publishing (pip install -e .), and end-user smoke testing (READ + WRITE). TRIGGER when: "validate CLI", "publish CLI", "review CLI", "pip install -e .", "smoke test", "quality check", "start Phase 4", "75-check", "generate Claude skill", "check if implementation is complete", "verify implementation quality", or after testing skill completes. DO NOT trigger for: traffic capture, implementation, or test writing.
Install this agent skill to your Project
npx add-skill https://github.com/ItamarZand88/CLI-Anything-WEB/tree/main/cli-anything-web-plugin/skills/standards
SKILL.md
CLI-Anything-Web Standards (Phase 4: Review + Publish + Verify)
Quality gate for cli-web-* CLIs. This skill owns the complete Phase 4: independent implementation review, structural quality checklist, publishing, and end-user smoke testing. Nothing ships until this phase passes.
Prerequisites (Hard Gate)
Do NOT start unless:
- All tests pass (100% pass rate from Phase 3)
- TEST.md has both Part 1 (plan) and Part 2 (results)
- All core modules are implemented and functional
- <APP>.md (API map) exists and documents all endpoints
If tests are not passing, invoke the testing skill first.
Site Profile Exceptions
Not all checks apply to every CLI. When evaluating, consider the site profile:
- No-auth sites (public APIs): Skip auth-related checks (auth.py required, auth commands, auth smoke test). Mark as N/A.
- Read-only sites (no write operations): Skip write operation smoke test. Verify reads return real data instead.
- API-key auth sites: auth login takes a key argument, not playwright-cli. auth refresh is not applicable; use auth logout instead.
Mark inapplicable checks as "N/A — [reason]" rather than creating dead-code stubs.
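The exception rules above can be sketched as a small lookup. This is only an illustration: the SiteProfile class, the check names, and the auth_type labels ("none", "api-key", "browser") are assumptions, not identifiers taken from the plugin.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteProfile:
    auth_type: str      # hypothetical labels: "none", "api-key", "browser"
    is_read_only: bool


# Check names are illustrative; the real checklist may word them differently.
AUTH_CHECKS = {"auth.py required", "auth commands", "auth smoke test"}


def check_status(check: str, profile: SiteProfile) -> str:
    """Return "RUN", or an "N/A - reason" marker when the check does not apply."""
    if profile.auth_type == "none" and check in AUTH_CHECKS:
        return "N/A - no-auth site"
    if profile.is_read_only and check == "write smoke test":
        return "N/A - read-only site"
    if profile.auth_type == "api-key" and check == "auth refresh":
        return "N/A - API-key auth (use auth logout)"
    return "RUN"
```

Recording the reason next to the N/A marker keeps the review report auditable without adding dead-code stubs to the CLI.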
Step 1: Implementation Review (3 Parallel Agents)
Before checking structure or publishing, verify the code actually does the right thing. Tests prove it runs; this step proves it's correct.
Dispatch 3 plugin agents in the same message using the Agent tool:
- traffic-fidelity-reviewer: API coverage (reads <APP>.md + client.py + commands/)
- harness-compliance-reviewer: Code conventions (reads HARNESS.md + all source)
- output-ux-reviewer: User experience (runs --help, checks REPL, validates JSON)
Pass each agent: APP_PATH={app}/agent-harness, APP_NAME={app}, and site
profile (auth_type, is_read_only). The agents are defined in the plugin's
agents/ directory.
| Agent | Focus | What it reads | What it catches |
|---|---|---|---|
| Traffic Fidelity | API coverage | <APP>.md + client.py + commands/ | Missing endpoints, wrong params, broken response parsing, dead client methods, stale API map |
| HARNESS Compliance | Code quality | HARNESS.md + checklist + all source | click.ClickException bypass, missing to_dict(), retry_after lost, auth retry missing, stderr UTF-8 |
| Output & UX | User experience | --help output, --json output, REPL | Protocol leaks, stale REPL help, dead command files, broken entry points |
Each agent scores findings on a 0-100 confidence scale. When all 3 return:
- Filter out findings with confidence < 75 (noise)
- Categorize remaining findings:
  - Critical (90-100): Bugs, missing endpoints, data loss, auth broken
  - Important (75-89): Wrong fields, incomplete parsing, missing options
  - Minor (75, edge cases): Help text gaps, cosmetic issues
- Present the review report
- Fix all Critical issues before proceeding — re-run only the affected agent to verify the fix
- Fix Important issues (not strictly blocking but strongly recommended)
Gate: Do not proceed to Step 2 until Critical count = 0.
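The triage above could be expressed roughly as follows. The finding shape (a dict with "confidence" and "summary" keys) is an assumption about what the agents return, and the Minor bucket is left out because assigning it depends on reviewer judgment rather than the score alone.

```python
from collections import defaultdict


def triage(findings, threshold=75):
    """Drop findings below the threshold and bucket the rest by confidence.

    `findings` is an iterable of dicts with a 0-100 "confidence" value
    (hypothetical shape, not confirmed by the plugin).
    """
    buckets = defaultdict(list)
    for finding in findings:
        confidence = finding["confidence"]
        if confidence < threshold:
            continue  # treated as noise
        severity = "critical" if confidence >= 90 else "important"
        buckets[severity].append(finding)
    return dict(buckets)


def gate_passes(buckets):
    """Step 2 may start only when no Critical findings remain."""
    return len(buckets.get("critical", [])) == 0
```

After a fix, re-running only the affected agent and feeding its findings back through the same triage keeps the gate decision consistent.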
Step 2: Structural Quality Checklist (75 checks)
Run the automated checklist validator first to catch mechanical issues:
python ${CLAUDE_PLUGIN_ROOT}/scripts/validate-checklist.py \
<app>/agent-harness --app-name <app> --auth-type <auth-type>
This checks ~65 of the 75 items automatically (directory structure, required files, CLI patterns, packaging, code quality, REPL, error handling). Fix any FAIL results before proceeding.
For the remaining ~10 judgment-based checks (documentation quality, error message
guidance, fixture realism), review manually per references/quality-checklist.md.
Step 3: Create setup.py and Install
1. Create setup.py with:
   - find_namespace_packages for cli_web.*
   - console_scripts entry point: cli-web-<app>
   - Dependencies: click>=8.0, httpx
   - Optional: extras_require={"browser": ["playwright>=1.40.0"]}
2. Install: pip install -e .
3. Verify: which cli-web-<app>
4. Test help: cli-web-<app> --help
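A minimal sketch of the setup.py arguments described above, written as a helper so the values can be inspected. The version string and the entry point target (cli_web.<pkg>.__main__:main) are assumptions; the package_data value is taken from the completion checklist later in this document.

```python
def build_setup_kwargs(app, packages=None):
    """Build keyword arguments for setuptools.setup() for cli-web-<app>."""
    pkg = app.replace("-", "_")  # module names cannot contain hyphens
    if packages is None:
        # Imported lazily so the helper can be inspected without setuptools.
        from setuptools import find_namespace_packages
        packages = find_namespace_packages(include=["cli_web.*"])
    return {
        "name": f"cli-web-{app}",
        "version": "0.1.0",  # assumption: any valid version works here
        "packages": packages,
        "install_requires": ["click>=8.0", "httpx"],
        "extras_require": {"browser": ["playwright>=1.40.0"]},
        "package_data": {"": ["skills/*.md", "*.md"]},
        "entry_points": {
            # assumption: the CLI's entry function is main() in __main__.py
            "console_scripts": [f"cli-web-{app}=cli_web.{pkg}.__main__:main"],
        },
    }
```

A real setup.py would import setup from setuptools and end with setup(**build_setup_kwargs("<app>")).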
Step 4: End-User Smoke Test (MANDATORY)
Run the automated smoke test first for quick validation:
python ${CLAUDE_PLUGIN_ROOT}/scripts/smoke-test.py cli-web-<app> --auth-type <auth-type>
This checks CLI binary resolution, --help, --version, auth status, and --json output for protocol leaks. Then proceed with manual verification below.
This is the most critical verification step. The agent MUST simulate what a real
end user would do after pip install cli-web-<app>. If this fails, the pipeline
is NOT complete -- go back and fix the issue.
If no-auth site: Skip steps 5-6 (auth). Go directly to step 7 (READ).
If read-only site: Skip step 8 (WRITE). Verify reads return real data.
5. Authenticate as an end user would:
cli-web-<app> auth login
This uses Python sync_playwright() -- opens a browser, user logs in, cookies saved. This is what end users will run. If this fails, the CLI is broken for end users.
6. Verify auth status shows LIVE VALIDATION OK:
cli-web-<app> auth status
Must show: cookies present, tokens valid. If it shows "expired", "redirect", or any auth failure -- STOP. Fix auth before proceeding.
7. Run a READ operation and verify real data:
cli-web-<app> --json <first-resource> list
This must return real data from the live API -- NOT an error, NOT empty, NOT "auth not configured". Verify the JSON response contains expected fields.
8. Run a WRITE operation and verify it actually worked: This is the step the agent most commonly skips. Reading data is easy -- the real test is whether the CLI can CREATE, UPDATE, or GENERATE something.
# For CRUD apps (Monday, Notion, Jira):
cli-web-<app> --json <resource> create --name "smoke-test-$(date +%s)"
cli-web-<app> --json <resource> list # verify the created item appears
cli-web-<app> --json <resource> delete --id <id-from-create>
# For generation apps (Suno, Midjourney, NotebookLM audio):
cli-web-<app> --json <resource> generate --prompt "test" --wait
# Verify: JSON response contains a real ID, status=complete, not an error
# If the command has --output, verify the file was downloaded and size > 0
# For search/query apps:
cli-web-<app> --json search "test query"
# Verify: results array is non-empty
If ANY write/generate command fails, the pipeline is NOT complete. Reading a list of existing items only proves auth works -- it does NOT prove the CLI can actually do useful work. The whole point is to CREATE things, not just read them.
9. Only after steps 5-8 ALL pass, declare the pipeline complete.
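For CRUD apps, the create, list, delete round trip from step 8 can be scripted. The sketch below assumes that create prints a JSON object with an "id" field and that list prints a JSON array of objects; neither shape is guaranteed by this document, so adjust to the CLI's real output.

```python
import json
import subprocess
import time


def run_cli(argv):
    """Run a command and return stdout; raises CalledProcessError on failure."""
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout


def crud_smoke(cli, resource, run=run_cli):
    """Create an item, confirm it appears in list, then always delete it."""
    name = f"smoke-test-{int(time.time())}"
    created = json.loads(run([cli, "--json", resource, "create", "--name", name]))
    item_id = created["id"]  # assumption: create returns {"id": ...}
    try:
        listed = json.loads(run([cli, "--json", resource, "list"]))
        found = any(item.get("id") == item_id for item in listed)
    finally:
        # Clean up even if the list call or parsing fails.
        run([cli, "--json", resource, "delete", "--id", str(item_id)])
    return found
```

The run parameter exists so the flow can be exercised with a fake runner before pointing it at a live account.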
Smoke Test Checklist
- auth login works (Python playwright, API key, or N/A for no-auth)
- auth status shows valid (or N/A for no-auth)
- At least one READ returns real data
- At least one WRITE/CREATE/GENERATE succeeds (or N/A for read-only)
- The CLI works standalone -- no debug Chrome, no port 9222, no MCP
- Output sanity: no raw protocol data leaks in --json output (see below)
Output Sanity
Run every command with --json and check for raw protocol leaks (wrb.fr, af.httprm,
empty [], null required fields). See methodology/SKILL.md "Mandatory Smoke Check" for
the full red flags list.
#1 gap to watch for: Agent runs list (GET with auth — easy), declares done, but
never tests create/generate (POST with CSRF, encoding). Always test at least one write.
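A rough scanner for the red flags named above might look like this. It covers only the markers listed in this section (wrb.fr, af.httprm, empty lists, null required fields), not the full list in the methodology skill, and the required_fields argument is something the caller would supply per command.

```python
import json

LEAK_MARKERS = ("wrb.fr", "af.httprm")


def find_red_flags(raw_output, required_fields=()):
    """Return a list of human-readable problems found in --json output."""
    flags = [f"protocol marker {m!r} in output" for m in LEAK_MARKERS if m in raw_output]
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return flags + ["output is not valid JSON"]
    if data == []:
        flags.append("empty list")
    records = data if isinstance(data, list) else [data]
    for record in records:
        if not isinstance(record, dict):
            continue
        for field in required_fields:
            if record.get(field) is None:
                flags.append(f"required field {field!r} is null or missing")
    return flags
```

An empty return value means none of these specific flags were detected; it does not prove the output is semantically correct.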
Post-Smoke-Test: Generate Skill + Update README (Parallel)
After smoke tests pass, these tasks remain — all independent, dispatch in parallel:
┌─ Agent 1: Generate Claude Skill (.claude/skills/<app>-cli/SKILL.md)
│ ALSO copy to cli_web/<app>/skills/SKILL.md (package-portable)
├─ Agent 2: Update repository README.md (add CLI to examples table)
├─ Agent 3: Write/update cli_web/<app>/README.md (package docs)
├─ Agent 4: Update registry.json + CLAUDE.md Generated CLIs table
└─ Agent 5: Add CLI to CI test matrix (.github/workflows/tests.yml)
│ + Add entry to CHANGELOG.md under [Unreleased]
All are independent — launch in one message with run_in_background: true
Use the templates at cli-anything-web-plugin/templates/ as the canonical
structure for SKILL.md and README.md — fill in the {{placeholders}} with
actual CLI data from <app> --help and <APP>.md.
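Filling the {{placeholders}} can be done with a small substitution helper. The placeholder names used in the test data (app, summary) are hypothetical; the real names come from the template files.

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def fill_template(template, values):
    """Replace {{name}} placeholders; fail loudly if any are left unfilled."""
    missing = []

    def substitute(match):
        key = match.group(1)
        if key not in values:
            missing.append(key)
            return match.group(0)  # leave the placeholder visible
        return str(values[key])

    rendered = PLACEHOLDER.sub(substitute, template)
    if missing:
        raise KeyError(f"unfilled placeholders: {sorted(set(missing))}")
    return rendered
```

Raising on unfilled placeholders avoids shipping a SKILL.md or README.md that still contains template markers.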
Generate Claude Skill
Goal: Create a project-local Claude skill so that Claude can use this CLI automatically in future conversations — no manual lookup required.
IMPORTANT: The skill must exist in TWO locations:
- .claude/skills/<app>-cli/SKILL.md: for Claude Code discovery (project-level)
- <app>/agent-harness/cli_web/<app>/skills/SKILL.md: portable with pip install (included via package_data in setup.py)
Create the skill once, then copy it to both locations.
Step 1: Find the .claude directory
Create <git-root>/.claude/skills/<app>-cli/SKILL.md:
- Read the CLI's README and run cli-web-<app> --help + <resource> --help
- Write the skill with this structure:
  - Frontmatter: name=<app>-cli, description with specific trigger phrases ("whenever the user asks about X, Y, Z. Always prefer cli-web-<app> over manually fetching the website.")
  - Quick Start: 2-3 most common commands with --json
  - Commands: each command group with key options and output fields
  - Agent Patterns: piped command examples for common tasks
  - Notes: auth setup, rate limits, known limitations
- Use existing skills (e.g., notebooklm-cli, futbin-cli) as reference examples
Update Repository README
Add the new CLI to the examples table in README.md (CLI name, website, protocol,
auth type, description) and add a quick-start example in the "Try Them" section.
Update registry.json and CLAUDE.md
Add the new CLI to registry.json at the repo root:
{
"name": "cli-web-<app>",
"website": "<website>",
"protocol": "<detected protocol>",
"auth": "<auth type>",
"directory": "<app>/agent-harness",
"namespace": "cli_web.<app>",
"commands": ["<cmd1>", "<cmd2>", ...],
"install": "pip install -e <app>/agent-harness"
}
Also add to the Generated CLIs table in CLAUDE.md.
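Adding the entry by script avoids duplicate rows when the step is re-run. This sketch assumes registry.json is a top-level JSON array of entry objects; the source does not show the file's overall shape, so check before using it.

```python
import json
from pathlib import Path


def upsert_registry_entry(registry_path, entry):
    """Insert or replace the entry whose "name" matches, then rewrite the file."""
    path = Path(registry_path)
    # Assumption: the file holds a JSON array; a missing file starts empty.
    entries = json.loads(path.read_text(encoding="utf-8")) if path.exists() else []
    entries = [e for e in entries if e.get("name") != entry["name"]]
    entries.append(entry)
    path.write_text(json.dumps(entries, indent=2) + "\n", encoding="utf-8")
    return entries
```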
Pipeline Complete
The pipeline is NOT done until ALL of these are checked:
Smoke Tests
- Auth works (login + status, or N/A for no-auth)
- At least one READ returns real data
- At least one WRITE succeeds (or N/A for read-only)
Skills (TWO copies)
- .claude/skills/<app>-cli/SKILL.md exists (Claude Code discovery)
- cli_web/<app>/skills/SKILL.md exists (portable with pip install)
- Used cli-anything-web-plugin/templates/SKILL.md.template as starting point
Package
- setup.py has package_data={"": ["skills/*.md", "*.md"]}
- __main__.py exists for python -m cli_web.<app> support
Documentation
- cli_web/<app>/README.md exists (used templates/README.md.template)
- <APP>.md API map exists
- tests/TEST.md has Part 1 (plan) + Part 2 (results)
Repo-Level Updates
- README.md: new row in examples table + "Try them" section
- README.md: badge count updated (CLIs_generated-N and N_CLIs hero badge)
- CLAUDE.md: new row in Generated CLIs table
- registry.json: entry with name, website, protocol, auth, commands, install
- docs/registry/index.html: entry added to JS data array with correct category
- CHANGELOG.md: entry added under [Unreleased] → Added
- .github/workflows/tests.yml: new CLI added to CI test matrix (see below)
CI Test Matrix Update (MANDATORY)
Every new CLI MUST be added to .github/workflows/tests.yml so unit tests run
on every push/PR. Do both steps — missing either blocks merges.
Step 1: Add to test matrix in .github/workflows/tests.yml:
- { name: <app>, dir: <app>/agent-harness, pkg: <app_underscore> }
Where <app_underscore> replaces hyphens with underscores (e.g., gh-trending → gh_trending).
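The matrix entry and the test path it implies can be derived from the app name. The helper names below are illustrative; the path layout follows the pytest command given in this section.

```python
def matrix_entry(app):
    """Build the CI matrix entry for an app such as "gh-trending"."""
    return {
        "name": app,
        "dir": f"{app}/agent-harness",
        "pkg": app.replace("-", "_"),  # hyphens become underscores
    }


def core_test_path(entry):
    """Path that the CI job is expected to run with pytest."""
    return f"{entry['dir']}/cli_web/{entry['pkg']}/tests/test_core.py"
```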
Step 2: Add to branch protection required checks so PRs require the new check:
# Get current checks, append the new one, update
gh api repos/<owner>/<repo>/branches/main/protection/required_status_checks \
-X PATCH --input - <<EOF
{"strict": true, "contexts": [...existing..., "<app>"]}
EOF
Verify the entry runs: python -m pytest <dir>/cli_web/<pkg>/tests/test_core.py -v
All key rules (naming, auth, --json, REPL, rate limits) are defined in HARNESS.md "Critical Rules" and CLAUDE.md "Critical Conventions".
Integration
| Relationship | Skill |
|---|---|
| Preceded by | testing (Phase 3) |
| Followed by | None — this is the final phase |
| References | HARNESS.md (Generated CLI Structure, Naming Conventions) |
Related
- testing skill -- Phase 3 test planning/writing/documentation
- methodology skill -- Phase 2 analyze/design/implement
- capture skill -- Phase 1 traffic recording
- /cli-anything-web:validate -- Command to run the full 75-check validation