Agent skill

hanzi-browse

Use when a task requires interacting with a real browser — clicking, typing, filling forms, reading authenticated pages, posting content, or testing web apps. Also use when the user says "open", "go to", "check this site", "log in to", "post on", or needs their real cookies and sessions.

View SKILL.md on GitHub Repository

Stars 148

Forks 42

Install this agent skill to your Project

npx add-skill https://github.com/hanzili/hanzi-browse/tree/main/server/skills/hanzi-browse

SKILL.md

Hanzi Browse

Give your AI agent a real browser. Hanzi controls the user's actual Chrome with their existing logins, cookies, and sessions — not a headless sandbox.

When to Use

Browser-required tasks: click buttons, fill forms, submit data, navigate pages
Authenticated access: read pages that need login (email, dashboards, admin panels, social feeds)
Visual verification: check what a page actually looks like, take screenshots
Real-world posting: publish to LinkedIn, Twitter/X, Reddit from the user's signed-in account
E2E testing: test a web app in a real browser on localhost or staging

When NOT to Use

Reading public pages (use WebFetch or curl instead)
API calls (use fetch or HTTP tools)
File operations (use filesystem tools)
Services with dedicated MCP tools (Gmail, Calendar, Notion, Stripe) — use those tools instead, they're faster and more reliable
Anything that doesn't need a rendered browser

Tool Selection Rule

Prefer non-browser tools first. Check if a dedicated MCP tool exists for the service (Gmail, Calendar, Notion, etc.) — use that instead. Gather all context you can (code search, git log, file reads, API calls) BEFORE opening the browser. Use Hanzi only for steps that genuinely require a rendered page.

Setup

If browser_status returns an error or the tool doesn't exist:

Hanzi Browse isn't set up yet.

Run: npx hanzi-browse setup

This installs the Chrome extension and adds the MCP server to your agent (~1 minute).

MCP Tools

`browser_start`

Start a task. An autonomous agent navigates, clicks, types, and fills forms. Blocks until complete, waiting for input, or timeout (5 min).

browser_start({
  task: "Go to LinkedIn and send a connection request to Jane Doe at Acme Corp",
  url: "https://linkedin.com",
  context: "Connection note: Hi Jane, loved your post about AI agents. Would love to connect."
})

Returns: { session_id, status, result }

`status`	Meaning	What to do
`"completed"`	Task finished successfully	Read `result` for the answer
`"waiting"`	Agent needs input or clarification	Send `browser_message` with the answer
`"error"`	Something went wrong	Call `browser_screenshot`, then retry or adjust
`"timeout"`	Hit 5-minute limit	Call `browser_message` to continue, or `browser_stop`

Tips:

task — be specific: include the website, the goal, and details
url — starting page (optional, agent can navigate itself)
context — everything the agent needs: form values, text to paste, tone, credentials, choices
You can run multiple browser_start calls in parallel — each gets its own window
session_id is returned here — use it for all follow-up calls

`browser_message`

Send a follow-up to a running or paused session.

browser_message({
  session_id: "abc123",
  message: "Now also check the Settings page"
})

Use when:

The agent asked a question and you have the answer
You want the agent to do more in the same browser window
You need to provide additional context mid-task

`browser_status`

Check session status. Returns session ID, status, task description, and last 5 steps.

browser_status()                           // all sessions
browser_status({ session_id: "abc123" })   // specific session

`browser_stop`

Stop a session. Browser window stays open for review unless remove: true.

browser_stop({ session_id: "abc123" })                // stop, keep window
browser_stop({ session_id: "abc123", remove: true })   // stop + close window

`browser_screenshot`

Capture current page as PNG. Useful when a task errors or times out — see what the agent was looking at.

browser_screenshot({ session_id: "abc123" })

Patterns

Multi-step workflow

// Step 1: Research
const result = browser_start({
  task: "Find the top 3 AI startups hiring in SF on LinkedIn Jobs",
  url: "https://linkedin.com/jobs"
})

// Step 2: Follow up in same session
browser_message({
  session_id: result.session_id,
  message: "Now save each job to my 'AI Jobs' collection"
})

Parallel tasks

// These run simultaneously in separate browser windows
browser_start({ task: "Post announcement on LinkedIn", context: announcement })
browser_start({ task: "Post announcement on Twitter", context: announcement })
browser_start({ task: "Post announcement on Reddit r/programming", context: announcement })

Error recovery

const result = browser_start({ task: "Fill out the application form", ... })

if (result.status === "error") {
  // See what went wrong
  const screenshot = browser_screenshot({ session_id: result.session_id })
  // Try again with more context
  browser_message({
    session_id: result.session_id,
    message: "The form has a CAPTCHA. Please wait for me to solve it, then continue."
  })
}

Safety

Production URLs: If a task will create real data (signups, posts, orders), confirm with the user first
Credentials in context: Pass credentials via the context field, not in task
Public actions: Posts, messages, and form submissions are real and visible. Always show the user what you'll post before doing it
Rate limits: Social platforms may rate-limit. If the agent reports a CAPTCHA or block, stop and tell the user

Common Mistakes

Mistake	Fix
Using browser to read public pages	Use `WebFetch` or `curl` — faster, no browser needed
Vague task descriptions	Be specific: "Go to X, click Y, fill Z with these values"
Not passing context	Put form values, text to paste, and preferences in `context`
Running sequentially when parallel works	Multiple `browser_start` calls run in separate windows
Not checking screenshots on error	Always call `browser_screenshot` when a task fails

Workflow Skills

Hanzi Browse also ships workflow skills for common tasks:

e2e-tester — Test web apps like a QA person with real browser interactions
social-poster — Draft and post content across LinkedIn, Twitter/X, Reddit
linkedin-prospector — Find and connect with prospects on LinkedIn
a11y-auditor — Run accessibility audits in a real browser
x-marketer — Twitter/X marketing workflows

Install workflow skills from: github.com/hanzili/hanzi-browse/tree/main/server/skills

Maintainer

hanzili Core maintainer

Source details

Full Name: hanzili/hanzi-browse
Branch: main
Path in repo: server/skills/hanzi-browse
License: Other
Topics: claude-code mcp developer-tools ai-agent browser-automation chrome chrome-extension web-automation computer-use linkedin-automation

Featured Tools

Join Our Newsletter

Stay updated with the latest AI tools, news, and offers by subscribing to our weekly newsletter.

Recommended Agent Skills

Expand your agent's capabilities with these related and highly-rated skills.

hanzili/hanzi-browse

apartment-finder

148 42

Explore

hanzili/hanzi-browse

data-extractor

148 42

Explore

hanzili/hanzi-browse

seo-checker

Audit web pages for SEO issues in a real browser. Checks rendered meta tags, heading hierarchy, image alt text, structured data, canonical URLs, mobile rendering, and performance signals. Produces a scored report with specific findings and fixes. Read-only — inspects, doesn't modify. Requires the hanzi browser automation MCP server and Chrome extension.

148 42

Explore

hanzili/hanzi-browse

linkedin-prospector

Find people on LinkedIn and send personalized connection requests. Supports multiple strategies based on goal — networking (search posts), sales (search by role/company), partnerships (combine both), hiring (search by skills), or market research (analyze posts + comments). Each connection note is unique and personalized. Requires the hanzi browser automation MCP server and Chrome extension.

148 42

Explore

hanzili/hanzi-browse

a11y-auditor

Audit web pages for accessibility issues in a real browser. Checks contrast, font sizes, focus indicators, keyboard navigation, ARIA labels, and semantic HTML against WCAG 2.1 AA. Reports findings with screenshots and specific remediation steps. Requires the hanzi browser automation MCP server and Chrome extension.

148 42

Explore

hanzili/hanzi-browse

competitor-monitor

Monitor competitor websites for changes. Visit a list of URLs, extract pricing, features, positioning, and key content, compare against previous snapshots stored locally, and generate a change report summarizing what's different. Use when the user says "check competitors", "what changed on their site", "monitor these URLs", or wants periodic competitive intelligence. Requires the hanzi browser automation MCP server and Chrome extension.

148 42

Explore

Didn't find tool you were looking for?

Search AI Tools

Install this agent skill to your Project

SKILL.md

Hanzi Browse

When to Use

When NOT to Use

Tool Selection Rule

Setup

MCP Tools

browser_start

browser_message

browser_status

browser_stop

browser_screenshot

Patterns

Multi-step workflow

Parallel tasks

Error recovery

Safety

Common Mistakes

Workflow Skills

Recommended Agent Skills

apartment-finder

data-extractor

seo-checker

linkedin-prospector

a11y-auditor

competitor-monitor

`browser_start`

`browser_message`

`browser_status`

`browser_stop`

`browser_screenshot`