Agent skill

agent-browser

Browser automation for AI agents via inference.sh. Navigate web pages, interact with elements using @e refs, take screenshots, record video. Capabilities: web scraping, form filling, clicking, typing, drag-drop, file upload, JavaScript execution. Use for: web automation, data extraction, testing, agent browsing, research. Triggers: browser, web automation, scrape, navigate, click, fill form, screenshot, browse web, playwright, headless browser, web agent, surf internet, record video

Stars 0
Forks 0

Install this agent skill to your Project

npx add-skill https://github.com/autohandai/community-skills/tree/main/agent-browser-toolshell

SKILL.md

Agentic Browser

Browser automation for AI agents via inference.sh. Uses Playwright under the hood with a simple @e ref system for element interaction.

Agentic Browser

Quick Start

Requires inference.sh CLI (infsh). Get installation instructions: npx skills add inference-sh/skills@agent-tools

bash
infsh login

# Open a page and get interactive elements
infsh app run agent-browser --function open --input '{"url": "https://example.com"}' --session new

Core Workflow

Every browser automation follows this pattern:

  1. Open - Navigate to URL, get @e refs for elements
  2. Interact - Use refs to click, fill, drag, etc.
  3. Re-snapshot - After navigation/changes, get fresh refs
  4. Close - End session (returns video if recording)
bash
# 1. Start session
RESULT=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com/login"
}')
SESSION_ID=$(echo $RESULT | jq -r '.session_id')
# Elements: @e1 [input] "Email", @e2 [input] "Password", @e3 [button] "Sign In"

# 2. Fill and submit
infsh app run agent-browser --function interact --session $SESSION_ID --input '{
  "action": "fill", "ref": "@e1", "text": "user@example.com"
}'
infsh app run agent-browser --function interact --session $SESSION_ID --input '{
  "action": "fill", "ref": "@e2", "text": "password123"
}'
infsh app run agent-browser --function interact --session $SESSION_ID --input '{
  "action": "click", "ref": "@e3"
}'

# 3. Re-snapshot after navigation
infsh app run agent-browser --function snapshot --session $SESSION_ID --input '{}'

# 4. Close when done
infsh app run agent-browser --function close --session $SESSION_ID --input '{}'

Functions

Function Description
open Navigate to URL, configure browser (viewport, proxy, video recording)
snapshot Re-fetch page state with @e refs after DOM changes
interact Perform actions using @e refs (click, fill, drag, upload, etc.)
screenshot Take page screenshot (viewport or full page)
execute Run JavaScript code on the page
close Close session, returns video if recording was enabled

Interact Actions

Action Description Required Fields
click Click element ref
dblclick Double-click element ref
fill Clear and type text ref, text
type Type text (no clear) text
press Press key (Enter, Tab, etc.) text
select Select dropdown option ref, text
hover Hover over element ref
check Check checkbox ref
uncheck Uncheck checkbox ref
drag Drag and drop ref, target_ref
upload Upload file(s) ref, file_paths
scroll Scroll page direction (up/down/left/right), scroll_amount
back Go back in history -
wait Wait milliseconds wait_ms
goto Navigate to URL url

Element Refs

Elements are returned with @e refs:

@e1 [a] "Home" href="/"
@e2 [input type="text"] placeholder="Search"
@e3 [button] "Submit"
@e4 [select] "Choose option"
@e5 [input type="checkbox"] name="agree"

Important: Refs are invalidated after navigation. Always re-snapshot after:

  • Clicking links/buttons that navigate
  • Form submissions
  • Dynamic content loading

Features

Video Recording

Record browser sessions for debugging or documentation:

bash
# Start with recording enabled (optionally show cursor indicator)
SESSION=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com",
  "record_video": true,
  "show_cursor": true
}' | jq -r '.session_id')

# ... perform actions ...

# Close to get the video file
infsh app run agent-browser --function close --session $SESSION --input '{}'
# Returns: {"success": true, "video": <File>}

Cursor Indicator

Show a visible cursor in screenshots and video (useful for demos):

bash
infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com",
  "show_cursor": true,
  "record_video": true
}'

The cursor appears as a red dot that follows mouse movements and shows click feedback.

Proxy Support

Route traffic through a proxy server:

bash
infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com",
  "proxy_url": "http://proxy.example.com:8080",
  "proxy_username": "user",
  "proxy_password": "pass"
}'

File Upload

Upload files to file inputs:

bash
infsh app run agent-browser --function interact --session $SESSION --input '{
  "action": "upload",
  "ref": "@e5",
  "file_paths": ["/path/to/file.pdf"]
}'

Drag and Drop

Drag elements to targets:

bash
infsh app run agent-browser --function interact --session $SESSION --input '{
  "action": "drag",
  "ref": "@e1",
  "target_ref": "@e2"
}'

JavaScript Execution

Run custom JavaScript:

bash
infsh app run agent-browser --function execute --session $SESSION --input '{
  "code": "document.querySelectorAll(\"h2\").length"
}'
# Returns: {"result": "5", "screenshot": <File>}

Deep-Dive Documentation

Reference Description
references/commands.md Full function reference with all options
references/snapshot-refs.md Ref lifecycle, invalidation rules, troubleshooting
references/session-management.md Session persistence, parallel sessions
references/authentication.md Login flows, OAuth, 2FA handling
references/video-recording.md Recording workflows for debugging
references/proxy-support.md Proxy configuration, geo-testing

Ready-to-Use Templates

Template Description
templates/form-automation.sh Form filling with validation
templates/authenticated-session.sh Login once, reuse session
templates/capture-workflow.sh Content extraction with screenshots

Examples

Form Submission

bash
SESSION=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com/contact"
}' | jq -r '.session_id')

# Get elements: @e1 [input] "Name", @e2 [input] "Email", @e3 [textarea], @e4 [button] "Send"

infsh app run agent-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e1", "text": "John Doe"}'
infsh app run agent-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e2", "text": "john@example.com"}'
infsh app run agent-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e3", "text": "Hello!"}'
infsh app run agent-browser --function interact --session $SESSION --input '{"action": "click", "ref": "@e4"}'

infsh app run agent-browser --function snapshot --session $SESSION --input '{}'
infsh app run agent-browser --function close --session $SESSION --input '{}'

Search and Extract

bash
SESSION=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://google.com"
}' | jq -r '.session_id')

infsh app run agent-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e1", "text": "weather today"}'
infsh app run agent-browser --function interact --session $SESSION --input '{"action": "press", "text": "Enter"}'
infsh app run agent-browser --function interact --session $SESSION --input '{"action": "wait", "wait_ms": 2000}'

infsh app run agent-browser --function snapshot --session $SESSION --input '{}'
infsh app run agent-browser --function close --session $SESSION --input '{}'

Screenshot with Video

bash
SESSION=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com",
  "record_video": true
}' | jq -r '.session_id')

# Take full page screenshot
infsh app run agent-browser --function screenshot --session $SESSION --input '{
  "full_page": true
}'

# Close and get video
RESULT=$(infsh app run agent-browser --function close --session $SESSION --input '{}')
echo $RESULT | jq '.video'

Sessions

Browser state persists within a session. Always:

  1. Start with --session new on first call
  2. Use returned session_id for subsequent calls
  3. Close session when done

Related Skills

bash
# Web search (for research + browse)
npx skills add inference-sh/skills@web-search

# LLM models (analyze extracted content)
npx skills add inference-sh/skills@llm-models

Documentation

Expand your agent's capabilities with these related and highly-rated skills.

autohandai/community-skills

mapping-mitre-attack-techniques

Maps observed adversary behaviors, security alerts, and detection rules to MITRE ATT&CK techniques and sub-techniques to quantify detection coverage and guide control prioritization. Use when building an ATT&CK-based coverage heatmap, tagging SIEM alerts with technique IDs, aligning security controls to adversary playbooks, or reporting threat exposure to executives. Activates for requests involving ATT&CK Navigator, Sigma rules, MITRE D3FEND, or coverage gap analysis.

0 0
Explore
autohandai/community-skills

hunting-for-spearphishing-indicators

Hunt for spearphishing campaign indicators across email logs, endpoint telemetry, and network data to detect targeted email attacks.

0 0
Explore
autohandai/community-skills

analyzing-malicious-url-with-urlscan

URLScan.io is a free service for scanning and analyzing suspicious URLs. It captures screenshots, DOM content, HTTP transactions, JavaScript behavior, and network connections of web pages in an isolat

0 0
Explore
autohandai/community-skills

implementing-zero-standing-privilege-with-cyberark

Deploy CyberArk Secure Cloud Access to eliminate standing privileges in hybrid and multi-cloud environments using just-in-time access with time, entitlement, and approval controls.

0 0
Explore
autohandai/community-skills

implementing-pam-for-database-access

Deploy privileged access management for database systems including Oracle, SQL Server, PostgreSQL, and MySQL. Covers session proxy configuration, credential vaulting, query auditing, dynamic credentia

0 0
Explore
autohandai/community-skills

detecting-t1003-credential-dumping-with-edr

Detect OS credential dumping techniques targeting LSASS memory, SAM database, NTDS.dit, and cached credentials using EDR telemetry, Sysmon process access monitoring, and Windows security event correlation.

0 0
Explore

Didn't find tool you were looking for?

Be as detailed as possible for better results