Agent skill
agent-browser
Browser automation for AI agents via inference.sh. Navigate web pages, interact with elements using @e refs, take screenshots, record video. Capabilities: web scraping, form filling, clicking, typing, drag-drop, file upload, JavaScript execution. Use for: web automation, data extraction, testing, agent browsing, research. Triggers: browser, web automation, scrape, navigate, click, fill form, screenshot, browse web, playwright, headless browser, web agent, surf internet, record video
Install this agent skill to your Project
npx add-skill https://github.com/autohandai/community-skills/tree/main/agent-browser-toolshell
SKILL.md
Agentic Browser
Browser automation for AI agents via inference.sh. Uses Playwright under the hood with a simple @e ref system for element interaction.
Quick Start
Requires inference.sh CLI (
infsh). Get installation instructions:npx skills add inference-sh/skills@agent-tools
infsh login
# Open a page and get interactive elements
infsh app run agent-browser --function open --input '{"url": "https://example.com"}' --session new
Core Workflow
Every browser automation follows this pattern:
- Open - Navigate to URL, get
@erefs for elements - Interact - Use refs to click, fill, drag, etc.
- Re-snapshot - After navigation/changes, get fresh refs
- Close - End session (returns video if recording)
# 1. Start session
RESULT=$(infsh app run agent-browser --function open --session new --input '{
"url": "https://example.com/login"
}')
SESSION_ID=$(echo $RESULT | jq -r '.session_id')
# Elements: @e1 [input] "Email", @e2 [input] "Password", @e3 [button] "Sign In"
# 2. Fill and submit
infsh app run agent-browser --function interact --session $SESSION_ID --input '{
"action": "fill", "ref": "@e1", "text": "user@example.com"
}'
infsh app run agent-browser --function interact --session $SESSION_ID --input '{
"action": "fill", "ref": "@e2", "text": "password123"
}'
infsh app run agent-browser --function interact --session $SESSION_ID --input '{
"action": "click", "ref": "@e3"
}'
# 3. Re-snapshot after navigation
infsh app run agent-browser --function snapshot --session $SESSION_ID --input '{}'
# 4. Close when done
infsh app run agent-browser --function close --session $SESSION_ID --input '{}'
Functions
| Function | Description |
|---|---|
open |
Navigate to URL, configure browser (viewport, proxy, video recording) |
snapshot |
Re-fetch page state with @e refs after DOM changes |
interact |
Perform actions using @e refs (click, fill, drag, upload, etc.) |
screenshot |
Take page screenshot (viewport or full page) |
execute |
Run JavaScript code on the page |
close |
Close session, returns video if recording was enabled |
Interact Actions
| Action | Description | Required Fields |
|---|---|---|
click |
Click element | ref |
dblclick |
Double-click element | ref |
fill |
Clear and type text | ref, text |
type |
Type text (no clear) | text |
press |
Press key (Enter, Tab, etc.) | text |
select |
Select dropdown option | ref, text |
hover |
Hover over element | ref |
check |
Check checkbox | ref |
uncheck |
Uncheck checkbox | ref |
drag |
Drag and drop | ref, target_ref |
upload |
Upload file(s) | ref, file_paths |
scroll |
Scroll page | direction (up/down/left/right), scroll_amount |
back |
Go back in history | - |
wait |
Wait milliseconds | wait_ms |
goto |
Navigate to URL | url |
Element Refs
Elements are returned with @e refs:
@e1 [a] "Home" href="/"
@e2 [input type="text"] placeholder="Search"
@e3 [button] "Submit"
@e4 [select] "Choose option"
@e5 [input type="checkbox"] name="agree"
Important: Refs are invalidated after navigation. Always re-snapshot after:
- Clicking links/buttons that navigate
- Form submissions
- Dynamic content loading
Features
Video Recording
Record browser sessions for debugging or documentation:
# Start with recording enabled (optionally show cursor indicator)
SESSION=$(infsh app run agent-browser --function open --session new --input '{
"url": "https://example.com",
"record_video": true,
"show_cursor": true
}' | jq -r '.session_id')
# ... perform actions ...
# Close to get the video file
infsh app run agent-browser --function close --session $SESSION --input '{}'
# Returns: {"success": true, "video": <File>}
Cursor Indicator
Show a visible cursor in screenshots and video (useful for demos):
infsh app run agent-browser --function open --session new --input '{
"url": "https://example.com",
"show_cursor": true,
"record_video": true
}'
The cursor appears as a red dot that follows mouse movements and shows click feedback.
Proxy Support
Route traffic through a proxy server:
infsh app run agent-browser --function open --session new --input '{
"url": "https://example.com",
"proxy_url": "http://proxy.example.com:8080",
"proxy_username": "user",
"proxy_password": "pass"
}'
File Upload
Upload files to file inputs:
infsh app run agent-browser --function interact --session $SESSION --input '{
"action": "upload",
"ref": "@e5",
"file_paths": ["/path/to/file.pdf"]
}'
Drag and Drop
Drag elements to targets:
infsh app run agent-browser --function interact --session $SESSION --input '{
"action": "drag",
"ref": "@e1",
"target_ref": "@e2"
}'
JavaScript Execution
Run custom JavaScript:
infsh app run agent-browser --function execute --session $SESSION --input '{
"code": "document.querySelectorAll(\"h2\").length"
}'
# Returns: {"result": "5", "screenshot": <File>}
Deep-Dive Documentation
| Reference | Description |
|---|---|
| references/commands.md | Full function reference with all options |
| references/snapshot-refs.md | Ref lifecycle, invalidation rules, troubleshooting |
| references/session-management.md | Session persistence, parallel sessions |
| references/authentication.md | Login flows, OAuth, 2FA handling |
| references/video-recording.md | Recording workflows for debugging |
| references/proxy-support.md | Proxy configuration, geo-testing |
Ready-to-Use Templates
| Template | Description |
|---|---|
| templates/form-automation.sh | Form filling with validation |
| templates/authenticated-session.sh | Login once, reuse session |
| templates/capture-workflow.sh | Content extraction with screenshots |
Examples
Form Submission
SESSION=$(infsh app run agent-browser --function open --session new --input '{
"url": "https://example.com/contact"
}' | jq -r '.session_id')
# Get elements: @e1 [input] "Name", @e2 [input] "Email", @e3 [textarea], @e4 [button] "Send"
infsh app run agent-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e1", "text": "John Doe"}'
infsh app run agent-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e2", "text": "john@example.com"}'
infsh app run agent-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e3", "text": "Hello!"}'
infsh app run agent-browser --function interact --session $SESSION --input '{"action": "click", "ref": "@e4"}'
infsh app run agent-browser --function snapshot --session $SESSION --input '{}'
infsh app run agent-browser --function close --session $SESSION --input '{}'
Search and Extract
SESSION=$(infsh app run agent-browser --function open --session new --input '{
"url": "https://google.com"
}' | jq -r '.session_id')
infsh app run agent-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e1", "text": "weather today"}'
infsh app run agent-browser --function interact --session $SESSION --input '{"action": "press", "text": "Enter"}'
infsh app run agent-browser --function interact --session $SESSION --input '{"action": "wait", "wait_ms": 2000}'
infsh app run agent-browser --function snapshot --session $SESSION --input '{}'
infsh app run agent-browser --function close --session $SESSION --input '{}'
Screenshot with Video
SESSION=$(infsh app run agent-browser --function open --session new --input '{
"url": "https://example.com",
"record_video": true
}' | jq -r '.session_id')
# Take full page screenshot
infsh app run agent-browser --function screenshot --session $SESSION --input '{
"full_page": true
}'
# Close and get video
RESULT=$(infsh app run agent-browser --function close --session $SESSION --input '{}')
echo $RESULT | jq '.video'
Sessions
Browser state persists within a session. Always:
- Start with
--session newon first call - Use returned
session_idfor subsequent calls - Close session when done
Related Skills
# Web search (for research + browse)
npx skills add inference-sh/skills@web-search
# LLM models (analyze extracted content)
npx skills add inference-sh/skills@llm-models
Documentation
- inference.sh Sessions - Session management
- Multi-function Apps - How functions work
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
mapping-mitre-attack-techniques
Maps observed adversary behaviors, security alerts, and detection rules to MITRE ATT&CK techniques and sub-techniques to quantify detection coverage and guide control prioritization. Use when building an ATT&CK-based coverage heatmap, tagging SIEM alerts with technique IDs, aligning security controls to adversary playbooks, or reporting threat exposure to executives. Activates for requests involving ATT&CK Navigator, Sigma rules, MITRE D3FEND, or coverage gap analysis.
hunting-for-spearphishing-indicators
Hunt for spearphishing campaign indicators across email logs, endpoint telemetry, and network data to detect targeted email attacks.
analyzing-malicious-url-with-urlscan
URLScan.io is a free service for scanning and analyzing suspicious URLs. It captures screenshots, DOM content, HTTP transactions, JavaScript behavior, and network connections of web pages in an isolat
implementing-zero-standing-privilege-with-cyberark
Deploy CyberArk Secure Cloud Access to eliminate standing privileges in hybrid and multi-cloud environments using just-in-time access with time, entitlement, and approval controls.
implementing-pam-for-database-access
Deploy privileged access management for database systems including Oracle, SQL Server, PostgreSQL, and MySQL. Covers session proxy configuration, credential vaulting, query auditing, dynamic credentia
detecting-t1003-credential-dumping-with-edr
Detect OS credential dumping techniques targeting LSASS memory, SAM database, NTDS.dit, and cached credentials using EDR telemetry, Sysmon process access monitoring, and Windows security event correlation.
Didn't find tool you were looking for?