Agent skill
gemini-computer-use
Build and run Gemini 2.5 Computer Use browser-control agents with Playwright. Use when a user wants to automate web browser tasks via the Gemini Computer Use model, needs an agent loop (screenshot → function_call → action → function_response), or asks to integrate safety confirmation for risky UI actions.
Install this agent skill to your Project
npx add-skill https://github.com/autohandai/community-skills/tree/main/gemini-computer-use
SKILL.md
Gemini Computer Use
Quick start
-
Source the env file and set your API key:
bashcp env.example env.sh $EDITOR env.sh source env.sh -
Create a virtual environment and install dependencies:
bashpython -m venv .venv source .venv/bin/activate pip install google-genai playwright playwright install chromium -
Run the agent script with a prompt:
bashpython scripts/computer_use_agent.py \ --prompt "Find the latest blog post title on example.com" \ --start-url "https://example.com" \ --turn-limit 6
Browser selection
- Default: Playwright's bundled Chromium (no env vars required).
- Choose a channel (Chrome/Edge) with
COMPUTER_USE_BROWSER_CHANNEL. - Use a custom Chromium-based executable (e.g., Brave) with
COMPUTER_USE_BROWSER_EXECUTABLE.
If both are set, COMPUTER_USE_BROWSER_EXECUTABLE takes precedence.
Core workflow (agent loop)
- Capture a screenshot and send the user goal + screenshot to the model.
- Parse
function_callactions in the response. - Execute each action in Playwright.
- If a
safety_decisionisrequire_confirmation, prompt the user before executing. - Send
function_responseobjects containing the latest URL + screenshot. - Repeat until the model returns only text (no actions) or you hit the turn limit.
Operational guidance
- Run in a sandboxed browser profile or container.
- Use
--excludeto block risky actions you do not want the model to take. - Keep the viewport at 1440x900 unless you have a reason to change it.
Resources
- Script:
scripts/computer_use_agent.py - Reference notes:
references/google-computer-use.md - Env template:
env.example
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
mapping-mitre-attack-techniques
Maps observed adversary behaviors, security alerts, and detection rules to MITRE ATT&CK techniques and sub-techniques to quantify detection coverage and guide control prioritization. Use when building an ATT&CK-based coverage heatmap, tagging SIEM alerts with technique IDs, aligning security controls to adversary playbooks, or reporting threat exposure to executives. Activates for requests involving ATT&CK Navigator, Sigma rules, MITRE D3FEND, or coverage gap analysis.
hunting-for-spearphishing-indicators
Hunt for spearphishing campaign indicators across email logs, endpoint telemetry, and network data to detect targeted email attacks.
analyzing-malicious-url-with-urlscan
URLScan.io is a free service for scanning and analyzing suspicious URLs. It captures screenshots, DOM content, HTTP transactions, JavaScript behavior, and network connections of web pages in an isolat
implementing-zero-standing-privilege-with-cyberark
Deploy CyberArk Secure Cloud Access to eliminate standing privileges in hybrid and multi-cloud environments using just-in-time access with time, entitlement, and approval controls.
implementing-pam-for-database-access
Deploy privileged access management for database systems including Oracle, SQL Server, PostgreSQL, and MySQL. Covers session proxy configuration, credential vaulting, query auditing, dynamic credentia
detecting-t1003-credential-dumping-with-edr
Detect OS credential dumping techniques targeting LSASS memory, SAM database, NTDS.dit, and cached credentials using EDR telemetry, Sysmon process access monitoring, and Windows security event correlation.
Didn't find tool you were looking for?