Agent skill

open-interpreter

Install this agent skill into your project

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/skills/other/open-interpreter

SKILL.md

OpenInterpreter — Desktop GUI Automation

Desktop control for Claude Code via OpenInterpreter (62k stars, AGPL-3.0). Mouse, keyboard, screenshot, and OCR primitives backed by pyautogui + pytesseract.

Mode Selection

| Mode | LLM | Script | Best For |
|------|-----|--------|----------|
| Library | Claude Code (native) | Individual scripts below | Surgical GUI actions — Claude sees screenshots, reasons, dispatches actions |
| OS subprocess | Claude API (via OI) | oi_os_mode.py | Full autonomous computer use — delegate entire GUI tasks |
| Local agent | Ollama (offline) | oi_os_mode.py --local | Offline computer use, no API costs, privacy-sensitive tasks |

Use Library mode by default. Use OS subprocess to delegate self-contained GUI tasks. Use Local agent when offline or to avoid API costs.

Installation

Run once:

bash
~/.claude/skills/open-interpreter/scripts/oi_install.sh

Installs open-interpreter[os] via uv, verifies pyautogui and tesseract, checks macOS permissions.

macOS permissions (one-time, manual):

  • System Settings > Privacy & Security > Accessibility > add terminal app (Ghostty/Terminal/iTerm2)
  • System Settings > Privacy & Security > Screen Recording > add terminal app

Verify permissions:

bash
python3 ~/.claude/skills/open-interpreter/scripts/oi_permission_check.py

Library Mode: The Screenshot Loop

The core pattern for GUI automation:

1. Take screenshot   →  oi_screenshot.py
2. Read PNG          →  Claude Read tool (native vision)
3. Decide action     →  Claude reasoning
4. Execute action    →  oi_click.py / oi_type.py
5. Verify            →  Take another screenshot
6. Loop until done
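
A minimal shell sketch of one pass through this loop, assuming the default install path and a hypothetical "Submit" button as the target (steps 2-3 happen inside Claude, not in the shell):

bash
SKILL=~/.claude/skills/open-interpreter/scripts
SS=$(python3 "$SKILL/oi_screenshot.py" | head -1)    # step 1: capture; keep only the PNG path
# steps 2-3: Claude Reads "$SS" with native vision and decides on the next action
python3 "$SKILL/oi_click.py" --text "Submit"         # step 4: execute the chosen action
python3 "$SKILL/oi_screenshot.py"                    # step 5: re-capture to verify the result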

Scripts

oi_screenshot.py — Capture screen, return file path with Retina metadata

bash
python3 ~/.claude/skills/open-interpreter/scripts/oi_screenshot.py
python3 ~/.claude/skills/open-interpreter/scripts/oi_screenshot.py --region 0,0,800,600
python3 ~/.claude/skills/open-interpreter/scripts/oi_screenshot.py --active-window

Output (3 lines):

/tmp/oi_screenshot_1708789200.png
SCALE_FACTOR=2
SCREEN_SIZE=1512x982
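
A small parsing sketch, assuming exactly the three lines shown above in that order:

bash
out=$(python3 ~/.claude/skills/open-interpreter/scripts/oi_screenshot.py)
png=$(echo "$out" | sed -n 1p)                     # screenshot file path
scale=$(echo "$out" | sed -n 2p | cut -d= -f2)     # Retina scale factor, e.g. 2
size=$(echo "$out" | sed -n 3p | cut -d= -f2)      # logical screen size, e.g. 1512x982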

oi_click.py — Mouse click by coordinates or OCR text

bash
python3 ~/.claude/skills/open-interpreter/scripts/oi_click.py --x 450 --y 300
python3 ~/.claude/skills/open-interpreter/scripts/oi_click.py --x 900 --y 600 --image-coords
python3 ~/.claude/skills/open-interpreter/scripts/oi_click.py --text "Submit"
python3 ~/.claude/skills/open-interpreter/scripts/oi_click.py --x 450 --y 300 --double
python3 ~/.claude/skills/open-interpreter/scripts/oi_click.py --x 450 --y 300 --right
  • --image-coords: auto-divides by Retina scale factor (use when coordinates come from screenshot image pixels)
  • --text: OCR-based — screenshots, finds text via pytesseract, clicks center of match

oi_type.py — Keyboard input

bash
python3 ~/.claude/skills/open-interpreter/scripts/oi_type.py --text "hello world"
python3 ~/.claude/skills/open-interpreter/scripts/oi_type.py --key enter
python3 ~/.claude/skills/open-interpreter/scripts/oi_type.py --hotkey command space
python3 ~/.claude/skills/open-interpreter/scripts/oi_type.py --text "search" --method typewrite
  • Default text input: clipboard-paste (Cmd+V) for speed and Unicode safety
  • --method typewrite: character-by-character (use when clipboard is needed for other purposes)
  • --hotkey: AppleScript on macOS for reliable modifier key handling

oi_find_text.py — OCR screen reading

bash
python3 ~/.claude/skills/open-interpreter/scripts/oi_find_text.py --text "Submit"
python3 ~/.claude/skills/open-interpreter/scripts/oi_find_text.py --text "Price" --screenshot /tmp/ss.png

Returns JSON array: [{"text": "Submit", "x": 450, "y": 300, "w": 80, "h": 24, "confidence": 95}]
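
To inspect a match before acting on it (rather than letting oi_click.py --text do both steps), one possible glue sketch; the field names follow the JSON above, and the returned x/y are assumed to already be screen coordinates:

bash
matches=$(python3 ~/.claude/skills/open-interpreter/scripts/oi_find_text.py --text "Submit")
x=$(echo "$matches" | python3 -c "import sys, json; print(json.load(sys.stdin)[0]['x'])")
y=$(echo "$matches" | python3 -c "import sys, json; print(json.load(sys.stdin)[0]['y'])")
python3 ~/.claude/skills/open-interpreter/scripts/oi_click.py --x "$x" --y "$y"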

oi_computer.py — Unified dispatch for all actions

bash
python3 ~/.claude/skills/open-interpreter/scripts/oi_computer.py screenshot
python3 ~/.claude/skills/open-interpreter/scripts/oi_computer.py click --x 450 --y 300
python3 ~/.claude/skills/open-interpreter/scripts/oi_computer.py type --text "hello"
python3 ~/.claude/skills/open-interpreter/scripts/oi_computer.py find --text "Submit"
python3 ~/.claude/skills/open-interpreter/scripts/oi_computer.py scroll --clicks 3
python3 ~/.claude/skills/open-interpreter/scripts/oi_computer.py mouse-position
python3 ~/.claude/skills/open-interpreter/scripts/oi_computer.py screen-size

Retina Display Handling

macOS Retina displays render at 2x (or 3x) scaling. Screenshot image pixels differ from screen coordinates:

| Metric | Example (14" MBP) |
|--------|-------------------|
| Image pixels (screenshot) | 3024 x 1964 |
| Screen coordinates (pyautogui) | 1512 x 982 |
| Scale factor | 2x |

When estimating click targets from a screenshot image, use --image-coords on oi_click.py to auto-divide by the scale factor. The oi_screenshot.py output includes SCALE_FACTOR metadata.
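
The manual equivalent, for illustration (scale factor taken from the screenshot metadata; the coordinates are the same ones used in the --image-coords example above):

bash
SCALE=2                 # from the SCALE_FACTOR line in oi_screenshot.py output
IMG_X=900; IMG_Y=600    # point measured on the screenshot image
python3 ~/.claude/skills/open-interpreter/scripts/oi_click.py --x $((IMG_X / SCALE)) --y $((IMG_Y / SCALE))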

OS Mode: Delegate Full Tasks

For self-contained GUI tasks, delegate to OI's full agent loop:

bash
python3 ~/.claude/skills/open-interpreter/scripts/oi_os_mode.py "Open Calculator and compute 2+2"
python3 ~/.claude/skills/open-interpreter/scripts/oi_os_mode.py --provider anthropic "Change the desktop wallpaper"

OI runs its own screenshot → analyze → act loop using the Claude API. Requires ANTHROPIC_API_KEY.
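
A typical invocation, assuming the key is not already exported in your shell (the value below is a placeholder):

bash
export ANTHROPIC_API_KEY="sk-ant-..."   # placeholder; substitute your own key
python3 ~/.claude/skills/open-interpreter/scripts/oi_os_mode.py "Open Calculator and compute 2+2"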

Local Mode: Offline Computer Use

Run OI with a local vision model via Ollama:

bash
python3 ~/.claude/skills/open-interpreter/scripts/oi_os_mode.py --local "What apps are open?"

Prerequisites:

  1. Ollama running: ollama serve
  2. Vision model pulled: ollama pull llama3.2-vision
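
Put together, a first-run sequence might look like this (model name as in the prerequisites above):

bash
ollama serve &                        # 1. start the Ollama server if it is not already running
ollama pull llama3.2-vision           # 2. fetch the local vision model
python3 ~/.claude/skills/open-interpreter/scripts/oi_os_mode.py --local "What apps are open?"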

Limitation: Local models use OI's classic code-execution mode, not the screenshot-driven OS Mode (which requires Claude 3.5 Sonnet). Local mode generates and executes code to accomplish GUI tasks rather than using pixel-level screenshot analysis.

Common Recipes

Open an App via Spotlight

bash
python3 scripts/oi_type.py --hotkey command space
sleep 0.5
python3 scripts/oi_type.py --text "Calculator"
sleep 0.3
python3 scripts/oi_type.py --key enter

Read Text from Screen

bash
python3 scripts/oi_screenshot.py > /tmp/ss_meta.txt
python3 scripts/oi_find_text.py --text "Total" --screenshot "$(head -1 /tmp/ss_meta.txt)"

Click a Button by Label

bash
python3 scripts/oi_click.py --text "Save"

Fill a Form Field

bash
python3 scripts/oi_click.py --text "Email"
python3 scripts/oi_type.py --text "user@example.com"
python3 scripts/oi_type.py --key tab
python3 scripts/oi_type.py --text "password123"

Safety

  1. Confirm before destructive actions — before clicking Send, Delete, Submit, or Confirm buttons, verify with the user
  2. Screenshot before and after every action for verification
  3. No unbounded autonomous loops — confirm with user between multi-step GUI workflows
  4. pyautogui failsafe — moving mouse to any screen corner raises pyautogui.FailSafeException (enabled by default)
  5. Action logging — every script logs actions to stderr: [oi] click at (450, 300) button=left
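
One way to keep an audit trail of those stderr log lines (format as quoted above; the log path is arbitrary):

bash
python3 ~/.claude/skills/open-interpreter/scripts/oi_click.py --x 450 --y 300 2>>/tmp/oi_actions.log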

Troubleshooting

| Symptom | Fix |
|---------|-----|
| oi_screenshot.py returns black image | Grant Screen Recording permission to terminal app |
| oi_click.py / oi_type.py no effect | Grant Accessibility permission to terminal app |
| OCR finds no text | Verify tesseract: which tesseract && tesseract --version |
| Retina coordinates off by 2x | Use --image-coords flag on oi_click.py |
| oi_find_text.py low confidence | Try larger text, ensure screen is not obstructed |
| OS Mode hangs | Verify ANTHROPIC_API_KEY is set, check OI stderr output |
| Local mode fails | Verify Ollama is running (ollama list) and model is pulled |

Reference Documentation

| File | Contents |
|------|----------|
| references/computer-api.md | OI Computer API reference — mouse, keyboard, display, clipboard |
| references/os-mode.md | OS Mode usage, provider configuration, agent loop architecture |
| references/safety-and-permissions.md | macOS permissions guide, safety model, failsafe configuration |
