A Claude Code skill that wraps the official browser-use library, enabling AI-powered browser automation through two modes:
- Direct Mode - Claude controls the browser directly via the Actor API (no external LLM API key required)
- Subagent Mode - Delegate complex tasks to autonomous Claude Code subagents
The official browser-use library is designed for standalone Python scripts with LLM integration. This skill adapts it for Claude Code by:
- Server Mode: Maintains browser session across multiple tool calls
- Direct Control: Claude can control browser step-by-step using Vision
- No API Key Required: Direct Mode works without OpenAI/Gemini API keys
- Session Persistence: Browser stays open until explicitly closed
- Navigate to any URL
- Click elements by index or coordinates
- Type text into forms
- Take screenshots (Vision-compatible)
- Keyboard input and shortcuts
- Scroll and mouse actions
- Tab management
- JavaScript evaluation
pip install browser-use
playwright install chromium

# Clone this repo
git clone https://github.com/tau-breath/browser-use-skill.git
# Copy to Claude Code skills directory
cp -r browser-use-skill ~/.claude/skills/browser-use

cd ~/.claude/skills/browser-use
python server.py start &
sleep 2
python server.py status
python server.py stop

cd ~/.claude/skills/browser-use
# Start server (keep running in background)
python server.py start &
sleep 2
# Navigate
python server.py call '{"tool": "navigate", "args": {"url": "https://google.com"}}'
# Get page state with screenshot
python server.py call '{"tool": "get_state", "args": {"include_screenshot": true}}'
# Type into search box (index from get_state)
python server.py call '{"tool": "type", "args": {"index": 0, "text": "Claude AI"}}'
# Press Enter
python server.py call '{"tool": "press_key", "args": {"key": "Enter"}}'
# Screenshot results
python server.py call '{"tool": "screenshot", "args": {"path": "results.png"}}'
# Stop server when done
python server.py stop

| Command | Description |
|---|---|
| `python server.py start &` | Start server in background |
| `python server.py stop` | Stop server |
| `python server.py status` | Check server status |
| `python server.py call '{...}'` | Call a tool |
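
The `call` command takes a single JSON argument, which is easy to misquote by hand. The sketch below builds that argument with `json.dumps` and runs it with `subprocess`. The helper names `build_call_argv` and `call_tool` are hypothetical, the install path is taken from the copy step above, and it is an assumption that `server.py call` prints a JSON document on stdout and accepts an empty `args` object for tools without arguments.

```python
import json
import subprocess
import sys
from pathlib import Path

# Assumed install location, taken from the copy step earlier in this README.
SKILL_DIR = Path.home() / ".claude" / "skills" / "browser-use"


def build_call_argv(tool, args=None):
    """Return the argv list equivalent to: python server.py call '<json>'."""
    payload = json.dumps({"tool": tool, "args": args or {}})
    return [sys.executable, "server.py", "call", payload]


def call_tool(tool, args=None):
    """Run one tool call against a server that has already been started.

    Assumption: server.py writes a JSON response to stdout.
    """
    result = subprocess.run(
        build_call_argv(tool, args),
        cwd=SKILL_DIR,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout)
```

With the server running, `call_tool("navigate", {"url": "https://google.com"})` would issue the same request as the shell command shown earlier. Passing the payload as one argv element avoids shell quoting entirely.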
- `navigate` - Go to URL
- `go_back` / `go_forward` - Navigation history
- `reload` - Refresh page
- `get_state` - Get elements + optional screenshot
- `screenshot` - Save screenshot to file
- `evaluate` - Run JavaScript
- `press_key` - Keyboard input

- `find_elements` - Find by CSS selector
- `click` - Click element by index
- `type` - Type text into element
- `hover` - Mouse hover
- `check` - Toggle checkbox
- `select_option` - Select dropdown option
- `drag_to` - Drag and drop

- `mouse_click` - Click at coordinates
- `mouse_move` - Move mouse
- `mouse_drag` - Drag from A to B
- `scroll` - Scroll page

- `list_tabs` - List open tabs
- `switch_tab` - Switch to tab
- `close_tab` - Close tab
- `close` - Close browser
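
Since every tool is addressed by name inside a JSON payload, a typo only surfaces once the call reaches the server. A minimal sketch of catching it earlier is shown below. The set is copied from the tool list above plus `run_agent`, which this README mentions under requirements. `make_payload` is a hypothetical helper, not part of the skill.

```python
import json

# Tool names as listed in this README; not read from server.py itself.
KNOWN_TOOLS = frozenset({
    "navigate", "go_back", "go_forward", "reload", "get_state",
    "screenshot", "evaluate", "press_key",
    "find_elements", "click", "type", "hover", "check",
    "select_option", "drag_to",
    "mouse_click", "mouse_move", "mouse_drag", "scroll",
    "list_tabs", "switch_tab", "close_tab", "close",
    "run_agent",
})


def make_payload(tool, **args):
    """Build the JSON string for `server.py call`, rejecting unknown tools."""
    if tool not in KNOWN_TOOLS:
        raise ValueError(f"unknown tool: {tool!r}")
    return json.dumps({"tool": tool, "args": args})
```

For example, `make_payload("press_key", key="Enter")` produces the same payload as the Enter step in the quick start, while `make_payload("clik", index=0)` raises `ValueError` before anything is sent.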
# Start server
cd ~/.claude/skills/browser-use && python server.py start &
sleep 2
# Navigate to Naver search
python server.py call '{"tool": "navigate", "args": {"url": "https://search.naver.com/search.naver?query=Claude+AI"}}'
# Get screenshot
python server.py call '{"tool": "get_state", "args": {"include_screenshot": true}}'
# Returns: {"url": "...", "elements": [...], "screenshot_path": "/path/to/screenshot.png"}
# Read screenshot with Vision to analyze results

python server.py start &
sleep 2

- Use `get_state` to refresh element cache
- Try `press_key("Tab")` then `press_key("Enter")`
- Use `mouse_click(x, y)` with coordinates from screenshot
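
The three fallbacks above can be written out as `call` payloads, in the order listed. This is a sketch under assumptions: the `key` and `include_screenshot` argument names appear in the quick start, but the `x` and `y` argument names for `mouse_click` are guessed from the `mouse_click(x, y)` notation, and `click_fallback_payloads` is a hypothetical helper.

```python
import json


def click_fallback_payloads(x, y):
    """Return the fallback tool calls as dicts, in the order listed above."""
    return [
        # 1. Refresh the element cache (and get a fresh screenshot).
        {"tool": "get_state", "args": {"include_screenshot": True}},
        # 2. Move keyboard focus and activate the focused element.
        {"tool": "press_key", "args": {"key": "Tab"}},
        {"tool": "press_key", "args": {"key": "Enter"}},
        # 3. Click by coordinates read off the screenshot (assumed arg names).
        {"tool": "mouse_click", "args": {"x": x, "y": y}},
    ]


def as_call_arguments(payloads):
    """Serialize each payload to the JSON string passed to `server.py call`."""
    return [json.dumps(p) for p in payloads]
```

Each string from `as_call_arguments(click_fallback_payloads(120, 340))` can be passed as the argument to `python server.py call`. In practice one would stop at the first fallback that works rather than running all of them.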
Always use `server.py` commands. Direct Python calls don't keep the browser session alive between calls.
Without Server Mode:
Call 1: navigate -> opens browser -> closes on exit (state lost!)
Call 2: click -> ERROR: no browser!
With Server Mode:
Start server: browser session created
Call 1: navigate -> works
Call 2: click -> works (same session!)
Call N: ... -> works
Stop server: browser closes
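
The difference between the two traces above can be reproduced with a toy model. This is only an illustration of the idea, not the skill's actual server: `ToySession`, `one_shot`, and `ToyServer` are hypothetical names, and the "browser" merely remembers the current URL.

```python
class ToySession:
    """Stand-in for a browser session; it only remembers the current URL."""

    def __init__(self):
        self.url = None

    def navigate(self, url):
        self.url = url
        return f"navigated to {url}"

    def click(self, index):
        if self.url is None:
            raise RuntimeError("no browser page is open")
        return f"clicked element {index} on {self.url}"


def one_shot(action, *args):
    """Without Server Mode: every call creates and discards its own session."""
    session = ToySession()
    return getattr(session, action)(*args)


class ToyServer:
    """With Server Mode: one session outlives every call until stop()."""

    def __init__(self):
        self.session = ToySession()

    def call(self, action, *args):
        if self.session is None:
            raise RuntimeError("server stopped")
        return getattr(self.session, action)(*args)

    def stop(self):
        self.session = None
```

Calling `one_shot("navigate", ...)` followed by `one_shot("click", 0)` fails on the second call because the state from the first is gone, whereas the same two calls through one `ToyServer` instance succeed.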
- Python 3.8+
- browser-use (`pip install browser-use`)
- Playwright + Chromium (`playwright install chromium`)
- (Optional) LLM API key for `run_agent` tool
MIT
- browser-use - The official browser automation library
Freedom without surveillance, protection for everyone.