Control your real Chrome browser over HTTP. Navigate pages, click elements, fill forms, take screenshots, run JavaScript — all via simple POST/GET requests to localhost:9223.
Uses your actual Chrome profile with all your cookies, sessions, and logins. No --remote-debugging-port needed.
Your agent (any language, any framework)
|
| HTTP requests to localhost:9223
v
browser.py (:9223) HTTP API gateway — auto-starts everything
|
| WebSocket
v
relay.py (:9222) CDP message router
|
| WebSocket
v
Chrome extension Translates CDP to chrome.debugger API
|
| chrome.debugger
v
Chrome Your real browser, real profile, real cookies
Chrome blocks --remote-debugging-port on non-temporary profiles. This project works around that by using a Chrome extension that calls chrome.debugger internally, bridged to a standard CDP endpoint through a local relay server. browser.py wraps the whole thing in a clean HTTP API so any agent that can make HTTP calls can drive the browser.
- Python 3.8+
- aiohttp and requests (pip install aiohttp requests)
- Chrome (Windows, macOS, or Linux)
git clone https://github.com/autonet-code/web.git
cd web
pip install aiohttp requests

- Open chrome://extensions in Chrome
- Enable Developer mode (toggle in top right)
- Click Load unpacked and select the extension/ folder
- The extension popup should show "Agent CDP Relay"
python browser.py

This auto-starts the relay server and connects to Chrome. Output:
Starting relay server...
Relay server ready on port 9222
Extension connected
Browser API ready on http://127.0.0.1:9223
If Chrome is already running with the extension loaded, it connects directly. If not, it launches Chrome with the extension.
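A script that launches browser.py itself may want to wait for the API before sending commands. The following is a minimal sketch, not part of the project: wait_until_connected is a hypothetical helper, and it assumes the connected field returned by GET /status is truthy once the extension is attached to the relay.

```python
import time


def wait_until_connected(get_status, timeout_s=10.0, interval_s=0.5, sleep=time.sleep):
    """Poll until the status payload reports connected, or time runs out.

    get_status: zero-argument callable returning the parsed JSON of GET /status.
    Returns the final status dict on success, or None on timeout.
    Assumption: "connected" is truthy once the extension is attached.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            status = get_status()
        except OSError:
            # Server not listening yet. requests' ConnectionError is an
            # OSError subclass, so it is caught here as well.
            status = None
        if status and status.get("connected"):
            return status
        if time.monotonic() >= deadline:
            return None
        sleep(interval_s)
```

With the requests-based setup used in this README, the call would look like wait_until_connected(lambda: requests.get(f"{API}/status", timeout=2).json()).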
import requests
API = "http://127.0.0.1:9223"
# Open a page (creates a new tab and auto-attaches)
requests.post(f"{API}/navigate", json={"url": "https://example.com"})
# Read the title
requests.get(f"{API}/title").json()["value"] # "Example Domain"
# Click a link
requests.post(f"{API}/click", json={"selector": "a"})
# Type into an input (works with React, Vue, etc.)
requests.post(f"{API}/type", json={"selector": "input", "text": "hello"})
# Run any JavaScript
requests.post(f"{API}/eval", json={"js": "document.title"})
# Take a screenshot
requests.get(f"{API}/screenshot/file").json()["path"] # /tmp/screenshot.png
# Detach when done (removes debugger, keeps tab open)
requests.post(f"{API}/detach")

| Endpoint | Method | Body | Returns |
|---|---|---|---|
| /status | GET | - | {connected, session_id, target_id} |
| /tabs | GET | - | [{id, title, url}, ...] |
| /navigate | POST | {url, new_tab?} | {target_id, session_id, url} |
| /attach | POST | {target_id} | {session_id, target_id} |
| /detach | POST | - | {detached, session_id} |
| /close | POST | {target_id?} | {success} |
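To attach to a tab that is already open, the /tabs listing has to be matched to an /attach body. A minimal sketch under one assumption: the id field of a /tabs entry is the value /attach expects as target_id. The helper names find_tab and attach_body are illustrative, not part of the API.

```python
def find_tab(tabs, url_contains):
    """Return the first tab dict whose url contains the substring, else None."""
    for tab in tabs:
        if url_contains in tab.get("url", ""):
            return tab
    return None


def attach_body(tab):
    """Build the JSON body for POST /attach from a /tabs entry.

    Assumption: the "id" in /tabs is the "target_id" that /attach expects.
    """
    return {"target_id": tab["id"]}
```

Usage with requests: fetch tabs with requests.get(f"{API}/tabs").json(), pass the list to find_tab, and if a tab is found, post attach_body(tab) to f"{API}/attach".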
| Endpoint | Method | Body | Returns |
|---|---|---|---|
| /eval | POST | {js} | {value, type} |
| /click | POST | {selector} | {value: "clicked"} |
| /type | POST | {selector, text, clear?, submit?} | {value: "typed", method} |
| /wait | POST | {selector, timeout?, visible?} | {found, elapsed_ms, tag?, text?} |
| /text | GET | - | {value: "page text..."} |
| /url | GET | - | {value: "https://..."} |
| /title | GET | - | {value: "Page Title"} |
| /screenshot | GET | ?format=file | {path} or {data_length, data} |
| /screenshot/file | GET | - | {path} |
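When /screenshot returns {data_length, data} rather than a path, the image has to be decoded by the caller. The sketch below assumes data is base64-encoded, which is how CDP's Page.captureScreenshot returns images; the README does not state the encoding, so verify it against a real response. screenshot_bytes is a hypothetical helper.

```python
import base64


def screenshot_bytes(payload):
    """Return raw image bytes from a GET /screenshot payload.

    Returns None when the payload has no "data" field, for example the
    {path} form, where the server has already written the file.
    Assumption: "data" is a base64 string.
    """
    if "data" not in payload:
        return None
    return base64.b64decode(payload["data"])
```

The returned bytes can be written to disk with pathlib.Path("shot.png").write_bytes(...) or handed to an image library.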
| Endpoint | Method | Returns |
|---|---|---|
| /observe | POST | {value: "observer_started"} |
| /events | GET | {active, count, events: [...]} |
Injects a MutationObserver that tracks dialogs, modals, toasts, forms, iframes, images, canvas, video/audio, and other dynamic DOM changes.
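A typical pattern is to start the observer, trigger an action, then poll /events until something shows up. The sketch below is one way to do the polling; wait_for_events is a hypothetical helper, and the shape of individual event objects is not documented here, so the code treats them as opaque.

```python
import time


def wait_for_events(get_events, timeout_s=5.0, interval_s=0.25, sleep=time.sleep):
    """Poll until GET /events reports at least one event.

    get_events: zero-argument callable returning the parsed /events JSON.
    Returns the events list, or [] if none arrive before the timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        payload = get_events()
        events = payload.get("events") or []
        if events:
            return events
        if time.monotonic() >= deadline:
            return []
        sleep(interval_s)
```

With requests, call requests.post(f"{API}/observe") once, perform the click or navigation, then call wait_for_events(lambda: requests.get(f"{API}/events").json()).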
| Endpoint | Method | Body |
|---|---|---|
| /cdp | POST | {method, params, session?} |
Full Chrome DevTools Protocol passthrough for anything the convenience endpoints don't cover.
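As an illustration of the body shape, here is a small sketch that builds a /cdp request. cdp_body is a hypothetical helper; Page.reload with ignoreCache is a standard CDP method, but whether a given method is permitted through chrome.debugger should be checked against the extension.

```python
def cdp_body(method, params=None, session=None):
    """Build the JSON body for POST /cdp.

    method:  CDP method name, e.g. "Page.reload".
    params:  dict of CDP parameters (sent as {} when omitted).
    session: optional session id; left out so the server can use its default.
    """
    body = {"method": method, "params": params or {}}
    if session is not None:
        body["session"] = session
    return body
```

For example, requests.post(f"{API}/cdp", json=cdp_body("Page.reload", {"ignoreCache": True})) would ask the attached tab to reload while bypassing the cache.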
You must attach to a tab before interacting with it. Without an attached session, interaction endpoints return {"error": "No session attached"}.
1. GET /tabs → find an existing tab
or POST /navigate {"url": "..."} → open a new tab (auto-attaches)
2. POST /attach {"target_id": "..."} → attach to existing tab
3. POST /eval, /click, /type, etc. → interact with the page
4. POST /detach → release the tab when done
/navigate auto-attaches to the new tab. Existing tabs need an explicit /attach.
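The workflow above can be sketched as a small wrapper that always detaches, even when a step fails. This is an illustrative sketch rather than project code: BrowserAPIError, unwrap, and with_session are hypothetical names, and the code assumes every failure is reported as a JSON object with an error key, as in the "No session attached" case.

```python
class BrowserAPIError(RuntimeError):
    """Raised when the API returns a payload containing an "error" key."""


def unwrap(payload):
    """Return the payload unchanged, or raise if it reports an error."""
    if isinstance(payload, dict) and "error" in payload:
        raise BrowserAPIError(payload["error"])
    return payload


def with_session(post, url, actions):
    """Open url in a new tab (auto-attaches), run the actions, then detach.

    post:    callable(path, body) returning the parsed JSON response.
    actions: list of (path, body) pairs, e.g. ("/click", {"selector": "a"}).
    Returns the list of responses, one per action.
    """
    unwrap(post("/navigate", {"url": url}))
    try:
        return [unwrap(post(path, body)) for path, body in actions]
    finally:
        # Release the tab even if an action raised.
        post("/detach", None)
```

With requests, post can be lambda path, body: requests.post(f"{API}{path}", json=body).json().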
Active tabs show a red dot (🔴) in their title so the user knows which tab is agent-controlled. The marker is automatically removed on /detach and cleaned up on startup if a previous session crashed.
/type works with standard inputs, React/Vue/Angular controlled components, and contenteditable elements (ChatGPT, Notion, etc.):
- Standard inputs/textareas: Uses nativeInputValueSetter to bypass React's internal value tracking, then fires input and change events
- Contenteditable: Uses document.execCommand('insertText') which triggers all framework event handlers
- clear (default true): Clears existing value before typing
- submit (default false): After typing, clicks a submit button or simulates Enter
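To show how the two options combine, here is a minimal sketch of a body builder for /type. type_body is a hypothetical helper; it sends clear and submit only when they differ from the defaults listed above, so the server-side defaults stay in charge.

```python
def type_body(selector, text, clear=True, submit=False):
    """Build the JSON body for POST /type.

    clear=False appends to the existing value instead of replacing it.
    submit=True asks the server to submit after typing.
    """
    body = {"selector": selector, "text": text}
    if not clear:
        body["clear"] = False
    if submit:
        body["submit"] = True
    return body
```

For example, requests.post(f"{API}/type", json=type_body("input[name=q]", "hello", submit=True)) would replace the field's contents and then submit; the selector here is only a placeholder.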
# Wait up to 5 seconds for an element to appear
r = requests.post(f"{API}/wait", json={"selector": ".results", "timeout": 5000})
r.json() # {"found": true, "elapsed_ms": 1200, "tag": "DIV", "text": "..."}
# Wait for the element to be visible (not just present in the DOM)
requests.post(f"{API}/wait", json={"selector": ".modal", "visible": True, "timeout": 3000})

Chrome skips layout computation for non-visible tabs, so innerText returns empty there. Use textContent instead:

r = requests.post(f"{API}/eval", json={"js": "document.body.textContent.substring(0, 5000)"})

GET /text uses innerText internally, so prefer /eval with textContent when the tab might be in the background.
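One practical consequence of switching to textContent is that the result keeps the raw whitespace of the markup and includes text from script and style elements. A small sketch of how the expression and a cleanup step might be packaged; page_text_js and collapse_whitespace are hypothetical helpers.

```python
def page_text_js(max_chars=5000):
    """Return a JS expression that reads up to max_chars of body textContent."""
    return f"document.body.textContent.substring(0, {int(max_chars)})"


def collapse_whitespace(raw):
    """Collapse runs of spaces, tabs, and newlines into single spaces."""
    return " ".join(raw.split())
```

Usage: post {"js": page_text_js()} to /eval and pass the returned value through collapse_whitespace before handing it to the agent.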
The browser runs your real Chrome profile. If you're logged into Gmail, GitHub, etc., those sessions are available:
import json

requests.post(f"{API}/navigate", json={"url": "https://github.com/notifications"})

# Collect title/URL pairs in the page and return them as a JSON string
r = requests.post(f"{API}/eval", json={"js": """(function() {
    var items = [];
    document.querySelectorAll('.item').forEach(function(el) {
        var heading = el.querySelector('h3');
        var link = el.querySelector('a');
        if (!heading || !link) return;  // skip rows missing either element
        items.push({
            title: heading.textContent.trim(),
            url: link.href
        });
    });
    return JSON.stringify(items);
})()"""})
data = json.loads(r.json()["value"])

The apps/ directory contains DOM models for specific websites: CSS selectors, interaction recipes, and data extraction patterns that have been verified against live pages.
apps/
chatgpt.com.json Selectors and extraction for ChatGPT
mail.google.com.json Selectors and extraction for Gmail
These are living documents. Agents can load them before interacting with a site to avoid re-discovering selectors, and update them when sites change. See apps/README.md for the schema.
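Because the model files are named after the site's hostname, an agent can look one up from the URL it is about to visit. The sketch below assumes only that naming pattern (apps/<hostname>.json, matched exactly) and makes no assumption about the schema inside the file; model_path and load_model are hypothetical helpers.

```python
import json
from pathlib import Path
from urllib.parse import urlparse


def model_path(url, apps_dir="apps"):
    """Map a page URL to its candidate model file, e.g. apps/chatgpt.com.json."""
    host = urlparse(url).hostname
    if not host:
        return None
    return Path(apps_dir) / f"{host}.json"


def load_model(url, apps_dir="apps"):
    """Return the parsed model for url, or None if no model file exists."""
    path = model_path(url, apps_dir)
    if path is None or not path.is_file():
        return None
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)
```

A None result means the agent has to discover selectors itself, for example with /eval, before interacting with the site.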
browser.py HTTP API gateway (port 9223) — the main entry point
relay.py CDP relay server (port 9222) — started by browser.py
chrome_cdp.py Standalone relay launcher (for custom integrations)
extension/
manifest.json Chrome MV3 extension manifest
service-worker.js CDP command handler via chrome.debugger
keep-alive.js Prevents MV3 service worker suspension
popup.html/js Extension popup showing connection status
apps/
chatgpt.com.json ChatGPT DOM model
mail.google.com.json Gmail DOM model
README.md App model schema and conventions
MIT