The problem
LLM agents that can only "see" the DOM through screenshots or raw HTML dumps burn enormous amounts of tokens, hit rate limits, and still make brittle decisions on modern SPAs. The other extreme — letting an agent freely drive a production browser — is a footgun waiting to happen.
I wanted a bridge between those extremes: a real Chromium session an agent can reliably automate, with structured tools instead of raw HTML, and guardrails that make it safe to leave running against a live environment.
What I built
AutoDOM ships as three coordinated pieces, all running locally:
- Node MCP server (
server/, built onfastmcp) that speaks MCP overstdioto the IDE and proxies tool calls to the browser over a local WebSocket on127.0.0.1:9876. - Manifest V3 browser extension (Chromium + Firefox flavors) with a service worker bridge client, content scripts that host the AI chat panel and inline overlay, and a popup for connection status and per-port controls.
- Local automation backends — a registry of script runners (Playwright, Node) so users can drop in their own browser scripts via the popup's Scripts tab or invoke them as MCP tools, with no AI required.
- Guardrail layer that classifies every tool as
read,write, ordestructiveand enforces domain allowlists/blocklists, per-tool timeouts, and a confirm-before-execute flow.
Two ways to drive it
AutoDOM is built to feel useful whether you're pairing with an AI agent or running deterministic scripts you wrote yourself.
Talk to the agent without leaving the page
A content-script chat panel and an inline overlay live inside every page.
Bring your own provider (OpenAI, Anthropic, or a local Ollama endpoint),
keys live in chrome.storage.session — RAM-only, gone on
browser restart.
- ⌘⇧K / Ctrl⇧K · chat sidebar
- ⌘⇧L / Ctrl⇧L · inline AI overlay
- OpenAI · Anthropic · Ollama (local)
- Same 70 MCP tools, no IDE required
Run your own scripts through the extension
A pluggable backend registry runs user-provided scripts through the
bridge — no LLM, no cloud round-trip. Drop a snippet into the popup's
Scripts tab, or invoke run_automation_script from any
MCP client to fire a Playwright or Node script against the live tab.
- Backends:
browser-extension·playwright·node - Tools:
list_automation_backends,validate_automation_script,run_automation_script,run_browser_script - Structured run output:
status,logs,stdout,elapsedMs - Extend via
server/automation/backends.js
Design decisions
A few calls I'm happy with:
- Local-only transport. Stdio MCP between IDE and server, a loopback WebSocket between server and extension. Nothing leaves the machine unless the user wires up an external AI provider.
- Per-start auth token for the WebSocket bridge, so even on a shared workstation random clients can't snoop the session.
- Safety tiers over permissions. Every tool declares its tier, so new tools inherit the right policy automatically — no out-of-band config drift.
- Dry-run planner on
batch_actions. An agent can preview a multi-step plan with per-step tier + overallriskLevelbefore touching the page. - History-API based SPA detection instead of a global
MutationObserver, which dropped idle CPU dramatically on long-lived sessions. - Ring-buffer tool logs and cached keepalives so the extension stays lightweight even when the agent is bursty.
- Server-authoritative inactivity timer. The server is the only
source of truth for
SESSION_TIMEOUT/INACTIVITY_WARNING— no clock drift between extension and bridge.
dryRun: true to
batch_actions returns a risk-annotated plan —
{ riskLevel: "high", steps: [{ tool, tier, args }] } — without
executing any of it. Agents can reason about the plan, ask the user, and then
re-submit with dryRun: false once approved.
Highlights
- 🛡️ Safety tiers, domain allow/blocklists, confirm mode, per-start auth token
- 🚀 Token-efficient tools:
execute_code,get_dom_state,batch_actions,extract_data - 💬 In-browser chat panel (
⌘⇧K) and inline AI overlay (⌘⇧L) — drive the agent without leaving the page - 🧩 Local Playwright / Node script runners — bring your own automations,
no AI required (see
AUTOMATION.md) - ⏱️ Configurable inactivity auto-shutdown with pre-timeout warning
- 🌐 Stdio MCP ↔
ws://127.0.0.1:9876bridge (port configurable via--portfor multi-browser setups) - ⚡ Lightweight: history-API SPA detection, ring-buffered logs, cached keepalives
- 🛠️ One-shot
./setup.shauto-detects installed IDEs (IntelliJ family, VS Code, Cursor, Claude Desktop, Gemini CLI) and writes the MCP config for each