AutoDOM

The problem

LLM agents that can only "see" the DOM through screenshots or raw HTML dumps burn enormous amounts of tokens, hit rate limits, and still make brittle decisions on modern SPAs. The other extreme — letting an agent freely drive a production browser — is a footgun waiting to happen.

I wanted a bridge between those extremes: a real Chromium session an agent can reliably automate, with structured tools instead of raw HTML, and guardrails that make it safe to leave running against a live environment.

What I built

AutoDOM ships as four coordinated pieces, all running locally:

Node MCP server (server/, built on fastmcp) that speaks MCP over stdio to the IDE and proxies tool calls to the browser over a local WebSocket on 127.0.0.1:9876.
Manifest V3 browser extension (Chromium + Firefox flavors) with a service worker bridge client, content scripts that host the AI chat panel and inline overlay, and a popup for connection status and per-port controls.
Local automation backends — a registry of script runners (Playwright, Node) so users can drop in their own browser scripts via the popup's Scripts tab or invoke them as MCP tools, with no AI required.
Guardrail layer that classifies every tool as read, write, or destructive and enforces domain allowlists/blocklists, per-tool timeouts, and a confirm-before-execute flow.
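
To make the guardrail idea concrete, here is a minimal sketch of how a tier-plus-domain check might be structured. Only the three tiers, the allowlist/blocklist, and the confirm step come from the description above. The names `GuardrailPolicy`, `evaluate`, and `matchesDomain`, the subdomain-matching rule, and the choice to check the blocklist first are assumptions for illustration, not AutoDOM's actual code.

```typescript
// Hypothetical guardrail check. Tier names come from the write-up;
// every other identifier here is invented for illustration.
type Tier = "read" | "write" | "destructive";
type Decision = "allow" | "confirm" | "deny";

interface GuardrailPolicy {
  allowlist: string[];  // if non-empty, only these domains are permitted
  blocklist: string[];  // always denied; checked before the allowlist (assumption)
  confirmTiers: Tier[]; // tiers that require user confirmation before running
}

// True when host equals the domain or is a subdomain of it.
function matchesDomain(host: string, domain: string): boolean {
  return host === domain || host.slice(-(domain.length + 1)) === "." + domain;
}

function evaluate(tier: Tier, host: string, policy: GuardrailPolicy): Decision {
  if (policy.blocklist.some((d) => matchesDomain(host, d))) return "deny";
  if (
    policy.allowlist.length > 0 &&
    !policy.allowlist.some((d) => matchesDomain(host, d))
  ) {
    return "deny";
  }
  return policy.confirmTiers.indexOf(tier) !== -1 ? "confirm" : "allow";
}

// Example policy: reads run freely on example.com, writes and destructive
// calls pause for confirmation, and the admin subdomain is off limits.
const examplePolicy: GuardrailPolicy = {
  allowlist: ["example.com"],
  blocklist: ["admin.example.com"],
  confirmTiers: ["write", "destructive"],
};

const readDecision = evaluate("read", "app.example.com", examplePolicy);
const writeDecision = evaluate("write", "app.example.com", examplePolicy);
const blockedDecision = evaluate("read", "admin.example.com", examplePolicy);
```

Because the decision depends only on the tool's declared tier and the target host, a newly added tool picks up the policy as soon as it declares a tier.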

70 automation tools

3×–6× token reduction via batch tools

10 min idle auto-shutdown (configurable)

2 browsers supported

Two ways to drive it

AutoDOM is built to feel useful whether you're pairing with an AI agent or running deterministic scripts you wrote yourself.

★ In-browser AI chat

Talk to the agent without leaving the page

A content-script chat panel and an inline overlay live inside every page. Bring your own provider (OpenAI, Anthropic, or a local Ollama endpoint). Keys live in chrome.storage.session, which is held in memory only and cleared when the browser restarts.

⌘⇧K / Ctrl⇧K · chat sidebar
⌘⇧L / Ctrl⇧L · inline AI overlay
OpenAI · Anthropic · Ollama (local)
Same 70 MCP tools, no IDE required

★ Local automation — no AI

Run your own scripts through the extension

A pluggable backend registry runs user-provided scripts through the bridge — no LLM, no cloud round-trip. Drop a snippet into the popup's Scripts tab, or invoke run_automation_script from any MCP client to fire a Playwright or Node script against the live tab.

Backends: browser-extension · playwright · node
Tools: list_automation_backends, validate_automation_script, run_automation_script, run_browser_script
Structured run output: status, logs, stdout, elapsedMs
Extend via server/automation/backends.js
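
As a rough sketch of what a pluggable backend registry with structured run output could look like: the result fields (`status`, `logs`, `stdout`, `elapsedMs`) are taken from the list above, while the `Backend` interface, `registerBackend`, the `"ok"`/`"error"` status values, and the toy `echo` backend are assumptions made for this example. The actual contract lives in server/automation/backends.js and may differ. The sketch is synchronous for brevity; a real Playwright or Node runner would almost certainly be asynchronous.

```typescript
// Hypothetical registry shape. Only the RunResult field names come from
// the write-up; the status values and all other names are assumptions.
interface RunResult {
  status: "ok" | "error";
  logs: string[];
  stdout: string;
  elapsedMs: number;
}

interface Backend {
  name: string;
  // Returns an error message, or null when the script looks runnable.
  validate(script: string): string | null;
  run(script: string): { logs: string[]; stdout: string };
}

const backends: { [name: string]: Backend } = {};

function registerBackend(backend: Backend): void {
  backends[backend.name] = backend;
}

function listAutomationBackends(): string[] {
  return Object.keys(backends).sort();
}

function fail(message: string, started: number): RunResult {
  return { status: "error", logs: [message], stdout: "", elapsedMs: Date.now() - started };
}

function runAutomationScript(name: string, script: string): RunResult {
  const started = Date.now();
  if (!Object.prototype.hasOwnProperty.call(backends, name)) {
    return fail(`unknown backend: ${name}`, started);
  }
  const backend = backends[name];
  const problem = backend.validate(script);
  if (problem !== null) return fail(problem, started);
  try {
    const out = backend.run(script);
    return { status: "ok", logs: out.logs, stdout: out.stdout, elapsedMs: Date.now() - started };
  } catch (err) {
    // A throwing script becomes a structured error instead of crashing the bridge.
    return fail(String(err), started);
  }
}

// Toy stand-in backend that echoes the script back; not a real runner.
registerBackend({
  name: "echo",
  validate: (script) => (script.trim() === "" ? "script is empty" : null),
  run: (script) => ({ logs: ["echo backend invoked"], stdout: script }),
});
```

Validating before running and always returning the same result shape means a caller can treat every run, successful or not, the same way.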

Design decisions

A few calls I'm happy with:

Local-only transport. Stdio MCP between IDE and server, a loopback WebSocket between server and extension. Nothing leaves the machine unless the user wires up an external AI provider.
Per-start auth token for the WebSocket bridge. A fresh token is issued each time the server starts, so even on a shared workstation an arbitrary local client can't attach to the session.
Safety tiers over permissions. Every tool declares its tier, so new tools inherit the right policy automatically — no out-of-band config drift.
Dry-run planner on batch_actions. An agent can preview a multi-step plan with per-step tier + overall riskLevel before touching the page.
History-API-based SPA detection instead of a global MutationObserver. Dropping the observer cut idle CPU dramatically on long-lived sessions.
Ring-buffer tool logs and cached keepalives so the extension stays lightweight even when the agent is bursty.
Server-authoritative inactivity timer. The server is the only source of truth for SESSION_TIMEOUT / INACTIVITY_WARNING — no clock drift between extension and bridge.
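
For readers unfamiliar with the ring-buffer log mentioned in the list above, this is a minimal sketch of the general technique: a fixed-capacity buffer that overwrites its oldest entry once full, so memory stays bounded however bursty the agent is. The class name, capacity, and entry type are assumptions for this example, not the extension's actual implementation.

```typescript
// Generic fixed-capacity ring buffer (illustrative; names are hypothetical).
class RingBuffer<T> {
  private items: T[] = [];
  private next = 0; // slot the next push writes to once the buffer is full
  private capacity: number;

  constructor(capacity: number) {
    if (capacity < 1) throw new Error("capacity must be >= 1");
    this.capacity = capacity;
  }

  push(item: T): void {
    if (this.items.length < this.capacity) {
      this.items.push(item);
    } else {
      this.items[this.next] = item; // overwrite the oldest entry
    }
    this.next = (this.next + 1) % this.capacity;
  }

  // Returns entries oldest-first without mutating the buffer.
  toArray(): T[] {
    if (this.items.length < this.capacity) return this.items.slice();
    return this.items.slice(this.next).concat(this.items.slice(0, this.next));
  }
}

const toolLog = new RingBuffer<string>(3);
["a", "b", "c", "d"].forEach((entry) => toolLog.push(entry));
const recentEntries = toolLog.toArray(); // ["b", "c", "d"]: "a" was overwritten
```

Each push is O(1) and the buffer never grows past its capacity, which is the property that keeps a long-lived session lightweight.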

Dry-run example: passing dryRun: true to batch_actions returns a risk-annotated plan — { riskLevel: "high", steps: [{ tool, tier, args }] } — without executing any of it. Agents can reason about the plan, ask the user, and then re-submit with dryRun: false once approved.
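
One way the planning step behind that dry run could work is sketched below. The plan shape (`riskLevel` plus `steps` of `{ tool, tier, args }`) follows the example above. Everything else is an assumption: the tier table, the tool names in it, the mapping from the worst tier to a risk level, and the fail-safe of treating unknown tools as destructive.

```typescript
// Hypothetical dry-run planner. Only the output shape mirrors the write-up.
type Tier = "read" | "write" | "destructive";
type RiskLevel = "low" | "medium" | "high";

interface Step { tool: string; args: { [key: string]: unknown } }
interface PlannedStep extends Step { tier: Tier }
interface Plan { riskLevel: RiskLevel; steps: PlannedStep[] }

// Illustrative tier table; these tier assignments and the last two tool
// names are invented for the example, not AutoDOM's real registry.
const TOOL_TIERS: { [tool: string]: Tier } = {
  get_dom_state: "read",
  click_element: "write",
  clear_site_data: "destructive",
};

const TIER_RANK: { [T in Tier]: number } = { read: 0, write: 1, destructive: 2 };
const RISK_BY_RANK: RiskLevel[] = ["low", "medium", "high"];

// Unknown tools are treated as destructive so a typo can't lower the risk (assumption).
function tierOf(tool: string): Tier {
  return Object.prototype.hasOwnProperty.call(TOOL_TIERS, tool)
    ? TOOL_TIERS[tool]
    : "destructive";
}

// Builds the annotated plan without executing anything.
function planBatch(steps: Step[]): Plan {
  const planned: PlannedStep[] = steps.map((s) => ({
    tool: s.tool,
    tier: tierOf(s.tool),
    args: s.args,
  }));
  // Overall risk is driven by the single most dangerous step.
  const worst = planned.reduce((max, s) => Math.max(max, TIER_RANK[s.tier]), 0);
  return { riskLevel: RISK_BY_RANK[worst], steps: planned };
}

const previewPlan = planBatch([
  { tool: "get_dom_state", args: {} },
  { tool: "click_element", args: { selector: "#save" } },
]);
// previewPlan.riskLevel is "medium": the worst step in the batch is a write.
```

Taking the maximum tier rather than an average keeps the summary conservative: a single destructive step makes the whole batch high risk.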

Highlights

🛡️ Safety tiers, domain allow/blocklists, confirm mode, per-start auth token
🚀 Token-efficient tools: execute_code, get_dom_state, batch_actions, extract_data
💬 In-browser chat panel (⌘⇧K) and inline AI overlay (⌘⇧L) — drive the agent without leaving the page
🧩 Local Playwright / Node script runners — bring your own automations, no AI required (see AUTOMATION.md)
⏱️ Configurable inactivity auto-shutdown with pre-timeout warning
🌐 Stdio MCP ↔ ws://127.0.0.1:9876 bridge (port configurable via --port for multi-browser setups)
⚡ Lightweight: history-API SPA detection, ring-buffered logs, cached keepalives
🛠️ One-shot ./setup.sh auto-detects installed IDEs (IntelliJ family, VS Code, Cursor, Claude Desktop, Gemini CLI) and writes the MCP config for each