// MCP server · browser extension · automation

AutoDOM

Turn any MCP-compatible AI agent into a browser-automation powerhouse. AutoDOM is a Node MCP server + Chromium/Firefox (Manifest V3) extension that exposes 70 browser-automation tools to GitHub Copilot, JetBrains AI Assistant, Claude Desktop, Cursor, Gemini CLI and friends — over a local WebSocket bridge with safety tiers, domain guardrails, and an in-page AI chat panel.

Role: Author & maintainer

Stack: Node · MCP (fastmcp) · MV3 extension · WebSocket

Status: Active

Year: 2025

GitHub ↗ Install guide ↗

The problem

LLM agents that can only "see" the DOM through screenshots or raw HTML dumps burn enormous amounts of tokens, hit rate limits, and still make brittle decisions on modern SPAs. The other extreme — letting an agent freely drive a production browser — is a footgun waiting to happen.

I wanted a bridge between those extremes: a real Chromium session an agent can reliably automate, with structured tools instead of raw HTML, and guardrails that make it safe to leave running against a live environment.

What I built

AutoDOM ships as three coordinated pieces, all running locally:

  • 70 automation tools
  • 3×–6× token reduction via batch tools
  • 10 min idle auto-shutdown (configurable)
  • 2 browsers supported

Two ways to drive it

AutoDOM is built to feel useful whether you're pairing with an AI agent or running deterministic scripts you wrote yourself.

★ In-browser AI chat

Talk to the agent without leaving the page

A content-script chat panel and an inline overlay live inside every page. Bring your own provider (OpenAI, Anthropic, or a local Ollama endpoint). API keys are kept in chrome.storage.session, which is held in memory only and cleared when the browser restarts.

  • ⌘⇧K / Ctrl⇧K · chat sidebar
  • ⌘⇧L / Ctrl⇧L · inline AI overlay
  • OpenAI · Anthropic · Ollama (local)
  • Same 70 MCP tools, no IDE required

★ Local automation, no AI

Run your own scripts through the extension

A pluggable backend registry runs user-provided scripts through the bridge — no LLM, no cloud round-trip. Drop a snippet into the popup's Scripts tab, or invoke run_automation_script from any MCP client to fire a Playwright or Node script against the live tab.

  • Backends: browser-extension · playwright · node
  • Tools: list_automation_backends, validate_automation_script, run_automation_script, run_browser_script
  • Structured run output: status, logs, stdout, elapsedMs
  • Extend via server/automation/backends.js
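
To make the structured run output concrete, here is a minimal sketch of what a backend registry and its result envelope could look like. Only the field names status, logs, stdout and elapsedMs and the backend name "node" come from the list above. The field types, the status values, the Backend interface and the runScript helper are assumptions for illustration, not the contents of server/automation/backends.js.

```typescript
// Field names (status, logs, stdout, elapsedMs) come from the text above;
// their types and the two status values are assumptions.
interface RunResult {
  status: "ok" | "error";
  logs: string[];
  stdout: string;
  elapsedMs: number;
}

// Hypothetical backend contract: takes script source, returns captured output.
interface Backend {
  run(script: string): { logs: string[]; stdout: string };
}

// Toy registry keyed by backend name. The "node" entry is a stand-in that
// echoes the script instead of executing it.
const backends = new Map<string, Backend>([
  [
    "node",
    {
      run: (script: string) => ({
        logs: [`received ${script.length} chars`],
        stdout: script,
      }),
    },
  ],
]);

// Look up the backend, run the script, and always return the same envelope,
// whether the run succeeded or failed.
function runScript(backendName: string, script: string): RunResult {
  const started = Date.now();
  try {
    const backend = backends.get(backendName);
    if (!backend) throw new Error(`unknown backend: ${backendName}`);
    const { logs, stdout } = backend.run(script);
    return { status: "ok", logs, stdout, elapsedMs: Date.now() - started };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { status: "error", logs: [message], stdout: "", elapsedMs: Date.now() - started };
  }
}

const okRun = runScript("node", "console.log('hi')");
const failedRun = runScript("cobol", "DISPLAY 'hi'");
console.log(okRun.status, failedRun.status, failedRun.logs[0]);
```

Returning the same envelope on failure means a caller can branch on status without wrapping every call in its own try/catch.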

Design decisions

A few calls I'm happy with:

Dry-run example: passing dryRun: true to batch_actions returns a risk-annotated plan — { riskLevel: "high", steps: [{ tool, tier, args }] } — without executing any of it. Agents can reason about the plan, ask the user, and then re-submit with dryRun: false once approved.
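
The plan shape can be sketched in a few lines. The following is a rough illustration, not AutoDOM's implementation. The riskLevel, steps, tool, tier and args fields mirror the example above. The tier names other than "high", the tool-to-tier table and the planBatch helper are assumptions made for the sketch.

```typescript
// Assumed tier names; only "high" appears in the text above.
type Tier = "low" | "medium" | "high";

interface PlannedStep {
  tool: string;
  tier: Tier;
  args: Record<string, unknown>;
}

interface DryRunPlan {
  riskLevel: Tier;
  steps: PlannedStep[];
}

interface Action {
  tool: string;
  args: Record<string, unknown>;
}

// Hypothetical tier table: the tool names are real, the tiers are guesses.
const TIER_BY_TOOL: Partial<Record<string, Tier>> = {
  list_automation_backends: "low",
  validate_automation_script: "low",
  run_browser_script: "high",
};

const TIER_ORDER: Tier[] = ["low", "medium", "high"];

// Build a plan without executing anything: annotate each step with a tier,
// then report the highest tier seen as the overall risk level.
function planBatch(actions: Action[]): DryRunPlan {
  const steps = actions.map(
    (a): PlannedStep => ({
      tool: a.tool,
      // Unknown tools default to "high" (a conservative assumption).
      tier: TIER_BY_TOOL[a.tool] ?? "high",
      args: a.args,
    })
  );
  let riskLevel: Tier = "low";
  for (const step of steps) {
    if (TIER_ORDER.indexOf(step.tier) > TIER_ORDER.indexOf(riskLevel)) {
      riskLevel = step.tier;
    }
  }
  return { riskLevel, steps };
}

const plan = planBatch([
  { tool: "list_automation_backends", args: {} },
  { tool: "run_browser_script", args: { code: "document.title" } },
]);
console.log(JSON.stringify(plan, null, 2));
```

Taking the maximum tier across steps means a single risky call marks the whole batch as risky, which is the conservative choice when a human approves the plan as a unit.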

Highlights

Links