How I Built a Browser Automation Rig That Actually Works

I run an AI agent (Hermes) that posts to social media, scrapes LinkedIn, downloads YouTube videos, and does web research. All of it needs a real browser — not a headless ephemeral one, but a persistent headful Chromium with cookies, sessions, and a display.

Here’s how it’s built, what’s running inside the container, and everything that broke.

The Stack

A single Docker container running on an 8GB VPS:

┌─────────────────────────────────────────────┐
│  Docker container (browser)                  │
│                                               │
│  Xvfb (virtual display :100)                  │
│       ↓                                       │
│  Chromium (headful, port 9222)                │
│       ↓                                       │
│  x11vnc (port 5900) ← websockify (port 6080) │
│       ↓                                       │
│  noVNC ← manual browser UI                    │
│                                               │
│  nginx (port 9223 → 9222)                     │
│       CDP proxy + WebSocket URL rewrite       │
│                                               │
│  Watchdog (anon RSS monitor, health checks)   │
│                                               │
│  tini (PID 1, reaps zombies)                  │
└─────────────────────────────────────────────┘

Two ports are exposed to the host (bound to 127.0.0.1 only):

Port	What	Purpose
`9223`	nginx CDP proxy	Playwright, Puppeteer, MCP, raw websocket
`6080`	noVNC (via websockify)	Manual browser in a browser tab

Tailscale serve exposes both to my tailnet:

tailscale serve --bg --tcp 9223 127.0.0.1:9223
tailscale serve --bg --tcp 6080 127.0.0.1:6080

I can drive the browser from my laptop in a different country through <tailscale-hostname>:9223.

The Dockerfile

Not a black-box Browserless image — a custom build from debian:bookworm-slim:

FROM debian:bookworm-slim

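# apt-get update is required first: the slim base image ships without package lists.
# Assumption: curl, procps (pgrep), and x11-utils (xdpyinfo) are added because start.sh relies on them.
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
       chromium xvfb x11vnc novnc websockify nginx tini \
       fonts-noto-cjk fonts-noto-color-emoji \
       curl procps x11-utils \
    && rm -rf /var/lib/apt/lists/*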

RUN useradd -m -u 1000 browser

COPY nginx.conf /etc/nginx/nginx.conf
COPY start.sh /start.sh
COPY healthcheck.sh /healthcheck.sh

USER browser
ENTRYPOINT ["/usr/bin/tini", "--"]
CMD ["/start.sh"]

Key decisions:

Debian slim instead of the Browserless image — smaller attack surface, no Node.js runtime we don’t need
Chromium from Debian repos, not Google Chrome — no custom APT repo, stable version tracking
tini as PID 1 — Chromium spawns child processes (renderers, GPU, network) and tini reaps them properly so we don’t accumulate zombies
Non-root user (browser, UID 1000) — Chromium refuses to run as root anyway, and it’s the right thing to do
CJK + emoji fonts — without these, pages render with tofu boxes and bot detection triggers on font fingerprinting

docker-compose.yml — The Resource Tuning

This is the file that took the most iteration. On an 8GB host, you have to be precise:

services:
  browser:
    build: .
    container_name: browser
    restart: unless-stopped
    mem_limit: 6144m
    memswap_limit: 7168m
    mem_reservation: 4096m
    shm_size: 4g
    ports:
      - "127.0.0.1:9223:9223"
      - "127.0.0.1:6080:6080"
    volumes:
      - ./data:/data
    environment:
      BROWSER_MEMORY_HIGH_WATERMARK_MB: 5500
      WINDOW_SIZE: 1365,840

The numbers that matter:

Setting	Value	Why
`mem_limit`	6144m (6 GB)	Chrome + Xvfb + nginx + VNC stack needs headroom for 5-10 tabs
`memswap_limit`	7168m (7 GB)	1 GB of swap — enough to absorb a temporary spike without OOM kill
`mem_reservation`	4096m (4 GB)	Soft reservation: under host memory pressure the kernel tries not to reclaim the container below 4 GB (best effort, not a hard guarantee)
`shm_size`	4g	`/dev/shm` is where Chrome stores shared memory between processes — 64 MB default is a guaranteed crash
`BROWSER_MEMORY_HIGH_WATERMARK_MB`	5500	Watchdog threshold — above this, restart Chrome before the OOM killer does

These numbers came from watching docker stats during real usage. A headful Chromium with 3-5 tabs, a logged-in LinkedIn session, and an active CDP connection settles around 3.5-5 GB of anonymous RSS. The 5,500 MB threshold leaves about 640 MB of breathing room (6144 - 5500 = 644) before the 6 GB cap.

Chrome flags inside start.sh

chromium \
    --remote-debugging-port=9222 \
    --remote-debugging-address=127.0.0.1 \
    --user-data-dir=/data/profile \
    --js-flags=--max-old-space-size=1024 \
    --renderer-process-limit=8 \
    --disk-cache-size=500000000 \
    --media-cache-size=100000000 \
    --disable-features=MediaRouter,OptimizationHints,AutofillServerCommunication

remote-debugging-address=127.0.0.1: critical. CDP listens only on loopback inside the container, so nothing outside can reach port 9222 directly; that is why nginx fronts it on 9223
js-flags=--max-old-space-size=1024: V8 heap limit per renderer, in MB. This is a V8 option, so Chromium only honors it when it is passed through --js-flags. 512 MB was too small for heavy pages like LinkedIn
renderer-process-limit=8: prevents Chrome from spawning a new renderer per tab forever
disk-cache-size=500000000: caps the disk cache at about 500 MB. The cache persists across restarts because the profile lives under /data, which is a bind mount
disable-features: strips out Media Router, optimization hints, and autofill server communication, all of which do network I/O in the background

The CDP Proxy: nginx.conf

Raw Chromium exposes CDP on port 9222. But the webSocketDebuggerUrl in /json/version looks like ws://127.0.0.1:9222/devtools/browser/... — useless for any client not on localhost. nginx fixes this:

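# Excerpt: everything below lives inside the http { } block of nginx.conf.
# $connection_upgrade and $ws_scheme are not built in; the two maps are one
# plausible way to define them (an assumption, not the original config).
map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}
map $http_x_forwarded_proto $ws_scheme {
    default ws;
    https   wss;
}

server {
    listen 0.0.0.0:9223;

    location / {
        proxy_pass http://127.0.0.1:9222;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_set_header Accept-Encoding "";  # sub_filter cannot rewrite compressed bodies

        # Rewrite WebSocket URLs to match the incoming Host header
        sub_filter_types application/json;    # /json/* is JSON; sub_filter defaults to text/html only
        sub_filter_once off;
        sub_filter "ws://127.0.0.1:9222" "$ws_scheme://$http_host";
        sub_filter "ws://localhost:9222" "$ws_scheme://$http_host";
    }
}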

The sub_filter rewrites Chromium’s hardcoded WebSocket URLs to whatever host the client used to connect. This is what makes Tailscale work — a client connecting to <tailscale-hostname>:9223 gets WebSocket URLs pointing to the same hostname, not 127.0.0.1.

Without this proxy, external CDP clients get a webSocketDebuggerUrl they can’t reach, and you spend an hour debugging why Puppeteer connects but never sends commands.
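To make the rewrite concrete, here is a minimal Python sketch of what the two sub_filter rules do to a /json/version response. The payload, the `rewrite_ws_urls` helper, and the hostname `myhost.example.ts.net` are illustrative assumptions, not output from the real container.

```python
import json


def rewrite_ws_urls(body: str, ws_scheme: str, client_host: str) -> str:
    """Mimic the proxy's sub_filter rules: swap Chromium's loopback origin
    for whatever host the client used to reach the proxy."""
    for origin in ("ws://127.0.0.1:9222", "ws://localhost:9222"):
        body = body.replace(origin, f"{ws_scheme}://{client_host}")
    return body


# Hypothetical /json/version payload, trimmed to the field that matters.
upstream = json.dumps({
    "Browser": "Chrome/0.0.0.0",
    "webSocketDebuggerUrl": "ws://127.0.0.1:9222/devtools/browser/abc123",
})

rewritten = json.loads(rewrite_ws_urls(upstream, "ws", "myhost.example.ts.net:9223"))
print(rewritten["webSocketDebuggerUrl"])
```

A client that reads the rewritten URL opens its WebSocket against the proxy port instead of a loopback address it cannot reach.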

The Boot Sequence: start.sh

The heart of the container is 347 lines of bash that bring up the whole stack in order:

1. Xvfb :100 (virtual framebuffer, 1365x840)
2. Chromium (CDP on 9222, persistent profile)
3. x11vnc (attaches to Xvfb display, listens on 5900)
4. websockify (bridges VNC → WebSocket on 6080, serves noVNC HTML)
5. nginx (CDP proxy on 9223)
6. Watchdog loop (every 60 seconds)

Each step waits for the previous one to be healthy before continuing. Chromium won’t launch until Xvfb is confirmed running (xdpyinfo). The VNC stack won’t start until Chromium is responding on /json/version. The watchdog won’t begin until every component is up.
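The real gating lives in start.sh, but the pattern is easy to show in Python. This is a sketch under assumptions: `wait_until` and `cdp_http_ready` are hypothetical names, and the timeout values are placeholders rather than the ones the script uses.

```python
import time
import urllib.request
from typing import Callable


def wait_until(probe: Callable[[], bool], timeout_s: float = 30.0,
               interval_s: float = 0.5) -> bool:
    """Poll `probe` until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if probe():
                return True
        except OSError:
            pass  # connection refused / timed out: not ready yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


def cdp_http_ready() -> bool:
    """True once Chromium answers /json/version on its local CDP port."""
    with urllib.request.urlopen("http://127.0.0.1:9222/json/version", timeout=2) as resp:
        return resp.status == 200


# Usage inside a boot script (not executed here):
#   if not wait_until(cdp_http_ready, timeout_s=30):
#       raise SystemExit("Chromium never came up; refusing to start the VNC stack")
```

Each stage gets its own probe, and the next stage starts only after the previous probe has returned True.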

The boot takes about 15-20 seconds from docker compose up to “Manual UI: http://127.0.0.1:6080/vnc.html”.

The Watchdog

The watchdog loop runs every 60 seconds and checks four things:

Are all processes alive? Xvfb, x11vnc, websockify, nginx — if any died, exit the container so Docker restarts it
Is Chromium running? If pgrep chromium returns nothing, launch a fresh instance
Is CDP healthy? A curl to http://127.0.0.1:9223/json/version catches a dead Chromium or a broken proxy. It does not catch the “WebSocket silently broken” state, where HTTP works but WS doesn’t
Is memory over threshold? If anonymous RSS exceeds BROWSER_MEMORY_HIGH_WATERMARK_MB, restart Chromium
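The four checks reduce to a small decision function. The sketch below is a Python restatement, not the bash in start.sh; the `Status` fields and action names are made up for illustration, and treating a failed CDP check as a Chromium restart is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Status:
    support_procs_alive: bool  # Xvfb, x11vnc, websockify, nginx
    chromium_running: bool
    cdp_http_ok: bool          # /json/version answered through the proxy
    anon_mb: int               # anonymous RSS of the container, in MB


def next_action(s: Status, watermark_mb: int = 5500) -> str:
    """Return what the watchdog should do on this tick."""
    if not s.support_procs_alive:
        return "exit_container"      # let Docker's restart policy rebuild everything
    if not s.chromium_running:
        return "launch_chromium"
    if not s.cdp_http_ok or s.anon_mb > watermark_mb:
        return "restart_chromium"    # assumption: CDP failure is handled like memory pressure
    return "ok"


print(next_action(Status(True, True, True, 4200)))
```

The order matters: a dead support process ends the container before any Chromium-level recovery is attempted.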

The memory check is the most interesting part:

container_anon_bytes() {
    # cgroup v2: read anonymous RSS from memory.stat
    # NOT memory.current — that includes reclaimable page cache
    awk '/^anon /{print $2}' /sys/fs/cgroup/memory.stat
}

Why anon and not memory.current? The cgroup memory.current counter includes the page cache: file-backed pages that the kernel can reclaim under memory pressure. If you track memory.current, legitimate file I/O (like Chromium writing to its disk cache) triggers false watchdog kills. Anonymous pages can’t simply be dropped; the kernel can only push them to swap, and this container has just 1 GB of that. When they grow, it’s real memory pressure, the kind that ends with the OOM killer, so that’s what the watchdog should track.
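For readers who prefer Python to awk, the same check can be sketched against a sample memory.stat. The sample values are invented, and reading the watermark as mebibytes (MB × 1024 × 1024) is an assumption about how the threshold is interpreted.

```python
def anon_bytes(memory_stat_text: str) -> int:
    """Return the `anon` counter from cgroup v2 memory.stat contents."""
    for line in memory_stat_text.splitlines():
        key, _, value = line.partition(" ")
        if key == "anon":  # exact match, so anon_thp is skipped
            return int(value)
    raise ValueError("no 'anon' line found in memory.stat")


def over_watermark(memory_stat_text: str, watermark_mb: int) -> bool:
    """True when anonymous memory exceeds the watermark (assumed to be MiB)."""
    return anon_bytes(memory_stat_text) > watermark_mb * 1024 * 1024


# Invented sample: ~4.2 GB anonymous, ~1.5 GB page cache.
SAMPLE = """anon 4194304000
file 1500000000
kernel 50000000
anon_thp 0
"""

print(anon_bytes(SAMPLE), over_watermark(SAMPLE, 5500))
```

In the container the text would come from /sys/fs/cgroup/memory.stat; the 1.5 GB of `file` pages in the sample never enters the comparison.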

When anonymous RSS exceeds the threshold, the watchdog:

Sends SIGTERM to all Chrome processes
Waits up to 10 seconds for graceful shutdown
Sends SIGKILL if processes are still alive
Cleans profile lock files (SingletonLock, SingletonCookie, SingletonSocket)
Launches a fresh Chromium with the same profile

The profile survives the restart because /data/profile is a bind mount on the host. All cookies, sessions, and local storage persist.
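The restart sequence above can be sketched in Python as follows. It is a simplification under assumptions: it stops a single process handle rather than every Chrome process, and `stop_gracefully` and `clean_profile_locks` are hypothetical helper names.

```python
import subprocess
from pathlib import Path

LOCK_FILES = ("SingletonLock", "SingletonCookie", "SingletonSocket")


def stop_gracefully(proc: subprocess.Popen, grace_s: float = 10.0) -> str:
    """SIGTERM, wait up to `grace_s`, then SIGKILL if still alive."""
    proc.terminate()  # SIGTERM on POSIX
    try:
        proc.wait(timeout=grace_s)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()   # SIGKILL on POSIX
        proc.wait()
        return "killed"


def clean_profile_locks(profile_dir: Path) -> list[str]:
    """Remove stale singleton files so a fresh Chromium can claim the profile."""
    removed = []
    for name in LOCK_FILES:
        path = profile_dir / name
        # These are usually symlinks on Linux; is_symlink() also catches
        # dangling ones, which exists() would report as missing.
        if path.is_symlink() or path.exists():
            path.unlink()
            removed.append(name)
    return removed
```

Cleaning the lock files matters after a SIGKILL: a leftover SingletonLock can make the next Chromium believe the profile is still in use.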

noVNC — The Manual UI

The manual access chain:

Browser tab (your laptop)
    │  http://<tailscale-hostname>:6080/vnc.html
    │
    ▼
websockify (WebSocket → raw TCP bridge)
    │  port 6080 → localhost:5900
    │
    ▼
x11vnc (VNC server attached to Xvfb display :100)
    │
    ▼
Xvfb :100 (headless X server, 1365x840, 24-bit color)
    │
    ▼
Chromium (renders into Xvfb framebuffer)

noVNC is served by websockify itself (--web /usr/share/novnc), so there’s no separate HTTP server. The entire UI is a single HTML page that connects back over WebSocket.

This is how I log into LinkedIn, X, Instagram, and Google: open the noVNC page, click through the auth flow, and close the tab. The session lives in the persistent profile, so it survives even a watchdog restart of Chrome (which only happens under memory pressure or CDP failure).

How the AI Agent Drives It

Hermes connects through CDP for automation:

# Playwright (Python), inside `async with async_playwright() as p:`
browser = await p.chromium.connect_over_cdp('http://127.0.0.1:9223')

// Puppeteer (Node.js)
const browser = await puppeteer.connect({ browserURL: 'http://127.0.0.1:9223' });

# Raw CDP WebSocket (for video/large uploads)
# Open socket to 127.0.0.1:9223, HTTP upgrade, masked WS frames

The agent uses different tools depending on the task:

Agent Tool	CDP Method	Use Case
`browser_navigate`	`Page.navigate`	Open a URL
`browser_snapshot`	`Accessibility.getFullAXTree`	Read page content as text
`browser_click`	`Runtime.evaluate` → `element.click()`	Click buttons
`browser_type`	`Input.dispatchKeyEvent`	Type into fields
`browser_console`	`Runtime.evaluate`	Run arbitrary JS, extract data
`browser_cdp`	Any raw CDP method	Escape hatch for anything else
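Under every tool in the table, the wire format is the same: a JSON object with an `id`, a `method`, and `params`. The sketch below builds such messages; the `CdpMessages` class is a hypothetical helper, not part of Hermes.

```python
import itertools
import json


class CdpMessages:
    """Build CDP request payloads with unique, increasing ids."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)

    def build(self, method: str, **params) -> str:
        return json.dumps({"id": next(self._ids), "method": method, "params": params})


msgs = CdpMessages()
navigate = msgs.build("Page.navigate", url="https://example.com")
click = msgs.build(
    "Runtime.evaluate",
    expression="document.querySelector('button').click()",
)
print(navigate)
print(click)
```

The browser echoes the `id` back in its reply, which is how a client matches responses to requests when several are in flight.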

For social posting, the agent runs cron jobs that fire at scheduled intervals, connect to the browser over CDP, compose and publish posts, and report back.

What Broke (And How We Fixed It)

Chrome 147 broke all cookie injection

Network.setCookie, Network.setCookies, and Storage.setCookies all return success but don’t set cookies. Chrome 147 tightened the cookie security model, and CDP cookie methods are now effectively useless for session injection.

Fix: Stop injecting cookies. The container keeps a persistent authenticated profile. I log in once through noVNC, and the agent attaches to the same running browser instance. This is actually more reliable — sites detect session discontinuity and flag it as bot behavior.

CDP WebSocket silently dies after days of uptime

/json/version responds fine over HTTP, but all WebSocket protocol messages time out. Playwright, Puppeteer, raw Python — all break identically. The WebSocket upgrade returns HTTP 101, but no CDP frames are ever dispatched.

Fix: The watchdog’s /json/version health check catches HTTP failures but not this. The real fix is docker restart browser. Happens roughly once every 4-7 days.
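A probe that would catch this state has to send a real protocol message and wait for the reply. Below is a sketch of that logic only; `send` and `recv` are hypothetical hooks around an already-open WebSocket to a page target, and `recv` is assumed to return None on timeout.

```python
import json
from typing import Callable, Optional

PROBE_ID = 900001  # arbitrary id, chosen to be unlikely to collide with real traffic


def cdp_roundtrip_ok(send: Callable[[str], None],
                     recv: Callable[[], Optional[str]],
                     max_messages: int = 50) -> bool:
    """Send a trivial Runtime.evaluate and confirm the matching reply arrives."""
    send(json.dumps({
        "id": PROBE_ID,
        "method": "Runtime.evaluate",
        "params": {"expression": "1+1", "returnByValue": True},
    }))
    for _ in range(max_messages):  # skip unrelated events
        raw = recv()
        if raw is None:
            return False           # timed out: HTTP may be fine, but WS is dead
        msg = json.loads(raw)
        if msg.get("id") == PROBE_ID:
            return msg.get("result", {}).get("result", {}).get("value") == 2
    return False
```

A False return here, while /json/version still answers, is the signature of the failure described above.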

Every platform uploads files differently

X/Twitter: Native HTMLInputElement.prototype.files setter fails for images (works for video). Fallback: simulate a DragEvent with DataTransfer onto the compose textbox. Images must be base64 ≤ 650KB, compressed through Pillow at quality 65.
Instagram: React inputs reject innerHTML and execCommand('insertText'). Only keyboard.type(text, {delay: 50}) works — character by character with a 50ms delay.
TikTok: The native prototype setter works perfectly. files.length is correct. The most automation-friendly of all platforms.
Threads: Same React issue as Instagram. Drag-and-drop on the compose textbox is the only reliable path.

nginx frame size cap breaks large uploads

In this setup, WebSocket frames larger than roughly 186KB don’t make it through the nginx proxy. Uploading a video means splitting the base64 into chunks and sending each chunk in its own Runtime.evaluate frame. The Python implementation must use sock.sendall(): sock.send() can return before the full buffer is transmitted, and a partial WebSocket frame is garbage.
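Here is a minimal sketch of the chunk-and-send approach, assuming a raw socket that has already completed the WebSocket upgrade. The 100,000-character chunk size, the page-side `window.__upload` buffer, and the helper names are all illustrative assumptions.

```python
import base64
import json
import os
import struct


def chunk_text(data: str, max_chars: int):
    """Yield consecutive slices of `data`, each at most `max_chars` long."""
    for i in range(0, len(data), max_chars):
        yield data[i:i + max_chars]


def masked_text_frame(payload: bytes) -> bytes:
    """Encode one client-to-server text frame per RFC 6455 (FIN=1, opcode 0x1, masked)."""
    header = bytearray([0x81])
    n = len(payload)
    if n < 126:
        header.append(0x80 | n)
    elif n < 65536:
        header.append(0x80 | 126)
        header += struct.pack("!H", n)
    else:
        header.append(0x80 | 127)
        header += struct.pack("!Q", n)
    mask = os.urandom(4)
    masked = bytes(b ^ mask[i % 4] for i, b in enumerate(payload))
    return bytes(header) + mask + masked


def upload_messages(blob: bytes, max_chars: int = 100_000, start_id: int = 1):
    """Yield Runtime.evaluate messages that append base64 chunks to a page-side buffer."""
    b64 = base64.b64encode(blob).decode("ascii")
    for offset, piece in enumerate(chunk_text(b64, max_chars)):
        expr = f"window.__upload = (window.__upload || '') + {json.dumps(piece)}"
        yield json.dumps({"id": start_id + offset, "method": "Runtime.evaluate",
                          "params": {"expression": expr}})


def send_frame(sock, payload: bytes) -> None:
    """sendall() loops until every byte is written; send() may stop early."""
    sock.sendall(masked_text_frame(payload))
```

The chunk size sits well below the observed limit because the JSON envelope and the expression prefix add to every frame.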

Tailscale serve syntax changed between versions

Tailscale 1.96.x uses --tcp flag syntax:

# Old (broken on 1.96.x)
tailscale serve --bg tcp 9223 tcp://127.0.0.1:9223

# New (working)
tailscale serve --bg --tcp 9223 127.0.0.1:9223

The Full File Layout

Everything lives in ~/browser/:

~/browser/
  Dockerfile          # Build from debian:bookworm-slim
  docker-compose.yml  # Resource limits, port bindings, volumes
  nginx.conf          # CDP proxy with WebSocket URL rewrite
  start.sh            # Boot sequence + watchdog
  healthcheck.sh      # Curls CDP + noVNC, checks both respond
  data/               # Bind-mounted persistent volume
    profile/           # Chromium user profile (cookies, sessions, cache)
    browser-logs/      # stdout/stderr from all processes

Rebuild and start:

cd ~/browser
docker compose up -d --build

The container’s /data directory is bind-mounted from ./data on the host. That means:

~/browser/data/profile/ — Chromium profile survives container rebuilds
~/browser/data/browser-logs/ — full logs from Chrome, Xvfb, nginx, VNC

What’s Next

The rig is stable. It posts, scrapes, downloads, and monitors. But there’s more to do:

Multi-profile: Separate Chrome profiles per platform so a rate limit or shadow-ban on one doesn’t take down the others
CDP WebSocket health probe: The current /json/version check doesn’t catch the silent WebSocket failure. Needs an actual Runtime.evaluate round-trip
Residential proxy rotation: Some platforms geofence or rate-limit by IP
CAPTCHA automation: Currently solved manually through noVNC

For now, it works — and that’s enough.

This browser runs in Docker on an 8GB VPS, connected to Hermes (my AI agent) via Chrome DevTools Protocol. Everything described here is running in production as of May 2026. If you’re building something similar, I’ve probably hit whatever bug you’re currently debugging.
