Multi-Instance Watchtower — Research Meeting Minutes

Date: 2026-02-17
Meeting start: 00:25 EST | Meeting end:
Duration:
Attendees: Phil (product owner), CoS (facilitator), SDK Researcher, Discord Researcher, Infra/Resource Analyst, Brain DB Analyst

1. SDK Researcher — Multi-Instance Feasibility

Verdict: FULLY SUPPORTED

Resources per instance

Component              | Consumption
-----------------------|---------------------------------
Node.js subprocess     | 150-500 MB RAM (idle to active)
File descriptors       | 3+ (stdin/stdout/stderr pipes)
Python bridge overhead | ~50 MB per worker
API connection         | Independent HTTP per subprocess
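
A minimal sketch of running two isolated instances concurrently, using the ClaudeSDKClient(cwd=...) call shape shown in the architecture summary (section 5); the import path and the query/receive calls are assumptions and may need adjusting to the actual SDK:

```python
import asyncio

# Illustrative import path; the constructor keyword follows the
# ClaudeSDKClient(cwd=...) usage in section 5 and may differ from the real SDK.
from claude_sdk import ClaudeSDKClient

WORKERS = {
    "wt-001": "/home/plangeberg/watchtower-workers/wt-001",
    "wt-002": "/home/plangeberg/watchtower-workers/wt-002",
}

async def run_worker(worker_id: str, clone_path: str, prompt: str) -> None:
    # Each instance owns its own Node.js subprocess, stdio pipes, and API
    # connection, so workers are isolated from one another by construction.
    async with ClaudeSDKClient(cwd=clone_path) as client:
        await client.query(prompt)
        async for message in client.receive_response():
            print(f"[{worker_id}] {message}")

async def main() -> None:
    # Run workers concurrently; memory scales roughly per the table above.
    await asyncio.gather(*(
        run_worker(wid, path, "Summarize the current state of the repo.")
        for wid, path in WORKERS.items()
    ))

asyncio.run(main())
```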

2. Discord Researcher — Channel Management

Verdict: STRAIGHTFORWARD

Recommended pattern
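
The pattern as spelled out in the architecture summary (section 5): one channel per worker under a "Watchtower Workers" category, Phil-only permissions, messages routed by channel ID to that worker's bridge. A minimal discord.py sketch of that flow; the routing dict and the bridge's handle() method are illustrative names, not existing code:

```python
import discord

intents = discord.Intents.default()
intents.message_content = True
client = discord.Client(intents=intents)

# channel_id -> that worker's ClaudeBridge; populated by CoS on spawn (illustrative).
worker_routes: dict[int, object] = {}

async def create_worker_channel(guild: discord.Guild, worker_name: str,
                                phil: discord.Member) -> discord.TextChannel:
    """Create #worker-<name> under the 'Watchtower Workers' category, visible to Phil only."""
    category = discord.utils.get(guild.categories, name="Watchtower Workers")
    if category is None:
        category = await guild.create_category("Watchtower Workers")
    overwrites = {
        guild.default_role: discord.PermissionOverwrite(view_channel=False),
        phil: discord.PermissionOverwrite(view_channel=True, send_messages=True),
    }
    return await guild.create_text_channel(
        f"worker-{worker_name}", category=category, overwrites=overwrites
    )

@client.event
async def on_message(message: discord.Message) -> None:
    if message.author.bot:
        return
    bridge = worker_routes.get(message.channel.id)
    if bridge is not None:
        # Hypothetical bridge method: forward to this worker's CC instance.
        await bridge.handle(message)
        return
    # Otherwise fall through to the existing single-channel CoS handling.

# client.run(BOT_TOKEN)  # token loading omitted in this sketch
```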

3. Infra/Resource Analyst — Deadpool Capacity & Clone Model

Verdict: FEASIBLE — 8-10 concurrent workers on Deadpool

Memory budget (32GB)

Component                            | Cost
-------------------------------------|----------
Windows 10 + other processes         | ~6 GB
WSL2 kernel overhead                 | ~0.5 GB
Main Watchtower process              | ~0.15 GB
Available for workers                | ~25 GB
Per worker (Python bridge + Node.js) | ~0.5-1 GB
Safe concurrent workers              | 8-10

Real bottleneck: Anthropic API rate limits (typically 5-10 concurrent requests per tier), not local resources.
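
A minimal sketch of the resource guard this budget implies (ticketed below as WT-006), assuming psutil for the memory check; the cap and headroom values are illustrative defaults:

```python
import psutil

MAX_WORKERS = 4                 # configurable cap (WT-006 default)
PER_WORKER_BYTES = 1 * 1024**3  # budget the conservative ~1 GB per worker from above
HEADROOM_BYTES = 2 * 1024**3    # slack for the main process and activity spikes

def can_spawn(active_workers: int) -> tuple[bool, str]:
    """Decide whether another worker may be spawned, with a reason if not."""
    if active_workers >= MAX_WORKERS:
        return False, f"worker cap reached ({MAX_WORKERS})"
    # Inside WSL2 this reports the VM's allocation (see the .wslconfig cap in WT-010).
    available = psutil.virtual_memory().available
    needed = PER_WORKER_BYTES + HEADROOM_BYTES
    if available < needed:
        return False, f"only {available / 1024**3:.1f} GB free, need ~{needed / 1024**3:.0f} GB"
    return True, "ok"
```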

Clone-based workspaces
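
The clone lifecycle is detailed in the architecture summary (section 5) and WT-004: shallow clone onto the Linux FS, work on a worker/<id> branch, commit and push on completion, delete the directory. A minimal sketch via plain git subprocess calls; the bare-mirror optimization mentioned in WT-004 is left out:

```python
import shutil
import subprocess
from pathlib import Path

WORKERS_ROOT = Path("/home/plangeberg/watchtower-workers")

def git(*args: str, cwd: Path | None = None) -> None:
    subprocess.run(["git", *args], cwd=cwd, check=True)

def create_workspace(worker_id: str, repo_url: str) -> Path:
    """Shallow-clone the repo onto the Linux FS and branch it for this worker."""
    clone_path = WORKERS_ROOT / worker_id
    git("clone", "--depth", "1", repo_url, str(clone_path))
    git("checkout", "-b", f"worker/{worker_id}", cwd=clone_path)
    return clone_path

def cleanup_workspace(worker_id: str, clone_path: Path) -> None:
    """Commit and push the worker branch, then delete the clone (assumes work was done)."""
    git("add", "-A", cwd=clone_path)
    git("commit", "-m", f"{worker_id}: session work", cwd=clone_path)
    git("push", "-u", "origin", f"worker/{worker_id}", cwd=clone_path)
    shutil.rmtree(clone_path)
```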

Cleanup

Conflict handling — recommended approach

WSL2 prep needed

4. Brain DB Analyst — Shared State & Concurrent Writes

Verdict: BUILD WHEN NEEDED — not now, but architect for it

The real conflict risk

Recommended phased approach

  1. Phase 1 (ship with multi-instance): "CoS owns THREADS.md" rule. Worker instances run in a restricted mode — no writes to brain/. This is basically extending !secret mode to all workers. Zero new infrastructure needed.
  2. Phase 2 (when pain is real): SQLite DB in WAL mode at brain/brain.db.
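
For when Phase 2 arrives, a minimal sketch of opening brain/brain.db in WAL mode, which lets readers and the single writer proceed without blocking each other; the threads table shown is illustrative (schema details belong to WT-011):

```python
import sqlite3

def open_brain_db(path: str = "brain/brain.db") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")   # readers don't block the writer, and vice versa
    conn.execute("PRAGMA busy_timeout=5000")  # wait up to 5s on a locked DB instead of erroring
    conn.execute("""
        CREATE TABLE IF NOT EXISTS threads (
            id INTEGER PRIMARY KEY,
            title TEXT NOT NULL,
            status TEXT NOT NULL DEFAULT 'open',
            updated_at TEXT DEFAULT CURRENT_TIMESTAMP
        )
    """)
    return conn
```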

What stays as files forever

Key insight: Don't build the DB until multi-instance is live and you feel the pain. "CoS owns THREADS.md" is sufficient for launch.

5. Architecture Summary

Watchtower starts → CoS session in main channel (primary working copy)
    │
    Phil: "spin up a worker for CzechWriter"
    │
    ▼
CoS checks: resources OK? repo locked? → YES
    │
    ├── Creates Discord channel: #worker-czechwriter
    ├── Clones repo to /home/plangeberg/watchtower-workers/wt-001/
    ├── Spawns new ClaudeSDKClient(cwd=clone_path)
    ├── Registers channel_id → worker in routing dict
    └── Tells Phil: "Channel ready, go talk to it"

Phil switches to #worker-czechwriter → works directly with that CC instance
    │
    Done → Phil says "end session" or tells CoS to kill it
    │
    ▼
Cleanup:
    ├── Worker commits to branch worker/wt-001
    ├── Worker pushes branch
    ├── Clone directory deleted
    ├── Discord channel deleted (or archived)
    └── CoS merges branch to main (or Phil reviews first)
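
A minimal sketch of the registry behind this flow (WT-001/WT-002): one record per worker tying its Discord channel to its clone, branch, and subprocess; all class and field names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class WorkerRecord:
    worker_id: str           # e.g. "wt-001"
    project: str             # e.g. "czechwriter"
    channel_id: int          # Discord channel that routes to this worker
    clone_path: str          # e.g. /home/plangeberg/watchtower-workers/wt-001/
    branch: str              # e.g. worker/wt-001
    pid: int | None = None   # Node.js subprocess PID, used for orphan cleanup on startup

class WorkerRegistry:
    def __init__(self) -> None:
        self._by_channel: dict[int, WorkerRecord] = {}

    def register(self, record: WorkerRecord) -> None:
        self._by_channel[record.channel_id] = record

    def route(self, channel_id: int) -> WorkerRecord | None:
        """Called from on_message to decide which worker (if any) owns a channel."""
        return self._by_channel.get(channel_id)

    def unregister(self, channel_id: int) -> WorkerRecord | None:
        return self._by_channel.pop(channel_id, None)

    def active(self) -> list[WorkerRecord]:
        return list(self._by_channel.values())
```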

6. Proposed Tickets

EPIC 1: Multi-Instance Core (MVP)
Estimated: 3-4 sessions
WT-001 Worker registry and lifecycle manager
Track active workers, their channels, repos, branches, PIDs. Spawn/kill operations. Orphan cleanup on startup.
WT-002 Multi-channel Discord routing
Refactor on_message from single-channel to registry-based routing. Each worker channel routes to its own ClaudeBridge. CoS channel keeps existing commands.
WT-003 Dynamic channel creation/deletion
Bot creates channels under "Watchtower Workers" category on spawn. Deletes on cleanup. Phil-only permissions.
WT-004 Clone-based workspace management
Git clone to Linux FS, shallow clone, worker branch naming, commit+push on completion, directory cleanup. Consider bare-mirror optimization.
WT-005 CoS commands: !spawn, !workers, !kill
!spawn czechwriter — creates worker. !workers — lists active. !kill wt-001 — terminates worker + cleanup.
WT-006 Resource guard
Check available memory + active worker count before spawning. Configurable max workers (default 4). Deny spawn with reason if limit hit.
WT-007 Default to CoS on boot
Watchtower auto-starts a CoS session in the main channel on startup. Currently requires !cos.
EPIC 2: Worker Isolation & Safety
Estimated: 1-2 sessions
WT-008 Worker restricted mode (no brain writes)
Workers get a preamble similar to !secret — no writes to brain/, memory/, THREADS.md. Only CoS touches shared state.
WT-009 Per-worker settings file
Generate a worker-specific watchtower-settings-wt-001.json scoped to the clone directory. Prevents accidental access to other repos (see the sketch after this epic).
WT-010 WSL2 environment prep script
Script to set inotify limits, .wslconfig memory cap, verify ulimit -n. Run once before first multi-instance use.
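
For WT-009, a minimal sketch of generating the per-worker settings file; the key names are illustrative, since the real watchtower-settings schema is not reproduced in these minutes:

```python
import json
from pathlib import Path

def write_worker_settings(worker_id: str, clone_path: Path) -> Path:
    """Write watchtower-settings-<worker_id>.json scoped to the worker's clone."""
    settings = {
        "worker_id": worker_id,
        "workspace": str(clone_path),        # the only directory this worker should touch
        "allowed_paths": [str(clone_path)],
        "brain_writes": False,               # restricted mode (WT-008)
    }
    path = clone_path / f"watchtower-settings-{worker_id}.json"
    path.write_text(json.dumps(settings, indent=2))
    return path
```
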
EPIC 3: Brain DB (Deferred — Build When Needed)
Estimated: 2 sessions
WT-011 THREADS.md → SQLite migration script
Parse THREADS.md, seed DB tables (threads, todos, parking_lot, review_items). Validate by round-trip diff.
WT-012 brain-db CLI tool
Python CLI: query/update threads, todos, parking lot. Auto-renders THREADS.md after writes. CC calls via Bash (see the sketch after this epic).
WT-013 Update Watchtower todo.py to use brain-db
Swap file I/O for subprocess calls to brain-db. Same Discord interface for Phil.
WT-014 Update CHIEF-OF-STAFF.md for DB workflow
Tell CC to use brain-db CLI instead of editing THREADS.md directly.
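
For WT-012, a minimal sketch of the CLI shape: write to the DB, then re-render THREADS.md so anything still reading the file stays in sync. Command names and the rendered format are illustrative, and the threads table is assumed to exist (WT-011 / the Phase 2 sketch above):

```python
import argparse
import sqlite3
from pathlib import Path

DB_PATH = "brain/brain.db"
THREADS_MD = Path("THREADS.md")

def render_threads(conn: sqlite3.Connection) -> None:
    """Regenerate THREADS.md from the DB after every write."""
    rows = conn.execute("SELECT title, status FROM threads ORDER BY id").fetchall()
    lines = ["# Threads", ""] + [f"- [{status}] {title}" for title, status in rows]
    THREADS_MD.write_text("\n".join(lines) + "\n")

def main() -> None:
    parser = argparse.ArgumentParser(prog="brain-db")
    sub = parser.add_subparsers(dest="cmd", required=True)
    add = sub.add_parser("add-thread")
    add.add_argument("title")
    sub.add_parser("list-threads")
    args = parser.parse_args()

    conn = sqlite3.connect(DB_PATH)
    conn.execute("PRAGMA journal_mode=WAL")
    if args.cmd == "add-thread":
        conn.execute("INSERT INTO threads (title) VALUES (?)", (args.title,))
        conn.commit()
        render_threads(conn)
    elif args.cmd == "list-threads":
        for title, status in conn.execute("SELECT title, status FROM threads"):
            print(f"[{status}] {title}")

if __name__ == "__main__":
    main()
```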

7. Open Questions for Phil

  1. Max concurrent workers: Default to 4? Or let it float based on memory?
  2. Worker branch merging: CoS auto-merges to main? Or Phil reviews the branch first?
  3. Channel persistence: Delete channels when worker ends? Or keep for history (collapse into archive category)?
  4. Repo mapping: Should CoS know which repos map to which project names? (e.g., "CzechWriter" → czechsuma-labs/czechwriter) Or does Phil specify the path?
  5. Naming: The app is still called Watchtower (pending rename). Do these tickets go into the existing Watchtower backlog, or does multi-instance warrant its own project name?

8. Risks & Mitigations

Risk                                                       | Severity | Mitigation
-----------------------------------------------------------|----------|----------------------------------------------------------------
Anthropic API rate limits throttle multiple workers        | High     | Resource guard caps workers; stagger API-heavy operations
Worker crashes leave orphaned clones/channels              | Medium   | Startup cleanup + worker.json metadata + periodic health check
Two workers edit same file across repos                    | Medium   | Worker branches prevent push conflicts; CoS resolves at merge
WSL2 inotify exhaustion                                    | Low      | WT-010 prep script raises limits
Write-after-Allow bug (existing) blocks worker permissions | High     | Must fix existing bug (WT backlog) before multi-instance ships