Multi-Instance Watchtower — Research Meeting Minutes
Date: 2026-02-17
Meeting start: 00:25 EST | Meeting end: —
Duration: —
Attendees: Phil (product owner), CoS (facilitator), SDK Researcher, Discord Researcher, Infra/Resource Analyst, Brain DB Analyst
1. SDK Researcher — Multi-Instance Feasibility
Verdict: FULLY SUPPORTED
- No singletons or global state in the SDK. Each ClaudeSDKClient is independent — own transport, own subprocess, own message stream.
- Process model: Each client spawns its own claude Node.js subprocess via anyio.open_process(). Communication is JSON over stdin/stdout pipes.
- Working directory isolation: Each subprocess gets its own cwd — fully isolated at the OS level.
- Settings per instance: The --settings CLI flag is per-subprocess. Each worker can use a different settings file.
- Concurrent asyncio: Multiple clients can run query() concurrently in the same event loop. Each has its own TaskGroup and message stream (see the sketch after this list).
- One minor wart: A cosmetic env var (CLAUDE_CODE_ENTRYPOINT) gets mutated globally, but it's harmless — the same value is written by all clients.
- No nesting issue: Workers are spawned by the Python coordinator, not by CC itself, so the CLAUDECODE env var detection doesn't fire.
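A minimal concurrency sketch of the above, assuming the Python SDK exposes ClaudeSDKClient and ClaudeAgentOptions with a cwd option (package and option names vary by SDK version, so treat the imports as an assumption):

```python
# Two independent workers in one event loop; each client owns its own
# `claude` subprocess, isolated by its working directory.
import anyio

from claude_agent_sdk import ClaudeAgentOptions, ClaudeSDKClient  # assumed package name


async def run_worker(worker_id: str, cwd: str, prompt: str) -> None:
    options = ClaudeAgentOptions(cwd=cwd)
    async with ClaudeSDKClient(options=options) as client:
        await client.query(prompt)
        async for message in client.receive_response():
            print(f"[{worker_id}] {message}")


async def main() -> None:
    async with anyio.create_task_group() as tg:
        tg.start_soon(run_worker, "wt-001",
                      "/home/plangeberg/watchtower-workers/wt-001",
                      "Summarize the open TODOs")
        tg.start_soon(run_worker, "wt-002",
                      "/home/plangeberg/watchtower-workers/wt-002",
                      "Run the test suite and report failures")


if __name__ == "__main__":
    anyio.run(main)
```

Cancelling one worker's task closes its client context and should end only that subprocess, which is what makes per-worker kill operations cheap.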
Resource per instance
| Component | Consumption |
| --- | --- |
| Node.js subprocess | 150-500 MB RAM (idle to active) |
| File descriptors | 3+ (stdin/stdout/stderr pipes) |
| Python bridge overhead | ~50 MB per worker |
| API connection | Independent HTTP per subprocess |
2. Discord Researcher — Channel Management
Verdict: STRAIGHTFORWARD
- Bot can create channels: guild.create_text_channel(name, category, overwrites). Needs manage_channels permission.
- Categories supported: Create a "Watchtower Workers" category, nest worker channels inside it. Clean sidebar UX.
- Bot can delete channels: channel.delete() — same permission. Cleanup on session end.
- Per-channel permissions: Overwrites at creation — lock to Phil-only and the bot. Everyone else denied.
- Multi-channel routing: Single on_message handler + a dict[channel_id → worker] registry. No separate listeners needed (see the sketch after this list).
- Channel naming: 1-100 chars, lowercase + hyphens. Discord does NOT enforce uniqueness (use IDs as keys).
- Rate limits: ~2 channel creates per 10 seconds. Irrelevant for spawning a handful of workers.
- Hard limit: 500 channels per guild. Ephemeral pattern (create on start, delete on end) keeps this clean.
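A sketch of the create/route/delete pattern with discord.py; the ClaudeBridge.send call and the shape of the registry are assumptions about Watchtower internals, not confirmed interfaces:

```python
import discord

# channel_id -> worker bridge; populated when a worker is spawned
workers: dict[int, "ClaudeBridge"] = {}


async def create_worker_channel(guild: discord.Guild, worker_id: str,
                                phil: discord.Member) -> discord.TextChannel:
    category = discord.utils.get(guild.categories, name="Watchtower Workers")
    if category is None:
        category = await guild.create_category("Watchtower Workers")
    overwrites = {
        guild.default_role: discord.PermissionOverwrite(view_channel=False),
        phil: discord.PermissionOverwrite(view_channel=True, send_messages=True),
        guild.me: discord.PermissionOverwrite(view_channel=True, send_messages=True),
    }
    return await guild.create_text_channel(
        f"worker-{worker_id}", category=category, overwrites=overwrites
    )


async def on_message(message: discord.Message) -> None:
    # One handler for every channel; register it via @bot.event or a Client
    # subclass. The registry decides which worker (if any) receives the text.
    if message.author.bot:
        return
    bridge = workers.get(message.channel.id)
    if bridge is not None:
        await bridge.send(message.content)  # hypothetical ClaudeBridge method


async def teardown_worker_channel(channel: discord.TextChannel) -> None:
    workers.pop(channel.id, None)
    await channel.delete()  # ephemeral pattern: channel goes away with the worker
```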
Recommended pattern
- Channels are ephemeral — created when worker starts, deleted when worker ends.
- Named worker-<short-id> or worker-czechwriter for readability.
- If Phil wants history, keep a persistent log channel for session summaries.
3. Infra/Resource Analyst — Deadpool Capacity & Clone Model
Verdict: FEASIBLE — 8-10 concurrent workers on Deadpool
Memory budget (32GB)
| Component | Cost |
| --- | --- |
| Windows 10 + other processes | ~6 GB |
| WSL2 kernel overhead | ~0.5 GB |
| Main Watchtower process | ~0.15 GB |
| Available for workers | ~25 GB |
| Per worker (Python bridge + Node.js) | ~0.5-1 GB |
| Safe concurrent workers | 8-10 |
Real bottleneck: Anthropic API rate limits (typically 5-10 concurrent requests per tier), not local resources.
Clone-based workspaces
- Clone location: Linux filesystem at /home/plangeberg/watchtower-workers/<worker-id>/ — substantially faster than /mnt/d/ (5-15x for git ops).
- Clone scope: Just the needed subrepo (e.g., czechsuma-labs/czechwriter). Use the SDK's --add-dir flag for read access to brain/ context if needed.
- Shallow clones: git clone --depth 1 keeps disk usage minimal (~10-50 MB per worker).
- Best option: Bare clone mirror on the Linux FS + git worktree add for workers. Faster than network clones, shared object store (see the sketch after this list).
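A sketch of the bare-mirror-plus-worktree option using plain git via subprocess; the remote URL and paths are illustrative assumptions:

```python
# One shared bare mirror on the Linux FS, one worktree per worker.
import subprocess
from pathlib import Path

WORKERS_ROOT = Path("/home/plangeberg/watchtower-workers")
MIRROR = WORKERS_ROOT / "mirrors" / "czechwriter.git"
REPO_URL = "git@github.com:czechsuma-labs/czechwriter.git"  # assumed remote


def git(*args: str, cwd: Path | None = None) -> None:
    subprocess.run(["git", *args], cwd=cwd, check=True)


def prepare_workspace(worker_id: str) -> Path:
    if not MIRROR.exists():
        MIRROR.parent.mkdir(parents=True, exist_ok=True)
        git("clone", "--mirror", REPO_URL, str(MIRROR))
    else:
        git("fetch", "--prune", cwd=MIRROR)  # refresh the shared object store
    workspace = WORKERS_ROOT / worker_id
    # New worktree on a dedicated worker branch, sharing the mirror's objects.
    git("worktree", "add", "-b", f"worker/{worker_id}", str(workspace), "main",
        cwd=MIRROR)
    return workspace
```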
Cleanup
- Normal: commit → push → rm -rf the worker dir.
- Crash: On startup, scan worker dirs for orphans, checking PIDs against the worker.json metadata kept in each worker dir (see the sketch below).
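A sketch of the startup orphan scan, assuming each worker dir carries a worker.json with at least a pid field:

```python
import json
import os
import shutil
from pathlib import Path

WORKERS_ROOT = Path("/home/plangeberg/watchtower-workers")


def pid_alive(pid: int) -> bool:
    try:
        os.kill(pid, 0)  # signal 0: existence check, sends nothing
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but is owned by another user
    return True


def cleanup_orphans() -> None:
    for worker_dir in WORKERS_ROOT.glob("wt-*"):
        meta_path = worker_dir / "worker.json"
        if not meta_path.exists():
            continue  # not one of ours; leave it alone
        pid = json.loads(meta_path.read_text()).get("pid")
        if pid is None or not pid_alive(pid):
            # Also remove the matching worktree entry and Discord channel here.
            shutil.rmtree(worker_dir)
```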
Conflict handling — recommended approach
- Worker branches: Each worker commits to worker/<id>. No push collisions possible.
- CoS (or Phil) merges worker branches into main after review (see the sketch after this list).
- Alternative: Lock one worker per repo (simpler, less parallel).
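A sketch of the CoS-side merge step run from the primary working copy; the repo path is an assumed placeholder:

```python
import subprocess

MAIN_REPO = "/path/to/primary-working-copy"  # assumed; wherever CoS keeps main


def merge_worker_branch(worker_id: str) -> None:
    branch = f"worker/{worker_id}"

    def git(*args: str) -> None:
        subprocess.run(["git", "-C", MAIN_REPO, *args], check=True)

    git("fetch", "origin", branch)
    git("checkout", "main")
    git("merge", "--no-ff", f"origin/{branch}")  # any conflicts surface here, once
    git("push", "origin", "main")
    git("push", "origin", "--delete", branch)    # retire the worker branch
```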
WSL2 prep needed
- Raise fs.inotify.max_user_watches to 524288 (the default of 8192 will be exhausted by multiple Node.js processes); verification sketch below.
- Set .wslconfig: memory=20GB, swap=4GB — leaves 12GB for Windows.
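A sketch of the pre-flight checks a WT-010 script might run; the file-descriptor floor is an assumption, and .wslconfig lives on the Windows side, so it is only reported here rather than edited:

```python
import resource
from pathlib import Path

WATCH_LIMIT = 524_288
FD_LIMIT = 4_096  # assumed minimum for `ulimit -n`; tune as needed


def check_inotify() -> bool:
    current = int(Path("/proc/sys/fs/inotify/max_user_watches").read_text())
    if current < WATCH_LIMIT:
        print(f"inotify watches = {current}; run: "
              f"sudo sysctl -w fs.inotify.max_user_watches={WATCH_LIMIT}")
        return False
    return True


def check_fd_limit() -> bool:
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft < FD_LIMIT:
        print(f"ulimit -n = {soft}; raise it before spawning workers")
        return False
    return True


if __name__ == "__main__":
    ok = check_inotify() and check_fd_limit()
    print("WSL2 prep:", "OK" if ok else "needs attention")
```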
4. Brain DB Analyst — Shared State & Concurrent Writes
Verdict: BUILD WHEN NEEDED — not now, but architect for it
The real conflict risk
- 95% of the risk is one file: THREADS.md — specifically the todo list and thread status sections.
- Everything else (PHIL.md, contexts, sessions, handoffs, runbooks) is either read-only, append-only, or session-scoped. No concurrency risk.
Recommended phased approach
- Phase 1 (ship with multi-instance): "CoS owns THREADS.md" rule. Worker instances run in a restricted mode — no writes to brain/. This is basically extending !secret mode to all workers. Zero new infrastructure needed.
- Phase 2 (when pain is real): SQLite DB in WAL mode at brain/brain.db (see the sketch after this list).
  - Tables: threads, todos, parking_lot, review_items, delegated
  - WAL mode allows concurrent reads + serialized writes with auto-retry
  - Python CLI wrapper (brain-db) — CC calls it via Bash instead of editing THREADS.md
  - Auto-renders THREADS.md as a read-only artifact after every write (Phil's view doesn't change)
  - Migration script parses existing THREADS.md → DB rows (~150 lines)
  - CLI tool ~200 lines. Minimal command set: brain-db todos, brain-db todo-done "text", etc.
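A Phase 2 sketch of the WAL-mode store behind a brain-db style command; the schema details beyond the table names listed above are assumptions:

```python
import sqlite3
import sys

DB_PATH = "brain/brain.db"


def connect() -> sqlite3.Connection:
    conn = sqlite3.connect(DB_PATH, timeout=5.0)  # waits out concurrent writers
    conn.execute("PRAGMA journal_mode=WAL")       # concurrent readers, one writer
    conn.execute(
        "CREATE TABLE IF NOT EXISTS todos ("
        " id INTEGER PRIMARY KEY, text TEXT NOT NULL, done INTEGER DEFAULT 0)"
    )
    return conn


def main() -> None:
    cmd, *args = sys.argv[1:] or ["todos"]
    with connect() as conn:  # `with` commits the write on success
        if cmd == "todos":
            for row in conn.execute("SELECT id, text FROM todos WHERE done = 0"):
                print(f"{row[0]}: {row[1]}")
        elif cmd == "todo-done" and args:
            conn.execute("UPDATE todos SET done = 1 WHERE text = ?", (args[0],))
        # After every write, re-render THREADS.md from the DB (omitted here).


if __name__ == "__main__":
    main()
```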
What stays as files forever
- PHIL.md, CHIEF-OF-STAFF.md, contexts/* — read-only for CC, rare human edits
- Sessions/* — append-only, one file per session
- Handoffs — create-once, consume-once lifecycle
- Runbooks, parking-lot detail files — reference docs, not concurrent data
Key insight: Don't build the DB until multi-instance is live and you feel the pain. "CoS owns THREADS.md" is sufficient for launch.
5. Architecture Summary
Watchtower starts → CoS session in main channel (primary working copy)
│
Phil: "spin up a worker for CzechWriter"
│
▼
CoS checks: resources OK? repo not already locked? → YES
│
├── Creates Discord channel: #worker-czechwriter
├── Clones repo to /home/plangeberg/watchtower-workers/wt-001/
├── Spawns new ClaudeSDKClient(cwd=clone_path)
├── Registers channel_id → worker in routing dict
└── Tells Phil: "Channel ready, go talk to it"
Phil switches to #worker-czechwriter → works directly with that CC instance
│
Done → Phil says "end session" or tells CoS to kill it
│
▼
Cleanup:
├── Worker commits to branch worker/wt-001
├── Worker pushes branch
├── Clone directory deleted
├── Discord channel deleted (or archived)
└── CoS merges branch to main (or Phil reviews first)
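A sketch of the registry and spawn guard implied by the flow above (roughly what the worker-registry and resource-guard tickets below would build); the field names and the psutil-based memory check are assumptions:

```python
from dataclasses import dataclass, field

import psutil

MAX_WORKERS = 4                # configurable cap (default proposed below)
MIN_FREE_BYTES = 2 * 1024**3   # assumed ~2 GB headroom; tune to Deadpool reality


@dataclass
class WorkerRecord:
    worker_id: str
    channel_id: int
    repo: str
    branch: str
    clone_path: str
    pid: int


@dataclass
class WorkerRegistry:
    workers: dict[str, WorkerRecord] = field(default_factory=dict)

    def can_spawn(self) -> tuple[bool, str]:
        if len(self.workers) >= MAX_WORKERS:
            return False, f"worker cap ({MAX_WORKERS}) reached"
        if psutil.virtual_memory().available < MIN_FREE_BYTES:
            return False, "not enough free memory"
        return True, "ok"

    def register(self, record: WorkerRecord) -> None:
        self.workers[record.worker_id] = record

    def by_channel(self, channel_id: int) -> WorkerRecord | None:
        return next((w for w in self.workers.values()
                     if w.channel_id == channel_id), None)
```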
6. Proposed Tickets
EPIC 1: Multi-Instance Core (MVP)
Estimated: 3-4 sessions
WT-001: Worker registry and lifecycle manager
Track active workers, their channels, repos, branches, PIDs. Spawn/kill operations. Orphan cleanup on startup.
WT-002: Multi-channel Discord routing
Refactor on_message from single-channel to registry-based routing. Each worker channel routes to its own ClaudeBridge. CoS channel keeps existing commands.
WT-003: Dynamic channel creation/deletion
Bot creates channels under "Watchtower Workers" category on spawn. Deletes on cleanup. Phil-only permissions.
WT-004: Clone-based workspace management
Git clone to Linux FS, shallow clone, worker branch naming, commit+push on completion, directory cleanup. Consider bare-mirror optimization.
WT-005: CoS commands !spawn, !workers, !kill
!spawn czechwriter — creates worker. !workers — lists active. !kill wt-001 — terminates worker + cleanup.
WT-006: Resource guard
Check available memory + active worker count before spawning. Configurable max workers (default 4). Deny spawn with reason if limit hit.
WT-007: Default to CoS on boot
Watchtower auto-starts a CoS session in the main channel on startup. Currently requires !cos.
EPIC 2: Worker Isolation & Safety
Estimated: 1-2 sessions
WT-008: Worker restricted mode (no brain writes)
Workers get a preamble similar to !secret — no writes to brain/, memory/, THREADS.md. Only CoS touches shared state.
WT-009: Per-worker settings file
Generate a worker-specific watchtower-settings-wt-001.json scoped to the clone directory. Prevents accidental access to other repos.
WT-010: WSL2 environment prep script
Script to set inotify limits, .wslconfig memory cap, verify ulimit -n. Run once before first multi-instance use.
EPIC 3: Brain DB (Deferred — Build When Needed)
Estimated: 2 sessions
WT-011: THREADS.md → SQLite migration script
Parse THREADS.md, seed DB tables (threads, todos, parking_lot, review_items). Validate by round-trip diff.
WT-012: brain-db CLI tool
Python CLI: query/update threads, todos, parking lot. Auto-renders THREADS.md after writes. CC calls via Bash.
WT-013: Update Watchtower todo.py to use brain-db
Swap file I/O for subprocess calls to brain-db. Same Discord interface for Phil.
WT-014: Update CHIEF-OF-STAFF.md for DB workflow
Tell CC to use the brain-db CLI instead of editing THREADS.md directly.
7. Open Questions for Phil
- Max concurrent workers: Default to 4? Or let it float based on memory?
- Worker branch merging: CoS auto-merges to main? Or Phil reviews the branch first?
- Channel persistence: Delete channels when worker ends? Or keep for history (collapse into archive category)?
- Repo mapping: Should CoS know which repos map to which project names (e.g., "CzechWriter" → czechsuma-labs/czechwriter)? Or does Phil specify the path?
- Naming: The app is still called Watchtower (pending rename). Do these tickets go into the existing Watchtower backlog, or does multi-instance warrant its own project name?
8. Risks & Mitigations
| Risk | Severity | Mitigation |
| --- | --- | --- |
| Anthropic API rate limits throttle multiple workers | High | Resource guard caps workers; stagger API-heavy operations |
| Worker crashes leave orphaned clones/channels | Medium | Startup cleanup + worker.json metadata + periodic health check |
| Two workers edit the same file in their separate clones | Medium | Worker branches prevent push conflicts; CoS resolves at merge |
| WSL2 inotify exhaustion | Low | WT-010 prep script raises limits |
| Write-after-Allow bug (existing) blocks worker permissions | High | Must fix existing bug (WT backlog) before multi-instance ships |