# INFRA.md

**The operational contract between this kit and the Mahana agent fabric.**

This file is the source of truth for: where the queue lives, how the daemon spawns workers, how components ship to customers, and the one-way door from "lifted in kit" to "running in production".

Last verified: 2026-04-22 — after first successful end-to-end (`88e1ca3` / `65484a9`, Grok worker, 6min spawn→deploy).

---

## 1. The fabric, at a glance

```
┌───────────────────────────┐
│  This repo (the kit)      │   design system + playground + brains
│  itonsberg/Mastermind-v2  │
└──────────────┬────────────┘
               │  lifts / promotes
               ▼
┌───────────────────────────────────────────────┐
│  nordvest-bygginnredning  (first customer)    │
│  Live Chat med Agent  ←  Be om endring  form  │
└──────────────┬────────────────────────────────┘
               │  INSERT into nordvest_task_queue
               ▼
┌───────────────────────────────────────────────┐
│  Supabase (gyz project)                       │
│  nordvest_task_queue  (isolated from fleet)   │
└──────────────┬────────────────────────────────┘
               │  polled every 20s
               ▼
┌───────────────────────────────────────────────┐
│  m3-council tmux (local M3)                   │
│  ├─ nv-daemon         (poller)                │
│  ├─ claude-opus       (council member)        │
│  ├─ kimi              (council member)        │
│  └─ grok              (council member)        │
└──────────────┬────────────────────────────────┘
               │  spawns worker per task
               ▼
┌───────────────────────────────────────────────┐
│  nordvest-agents tmux                         │
│  Grok 4.1 fast via translator-proxy           │
│  (bypasses Anthropic rate cap)                │
└──────────────┬────────────────────────────────┘
               │  git commit + push
               ▼
┌───────────────────────────────────────────────┐
│  Vercel (obratech team)                       │
│  v0-nordvest-bygginnredningx.vercel.app       │
└───────────────────────────────────────────────┘
```

## 2. Two Supabase projects — don't confuse them

| Project | URL | Role | What lives there |
|---|---|---|---|
| **gyz** | `https://gyzgudmzjxgialddmowe.supabase.co` | Fabric control plane | `nordvest_task_queue`, `agent_chat_messages`, `mahana_task_queue` (fleet — AVOID), eventually `mahana_kit_task_queue` |
| **mpazpdibfxafwbrcppuc** | `https://mpazpdibfxafwbrcppuc.supabase.co` | Nordvest's own app DB (being phased out for chat) | Legacy `agent_chat` rows, session state, customer data. Realtime WS lives here (failing, poll fallback covers it). |

**Rule:** the kit and the daemon talk to **gyz**. Nordvest's *legacy* app DB is on its own project. Crossing these wires is how you get the fleet stealing your tasks or the daemon writing into the wrong schema.

**Chat migration in-flight (2026-04-22):** `agent_chat_messages` was created on **gyz** with RLS + realtime publication. `NEXT_PUBLIC_MAHANA_SUPABASE_URL` + `NEXT_PUBLIC_MAHANA_SUPABASE_ANON_KEY` are set in Vercel. Still pending: patch `app/api/agent-chat/route.ts` + `components/agent-chat.tsx` to prefer Mahana vars → commit → verify new chat traffic lands on gyz. After that: `mpazpdibfxafwbrcppuc` becomes legacy-only.
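The pending route/component patch reduces to an env-preference helper. A minimal sketch of that logic, assuming the legacy vars are named `NEXT_PUBLIC_SUPABASE_URL` / `NEXT_PUBLIC_SUPABASE_ANON_KEY` (the `resolveChatSupabase` name is hypothetical; the `NEXT_PUBLIC_MAHANA_*` names are the real ones from above):

```typescript
// Hypothetical helper sketching the pending patch: prefer the Mahana (gyz)
// project vars, fall back to the legacy Nordvest project vars.
type SupabaseTarget = { url: string; anonKey: string; project: "gyz" | "legacy" };

export function resolveChatSupabase(
  env: Record<string, string | undefined>,
): SupabaseTarget {
  const mahanaUrl = env.NEXT_PUBLIC_MAHANA_SUPABASE_URL;
  const mahanaKey = env.NEXT_PUBLIC_MAHANA_SUPABASE_ANON_KEY;
  if (mahanaUrl && mahanaKey) {
    return { url: mahanaUrl, anonKey: mahanaKey, project: "gyz" };
  }
  // Legacy fallback so chat keeps working while the Mahana vars roll out.
  return {
    url: env.NEXT_PUBLIC_SUPABASE_URL ?? "https://mpazpdibfxafwbrcppuc.supabase.co",
    anonKey: env.NEXT_PUBLIC_SUPABASE_ANON_KEY ?? "",
    project: "legacy",
  };
}
```

Once this lands in both the route and the component, "new traffic lands on gyz" is verifiable by checking `project === "gyz"` at client-construction time.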

**Anon key (gyz, open RLS on `nordvest_task_queue` + `agent_chat_messages`):**
```
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzdXBhYmFzZSIsInJlZiI6Imd5emd1ZG16anhnaWFsZGRtb3dlIiwicm9sZSI6ImFub24iLCJpYXQiOjE3NTM2MDM5NTcsImV4cCI6MjA2OTE3OTk1N30.IZ0GrFqTN195H0SGgr-DBfWQ-DSLd3qN17kl_xxSgeA
```
INSERT + SELECT are scoped by RLS, so the key is safe to expose in the browser bundle — but keep it in env config, not committed source.
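A browser surface can enqueue with a plain PostgREST insert against this key. A sketch that only *builds* the request, so the key stays injectable (`buildEnqueueRequest` is a hypothetical helper, not shipped code; the `/rest/v1/<table>` path and `apikey` header are standard Supabase REST conventions):

```typescript
// Builds (but does not send) the PostgREST INSERT for nordvest_task_queue.
// Caller does: const { url, init } = buildEnqueueRequest(...); await fetch(url, init);
type TaskInsert = {
  request: string; // Norwegian prose from the customer
  source: "be-om-endring" | "live-chat" | "kit-lift";
  session_id?: string;
  context?: Record<string, unknown>;
  priority?: "low" | "normal" | "high";
};

export function buildEnqueueRequest(
  supabaseUrl: string,
  anonKey: string,
  task: TaskInsert,
): { url: string; init: { method: string; headers: Record<string, string>; body: string } } {
  return {
    url: `${supabaseUrl}/rest/v1/nordvest_task_queue`,
    init: {
      method: "POST",
      headers: {
        apikey: anonKey,
        Authorization: `Bearer ${anonKey}`,
        "Content-Type": "application/json",
        Prefer: "return=representation", // echo the inserted row back
      },
      // status starts as 'pending' so the daemon's poll picks it up
      body: JSON.stringify({ status: "pending", ...task }),
    },
  };
}
```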

## 3. `nordvest_task_queue` shape

```sql
create table nordvest_task_queue (
  id            uuid primary key default gen_random_uuid(),
  created_at    timestamptz default now(),
  status        text not null,      -- pending | claimed | done | cancelled | failed
  priority      text default 'normal',  -- low | normal | high
  source        text,               -- 'be-om-endring' | 'live-chat' | 'kit-lift'
  session_id    text,               -- links to agent-chat thread (Nordvest supabase)
  request       text not null,      -- Norwegian prose from customer
  context       jsonb,              -- { page, screenshot_url, component_path, ... }
  claimed_by    text,               -- worker window name
  claimed_at    timestamptz,
  done_at       timestamptz,
  result        jsonb,              -- { commits: [sha], deploy_url, messages: [...] }
  error         text
);
```
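The `status` column implies a small state machine. A hedged TypeScript sketch of the transitions the schema comments suggest (the guard itself is illustrative — the daemon enforces nothing like this today):

```typescript
type TaskStatus = "pending" | "claimed" | "done" | "cancelled" | "failed";

// Legal transitions implied by the schema comments above. Assumption:
// cancellation is allowed both before and after a claim.
const NEXT: Record<TaskStatus, TaskStatus[]> = {
  pending: ["claimed", "cancelled"],
  claimed: ["done", "failed", "cancelled"],
  done: [],       // terminal
  cancelled: [],  // terminal
  failed: [],     // terminal
};

export function canTransition(from: TaskStatus, to: TaskStatus): boolean {
  return NEXT[from].includes(to);
}
```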

**Isolation rule (learned the hard way):** the M4 fleet polls `mahana_task_queue` — do NOT put Nordvest or kit tasks there. The fleet will claim them and produce garbage. Stay in `nordvest_task_queue` (customer work) or `mahana_kit_task_queue` (future, kit internal work).

## 4. Daemon contract

**Location:** `m3-council:nv-daemon` tmux window, PID tracked in session.
**Backend:** Grok 4.1 fast via translator-proxy (`~/.config/translator-proxy/` — bypasses Anthropic 100k/day cap).
**Poll interval:** 20s.

**Claim protocol:**
1. `SELECT * FROM nordvest_task_queue WHERE status='pending' ORDER BY (CASE priority WHEN 'high' THEN 0 WHEN 'normal' THEN 1 ELSE 2 END), created_at ASC LIMIT 1` — a plain `ORDER BY priority DESC` on the text column would rank `'normal'` ahead of `'high'`
2. `UPDATE ... SET status='claimed', claimed_by='<window>', claimed_at=now() WHERE id=? AND status='pending'` (optimistic: a racing claim updates zero rows, so the loser just re-polls)
3. Spawn worker in `nordvest-agents:<task-id-short>` tmux window
4. Worker receives: request + context + MCP config + CLAUDE.md
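The optimistic claim in step 2 is a compare-and-set: the UPDATE only wins if the row is still `pending`. A minimal in-memory simulation of two daemons racing (worker names hypothetical):

```typescript
type Task = { id: string; status: "pending" | "claimed"; claimed_by?: string };

// Mirrors `UPDATE ... WHERE id=? AND status='pending'`: returns true iff
// this worker's compare-and-set won (i.e. affected exactly one row).
export function tryClaim(task: Task, worker: string): boolean {
  if (task.status !== "pending") return false; // someone else got there first
  task.status = "claimed";
  task.claimed_by = worker;
  return true;
}

// Two daemons race for the same task — exactly one wins.
const task: Task = { id: "t1", status: "pending" };
const wins = [tryClaim(task, "nv-a"), tryClaim(task, "nv-b")];
console.log(wins.filter(Boolean).length); // 1
```

In Postgres the same guarantee comes for free from the `AND status='pending'` guard plus row-level locking; no advisory locks needed at this queue depth.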

**Worker lifecycle:**
- Reads the Nordvest repo locally (checked out at known path)
- Edits files, runs whatever build/lint it needs
- `git commit -m "<norwegian description>"` with author `fredrik@obratech.no` (required for Vercel obratech team)
- `git push origin main` → Vercel auto-deploys
- `POST /api/agent-chat` with DONE message + deploy URL
- `UPDATE nordvest_task_queue SET status='done', done_at=now(), result={...}`
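On completion the worker writes a `result` jsonb. A sketch of the shape (field names taken from the schema comment in section 3; the `buildDoneUpdate` helper is hypothetical):

```typescript
type TaskResult = {
  commits: string[];    // SHAs pushed to main
  deploy_url?: string;  // Vercel deployment for this push
  messages?: string[];  // chat messages posted back to the customer
};

// Mirrors: UPDATE nordvest_task_queue SET status='done', done_at=now(), result={...}
export function buildDoneUpdate(result: TaskResult) {
  return { status: "done" as const, done_at: new Date().toISOString(), result };
}
```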

**Known issues, intentionally not fixed:**
- WebSocket realtime to Nordvest's Supabase (`mpazpdibfxafwbrcppuc`) fails → poll fallback at 3s. Cosmetic; chat still works.
- Worker doesn't poll for customer replies (fire-and-forget). Patch = ~15 LOC in daemon template if we want back-and-forth.

## 5. Bugs we've already eaten (don't re-introduce)

| Bug | Fix |
|---|---|
| Vercel auth: commits by `fredrik@itonsberg.no`, team is `obratech` | Git author must be `fredrik@obratech.no` |
| `set -e` killed daemon on benign python stderr | Removed `set -e`, explicit `\|\| log error` wrappers |
| zsh `echo "$json" \| python3` mangled `\n` | Use `printf '%s'` |
| macOS `mktemp` doesn't allow extensions after X's | Template must end in `.XXXXXX` only |
| MCP tool explosion (1704 tools) | Spawn worker with `--strict-mcp-config` |
| Fleet stole Nordvest tasks | Dedicated `nordvest_task_queue`, not shared |

## 6. The kit's place in this fabric

**The kit (this repo) is the UI layer of the fabric, not a sidecar.**

Every component lifted in the kit ships to Nordvest's `Live Chat med Agent` first (the staging bench), gets beaten up by real Norwegian prose from real customers, and *then* gets promoted to canonical status in `mahana-mastermind-chat`.

```
component lifted in kit
        │
        ▼
 playground variant coverage ✓
        │
        ▼
 ships into Nordvest's LiveChat
        │  real traffic for ≥N days
        ▼
 promoted → mastermind-v2/components/
        │
        ▼
 PROMOTIONS.md entry + GH issue
```

**Corollary:** for now we do NOT need a separate `mahana_kit_task_queue`. Nordvest's queue is the kit's production integration. Revisit when a second customer exists.

## 7. Environment variables (for anyone wiring a new surface to the queue)

```bash
# gyz project — the fabric's control plane
SUPABASE_URL=<gyz-url>
SUPABASE_ANON_KEY=<gyz-anon-key>  # RLS scoped to nordvest_task_queue insert + select own

# Nordvest's own project — only if talking to agent_chat
NORDVEST_SUPABASE_URL=https://mpazpdibfxafwbrcppuc.supabase.co
NORDVEST_SUPABASE_ANON_KEY=<redacted>
```

Never commit either anon key to the kit repo — they're scoped but still public URLs. Keep them in Vercel env or local `.env.local`.
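Given how easy it is to cross the two projects (section 2), a cheap guard at client-construction time fails fast instead of writing into the wrong schema. A sketch — `assertControlPlane` is hypothetical, the host string is the real gyz project:

```typescript
const GYZ_HOST = "gyzgudmzjxgialddmowe.supabase.co";

// Throws if a queue/daemon client is about to be pointed at the wrong project.
export function assertControlPlane(supabaseUrl: string): void {
  if (new URL(supabaseUrl).host !== GYZ_HOST) {
    throw new Error(`queue client must target gyz, got ${supabaseUrl}`);
  }
}
```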

## 8. Promotion path (kit → mastermind-v2)

1. Component lifted in kit, variants covered in playground, brain + entity + spec + skill written.
2. Ships into Nordvest LiveChat via `components/LiveChat/` lift — see `skills/kit-lift.skill.md`.
3. Runs against real traffic. Council reviews verdicts monthly.
4. If stable: entry added to `PROMOTIONS.md` with status `ready`, GH issue opened on `mastermind-v2`.
5. Claude Code picks it up, implements in canonical repo, marks `landed`.

## 9. Open threads

- [ ] Patch daemon with reply-loop for back-and-forth clarifications (~15 LOC)
- [ ] Fix Nordvest Supabase WebSocket (or explicitly disable realtime, rely on 3s poll)
- [ ] Decide: when does `mahana_kit_task_queue` become necessary? (trigger: 2nd customer, OR kit-internal tasks like "lift next component" becoming common enough to warrant their own pipe)
- [ ] Brain/skill injection into worker context (right now worker reads CLAUDE.md; should it also auto-read the relevant `components/<name>/brain.md` when editing that component?)
- [ ] **Chat migration finish:** patch `app/api/agent-chat/route.ts` + `components/agent-chat.tsx` to prefer `NEXT_PUBLIC_MAHANA_*` env, commit, verify new traffic lands on gyz `agent_chat_messages`.

## 10. Repo consolidation (kit → Mastermind-v2)

This design-system repo (262 files, 46M) is **4.8× richer** than the original `mastermind-v2` (101 files, 9.6M). The old repo has the deploy pipeline wired; this repo has the content. Recommended consolidation path from m3:

```bash
cd ~/mastermind-v2
# snapshot the pre-merge state, then return to main for the absorb
git checkout -b pre-design-system-merge
git push -u origin pre-design-system-merge
git checkout main
# absorb the new content (--delete: anything not in the kit is removed)
rsync -av --delete --exclude='.git' "../Mahana Design System/" ./
git add -A
git commit -m "Absorb full Mahana Design System prototype (ref/, skills/, specs/)"
git push
```

**Why merge not fork:** `mastermind-v2` already has CF Pages/Vercel deploy wiring + GH history. Forking means redoing deploy + losing attribution. The merge keeps one canonical place and auto-deploys the richer content.

After merge, the Staging Bench (`preview/playground/staging-bench.html`) becomes directly accessible from the deployed playground URL — no separate hosting needed.
