Add Discord-native hybrid front-end for Jarvis (bot + bridge)

Transform isair/jarvis into a Discord-controlled voice assistant running on the Ubuntu VNC desktop, keeping the mature ~39k-line Python brain intact. - bot/ (Node + bun, discord.js): /자비스 slash commands (ephemeral), voice channel join + voice receive/playback, pluggable VNC screen broadcast (selfbot live / noVNC / screenshot) - bridge/ (Python, Flask): wraps jarvis STT + run_reply_engine + Piper TTS behind a thin localhost HTTP API - .env.example, scripts/ (start_bridge/start_bot/dev), README rewrite, docs/language-comparison.md and docs/vnc-xfce-setup.md Language decision: hybrid (Python brain + Node/bun Discord layer) because Discord blocks bot video; native screen broadcast only works via a Node selfbot library.
2026-06-09 14:51:05 +09:00
parent a5bf8d1826
commit c4abf63f38
308 changed files with 94135 additions and 1 deletions
--- a/docs/UPSTREAM-README.md
+++ b/docs/UPSTREAM-README.md
@@ -0,0 +1,597 @@
+# Jarvis
+
+**A 100% private AI voice assistant that lives on your computer** (works offline). Talk naturally as if Jarvis is a third person in the room — say its name anywhere in your sentence and get conversational, context-aware responses. It remembers everything, always knows the current location and time, can search the web, read your screen, control Chrome, track nutrition, and much more with support for unlimited MCPs and tools without context rot. Sensitive info is automatically redacted before anything is saved to disk.
+
+🔒 100% local processing. No subscriptions. No data harvesting. Automatic redaction of sensitive info. Free offline dictation included.
+
+---
+
+**Support Jarvis** [![GitHub Sponsors](https://img.shields.io/badge/Sponsor-GitHub%20Sponsors-ff69b4?logo=github)](https://github.com/sponsors/isair) [![Ko-fi](https://img.shields.io/badge/Support-Ko--fi-ff5722?logo=kofi&logoColor=white)](https://ko-fi.com/isair)
+
+---
+
+<p align="center">
+  <img src="docs/img/face.png" alt="Jarvis Face" width="400">
+</p>
+
+<p align="center">
+  <img src="docs/img/memory-viewer-diary.png" alt="Memory Viewer - Diary" width="280">
+  <img src="docs/img/memory-viewer-knowledge.png" alt="Memory Viewer - Knowledge Graph" width="280">
+  <img src="docs/img/memory-viewer-meals.png" alt="Memory Viewer - Meals" width="280">
+</p>
+
+## Why Jarvis?
+
+**🔒 Your data stays yours** - 100% local AI processing. No cloud, no subscriptions, no data harvesting. Automatic redaction of sensitive info. This is non-negotiable.
+
+**🗣️ A third person in the room** - Unlike voice assistants that only respond to rigid commands, Jarvis understands conversations. It maintains a short temporary rolling context of what's being discussed, so when you ask "Jarvis, what do you think?" it knows exactly what you're talking about. Have it chime into discussions with friends, help debug code while you talk through problems, or weigh in on decisions.
+
+**🧠 Never forgets** - Unlimited memory across conversations. Adapts tone naturally to the topic. Learns your preferences over time.
+
+**🎙️ Free dictation** - Hold a hotkey, speak, release — your words appear in any app as text. Like WisprFlow, but free, offline, and private. No subscription, no cloud transcription.
+
+**🔌 Extensible** - MCP integration connects Jarvis to thousands of tools: smart home, GitHub, Slack, databases, and more. Smart tool selection means adding more tools won't slow things down.
+
+**📊 Transparent progress** - We track what works (and what doesn't) with automated evals. [See current accuracy →](EVALS.md)
+
+**🚧 Known limitations:** Jarvis is under active development. Primary development happens on macOS. Windows/Linux support may lag behind. We're building in the open, [issues](https://github.com/isair/jarvis/issues) and [contributions](https://github.com/isair/jarvis/pulls) welcome!
+- Voice-only for now—no text chat interface yet ([#35](https://github.com/isair/jarvis/issues/35))
+- No mobile apps ([#17](https://github.com/isair/jarvis/issues/17))
+- "Stop" commands during speech sometimes get filtered as echo ([#24](https://github.com/isair/jarvis/issues/24))
+- Dictation is not available on macOS 26+ (Tahoe) due to a pynput incompatibility ([#172](https://github.com/isair/jarvis/issues/172))
+
+<details>
+<summary><strong>See it in action</strong> (example conversations)</summary>
+
+**Chiming into conversations** (the magic moment):
+```
+👤 Alice: I wonder what the weather will be like tomorrow
+👤 Bob: Yeah, we should check before planning the picnic
+👤 Alice: Jarvis, what do you think?
+  📝 Heard: "What do you think Jarvis?"
+  🧠 Intent (wake word): directed → "what do you think about the weather for the picnic"
+
+✨ Working on it: what do you think about the weather for the picnic
+  🧰 Tool: getWeather…
+  💬 Generating response...
+
+🤖 Jarvis
+Tomorrow looks great for a picnic! Sunny with highs around 22°C...
+```
+Jarvis understood the entire conversation and gave a contextual answer — no need to repeat the question.
+
+**Browser automation via MCP:**
+```
+  📝 Heard: "Open YouTube Jarvis."
+  🧠 Intent (wake word): directed → "open YouTube"
+
+✨ Working on it: open YouTube
+  💬 Generating response...
+
+🤖 Jarvis
+I have opened YouTube for you.
+```
+
+**Health-aware advice:**
+```
+  📝 Heard: "Should I order pizza tonight considering my health goals Jarvis?"
+  🧠 Intent (wake word): directed → "should I order pizza tonight considering my health goals"
+
+✨ Working on it: should I order pizza tonight considering my health goals
+  🧰 Tool: fetchMeals…
+  💬 Generating response...
+
+🤖 Jarvis
+Looking at your week — you mentioned wanting to lose 5kg by June...
+Today you've had about 860 calories so far. Given your 1,800 calorie target,
+you've got room for pizza! But maybe consider thin crust to save calories.
+```
+
+**Personalised news search:**
+```
+  📝 Heard: "What are some news from today that might interest me Jarvis?"
+  🧠 Intent (wake word): directed → "what are some news from today that might interest me"
+
+✨ Working on it: what are some news from today that might interest me
+  🧰 Tool: searchMemory…
+  🧰 Tool: webSearch…
+  💬 Generating response...
+
+🤖 Jarvis
+Here's a quick snapshot of today's headlines...
+```
+
+**Natural wake word placement** (say "Jarvis" anywhere):
+```
+  📝 Heard: "Give me a random topic to discuss Jarvis."
+  🧠 Intent (wake word): directed → "give me a random topic to discuss"
+
+✨ Working on it: give me a random topic to discuss
+  💬 Generating response...
+
+🤖 Jarvis
+How about the Fermi Paradox? Given the vast number of stars...
+```
+
+**Echo detection** (Jarvis ignores its own speech):
+```
+🤖 Jarvis
+I have opened YouTube for you.
+
+👂 Listening for follow-up (3s)...
+  📝 Heard: "I have opened YouTube for you."
+  🔇 Heard (echo): "i have opened youtube for you."
+💤 Returning to wake word mode
+```
+
+</details>
+
+## Quick Install
+
+### 1. Install Prerequisites
+
+| Platform | Requirement |
+|----------|-------------|
+| **All** | [Ollama](https://ollama.com/download) |
+
+### 2. Download Jarvis
+
+Get the latest from [GitHub Releases](https://github.com/isair/jarvis/releases):
+
+| Platform | Download | Run |
+|----------|----------|-----|
+| **Windows** | `Jarvis-Windows-x64.zip` | Extract → Run `Jarvis.exe` |
+| **macOS** | `Jarvis-macOS-arm64.zip` | Extract → Move to Applications → Right-click → Open |
+| **Linux** | `Jarvis-Linux-x64.tar.gz` | `tar -xzf` → Run `./Jarvis/Jarvis` |
+
+Jarvis starts listening automatically — just say "Jarvis" and talk!
+
+<p align="center">
+  <img src="docs/img/setup-wizard-initial-check.png" alt="Setup - Initial Check" width="200">
+  <img src="docs/img/setup-wizard-model.png" alt="Setup - Model Selection" width="200">
+  <img src="docs/img/setup-wizard-whisper.png" alt="Setup - Whisper" width="200">
+  <img src="docs/img/setup-wizard-dictation.png" alt="Setup - Dictation" width="200">
+  <img src="docs/img/setup-wizard-mcp.png" alt="Setup - MCP Servers" width="200">
+  <img src="docs/img/setup-wizard-complete.png" alt="Setup - Complete" width="200">
+</p>
+
+<p align="center">
+  <img src="docs/img/logs.png" alt="Real-time Logs" width="500">
+</p>
+
+## Features
+
+- **Conversational Awareness** - Understands ongoing discussions. Ask "Jarvis, what do you think?" and it knows what you're talking about. Works naturally in multi-person conversations.
+- **Unlimited Memory** - Never forgets. Searches across all your conversation history. Memory Viewer GUI included.
+- **Adaptive Tone** - Automatically surgical for code, pragmatic for business, encouraging for wellbeing — no manual mode switching
+- **Smart Tool Selection** - Embedding-based relevance filtering picks only the tools needed per query — add unlimited MCP tools without performance degradation
+- **Built-in Tools** - Screenshot OCR, web search (DuckDuckGo → Brave → Wikipedia fallback chain with auto-fetch), weather, file access, nutrition tracking, location awareness, plus a tool-discovery escape hatch the agent uses to widen its own toolset mid-reply
+- **Knowledge Graph Memory** - Self-organising memory that learns from conversations, auto-splits by topic, and surfaces relevant knowledge automatically
+- **Natural Voice** - Say "Jarvis" anywhere in your sentence, interrupt with "stop", follow up without repeating the wake word
+- **Dictation Mode** - Free, offline alternative to WisprFlow — hold a hotkey, speak, release to paste text into any app
+- **MCP Integration** - Connect to thousands of external tools (Home Assistant, GitHub, Slack, etc.)
+
+## System Requirements
+
+| Hardware | VRAM | Model |
+|----------|------|-------|
+| Most users | 8GB+ | `gemma4:e2b` (default) |
+| Better quality | 16GB+ | `gemma4:e4b` |
+| High-end | 24GB+ | `gpt-oss:20b` |
+
+> **Note:** VRAM requirements include the intent judge model (`gemma4:e2b`) which is always loaded alongside the chat model for voice intent classification. The default model shares this, so no extra VRAM is needed.
+
+The setup wizard will guide you through model selection and installation on first launch.
+
+## Configuration
+
+Most users won't need to change anything. Open **⚙️ Settings** from the tray menu to configure Jarvis through a graphical interface — no JSON editing required. Settings are saved to `~/.config/jarvis/config.json`.
+
+<p align="center">
+  <img src="docs/img/settings-window.png" alt="Settings Window" width="500">
+  <img src="docs/img/settings-mcp.png" alt="Settings - MCP Servers" width="500">
+</p>
+
+<details>
+<summary><strong>Speech Recognition (Whisper)</strong></summary>
+
+#### Language Modes
+- **Multilingual** (default, 99 languages): `"whisper_model": "medium"`
+- **English Only** (slightly better English accuracy): `"whisper_model": "medium.en"`
+
+#### Model Sizes
+| Model | English | Multilingual | Download | VRAM | Speed |
+|-------|---------|--------------|----------|------|-------|
+| Tiny | `tiny.en` | `tiny` | ~75 MB | ~1 GB | ~10x |
+| Base | `base.en` | `base` | ~140 MB | ~1 GB | ~7x |
+| Small | `small.en` | `small` | ~465 MB | ~2 GB | ~4x |
+| **Medium** | `medium.en` | `medium` | ~1.5 GB | ~5 GB | ~2x |
+| Large V3 Turbo | - | `large-v3-turbo` | ~1.5 GB | ~6 GB | ~8x |
+
+Speed is relative to the original large model. [Source](https://github.com/openai/whisper)
+
+#### GPU Acceleration (Windows)
+If you have an NVIDIA GPU, Jarvis can use CUDA for much faster speech recognition. The Windows installer offers an optional CUDA download during setup. For development:
+```bash
+pip install nvidia-cublas-cu12 nvidia-cudnn-cu12
+```
+CUDA is detected automatically — no configuration needed.
+
+#### Hallucination Filters
+Whisper sometimes produces confident but false transcriptions during silence or background noise (e.g. news-show intros, music). Two thresholds filter these out before they reach the intent judge:
+
+- `"whisper_min_confidence": 0.3` — drops segments whose `avg_logprob`-derived confidence falls below this value. Raise if you see low-confidence noise leaking through; lower if real speech is being dropped.
+- `"whisper_no_speech_threshold": 0.5` — drops any segment whose `no_speech_prob` is at or above this value, regardless of `avg_logprob`. Catches the case where Whisper is confident about a hallucinated phrase but its own no-speech signal says the audio was silent. Applies to both the faster-whisper and MLX backends.
+
+Both thresholds are exposed in the Settings window under *Whisper*.
+
+</details>
+
+<details>
+<summary><strong>Voice Interface (Advanced)</strong></summary>
+
+**LLM Intent Judge** - Jarvis uses `gemma4:e2b` for intelligent voice intent classification (echo detection, query extraction, stop commands). This model is automatically installed alongside your chosen chat model during setup. The intent judge cannot be disabled but gracefully falls back to simpler text matching if Ollama is unavailable.
+
+**Tool Router** - When `"tool_selection_strategy": "llm"` (the default), Jarvis asks a small LLM to pick which tools are relevant for each query, shrinking the tool catalogue the chat model sees. By default this routing call reuses the intent-judge model — it's already warm and small enough not to stall the turn. Override with `"tool_router_model": "<name>"` to dedicate a different model to routing. Other strategies: `"keyword"` (fast, no LLM), `"embedding"` (nomic-embed-text), `"all"` (no filtering).
+
+**Task-list Planner** - Before the agentic loop, Jarvis runs a short planning pass that decomposes multi-step queries into an ordered list of sub-tasks. For small models (`gemma4:e2b` class), each planned step is directly resolved to a concrete tool call without relying on the chat model to re-plan turn-by-turn. This significantly improves multi-step reliability. Config options:
+
+```json
+{
+  "planner_enabled": true,          // set to false to disable the planner entirely
+  "planner_model": "",              // override which model plans (default: reuses tool_router_model chain)
+  "planner_timeout_sec": 6.0        // per-call timeout for plan and step-resolver LLM calls
+}
+```
+
+</details>
+
+<details>
+<summary><strong>Small-Model Digest Passes (Advanced)</strong></summary>
+
+Small chat models (~2B, e.g. `gemma4:e2b`) degrade sharply as their prompt grows. Jarvis runs two cheap distil passes to keep the prompt tight:
+
+- **Memory digest** — boils diary + graph recall into a short relevance-filtered note before injecting it as background context.
+- **Tool-result digest** — boils a raw tool payload (especially webSearch UNTRUSTED WEB EXTRACT blocks) into a short attributed fact note before it reaches the main reply model.
+
+Both digest passes auto-enable for small models (≤7B) and stay off for large models. For small models, tool-result digest also prevents large fetch_web_page payloads from blowing the context window. Override in `~/.config/jarvis/config.json`:
+
+```json
+{
+  "memory_digest_enabled": null,          // null = auto-on for SMALL, false to force off, true to force on
+  "tool_result_digest_enabled": null,     // null = auto-on for SMALL, false to force off, true to force on
+  "llm_digest_timeout_sec": 8.0           // tight ceiling shared by both passes
+}
+```
+
+Field logs show `🧩 Memory digest: …` and `🧩 Tool digest: …` lines when a pass ran, so you can see when the substrate was replaced.
+
+</details>
+
+## Dictation Mode — Free WisprFlow Alternative
+
+Hold a hotkey to record speech, release to paste the transcription into any app. Works everywhere — your editor, browser, chat, terminal. Completely local, completely free.
+
+<p align="center">
+  <img src="docs/img/dictation-history.png" alt="Dictation History" width="400">
+  <img src="docs/img/setup-wizard-dictation.png" alt="Setup Wizard - Dictation" width="400">
+</p>
+
+| Platform | Default hotkey |
+|----------|---------------|
+| **Windows** | Ctrl + Win |
+| **macOS** | Ctrl + Option |
+| **Linux** | Ctrl + Alt |
+
+- 🔒 **100% offline** — your speech never leaves your machine (unlike cloud dictation services)
+- 🧠 **Shared Whisper model** — uses the same speech recognition as voice input, no extra memory
+- ⚡ **Zero latency startup** — no server round-trip, transcription starts the moment you release
+- 📋 **Universal paste** — works in any app that accepts `Ctrl+V` / `Cmd+V`
+- 🔇 **Non-intrusive** — main voice listener pauses automatically during dictation
+- ✋ **Hands-free mode** — double-tap the hotkey to keep recording without holding; press again or hit Escape to stop
+- 🧹 **Filler word removal** — optional LLM-powered cleanup removes "um", "uh", "like", "you know" while preserving meaning
+- 📖 **Custom dictionary** — define `"wrong -> right"` replacements for jargon, names, and technical terms
+- 📜 **History window** — browse, copy, or delete past dictations from the system tray
+- 🎛️ **Easy setup** — configure dictation during the setup wizard or anytime in Settings (hotkey dropdown, filler removal toggle, custom dictionary editor)
+
+Customise the hotkey in Settings or `config.json`:
+```json
+{
+  "dictation_hotkey": "ctrl+alt",
+  "dictation_filler_removal": true,
+  "dictation_custom_dictionary": [
+    "jarvis -> Jarvis",
+    "pytorch -> PyTorch"
+  ]
+}
+```
+
+> **Note:** macOS requires Accessibility permissions for the global hotkey. Linux requires X11 (limited Wayland support).
+
+<details>
+<summary><strong>Text-to-Speech</strong></summary>
+
+**Piper TTS (default)** - Neural TTS that auto-downloads on first use (~60MB):
+- Works out of the box - no setup required
+- High-quality British English male voice (en_GB-alan-medium)
+- Fast local synthesis with exact duration tracking
+
+To use different Piper voices, download from [HuggingFace](https://huggingface.co/rhasspy/piper-voices) and set:
+```json
+{
+  "tts_piper_model_path": "~/.local/share/jarvis/models/piper/en_GB-alan-medium.onnx"
+}
+```
+
+**Chatterbox** - AI voice with emotion control (requires running from source):
+```json
+{ "tts_engine": "chatterbox" }
+```
+
+Voice cloning with Chatterbox - add a 3-10 second .wav sample:
+```json
+{
+  "tts_engine": "chatterbox",
+  "tts_chatterbox_audio_prompt": "/path/to/voice.wav"
+}
+```
+
+</details>
+
+<details>
+<summary><strong>Location Detection</strong></summary>
+
+Jarvis can provide location-aware responses (weather, local time, etc.) using a local GeoLite2 database — no cloud geolocation services are used.
+
+**IP detection chain** (in order of preference):
+1. **Manual IP** — configure `location_ip_address` in settings
+2. **UPnP** — queries your local router (no traffic leaves LAN)
+3. **Socket heuristic** — determines which interface routes externally (no data sent)
+4. **OpenDNS DNS query** — single `myip.opendns.com` lookup to `208.67.222.222` (only external query)
+
+If your ISP uses carrier-grade NAT (CGNAT), Jarvis automatically resolves your true public IP via the same OpenDNS DNS query. This can be disabled:
+
+```json
+{
+  "location_cgnat_resolve_public_ip": false
+}
+```
+
+**Setup:** Register for a free [MaxMind GeoLite2](https://www.maxmind.com/en/geolite2/signup) account, download the City database (MMDB format), and save it to `~/.local/share/jarvis/geoip/GeoLite2-City.mmdb`. The setup wizard will guide you through this.
+
+</details>
+
+<details>
+<summary><strong>MCP Tool Integration</strong></summary>
+
+Connect Jarvis to external tools via [MCP servers](https://github.com/topics/mcp-server):
+
+```json
+{
+  "mcps": {
+    "github": {
+      "command": "npx",
+      "args": ["-y", "@modelcontextprotocol/server-github"],
+      "env": { "GITHUB_TOKEN": "your-token" }
+    }
+  }
+}
+```
+
+**Popular integrations:**
+- **Home Assistant** - Voice control for smart home
+- **Google Workspace** - Gmail, Calendar, Drive, Docs
+- **GitHub** - Issues, PRs, workflows
+- **Notion** - Knowledge management
+- **Slack/Discord** - Team communication
+- **Databases** - MySQL, PostgreSQL, MongoDB
+- **Composio** - 500+ apps in one integration
+
+See [full MCP setup guide](#mcp-integrations) below.
+
+</details>
+
+## MCP Integrations
+
+> **Session persistence:** each MCP server is launched once and its stdio session is kept open across tool calls. Stateful servers (e.g. browser automation, where the server owns a long-running Chrome process) work correctly. If you have a server you'd rather not keep resident, set `"idle_timeout_sec": 300` on its config entry and Jarvis will free it after that long without activity.
+
+<details>
+<summary><strong>Home Assistant</strong> - Smart home voice control</summary>
+
+1. Add MCP Server integration in Home Assistant (Settings → Devices & services)
+2. Expose entities you want to control (Settings → Voice assistants → Exposed entities)
+3. Create Long-lived Access Token (Profile → Security → Create token)
+4. Install proxy: `uv tool install git+https://github.com/sparfenyuk/mcp-proxy`
+5. Add to config:
+```json
+{
+  "mcps": {
+    "home_assistant": {
+      "command": "mcp-proxy",
+      "args": ["http://localhost:8123/mcp_server/sse"],
+      "env": { "API_ACCESS_TOKEN": "YOUR_TOKEN" }
+    }
+  }
+}
+```
+
+"Jarvis, turn on the living room lights" / "set bedroom to 72°" / "run good night scene"
+
+</details>
+
+<details>
+<summary><strong>Google Workspace</strong> - Gmail, Calendar, Drive, Docs, Sheets</summary>
+
+```json
+{
+  "mcps": {
+    "google_workspace": {
+      "command": "npx",
+      "args": ["-y", "google-workspace-mcp"],
+      "env": {
+        "GOOGLE_CLIENT_ID": "your-client-id",
+        "GOOGLE_CLIENT_SECRET": "your-client-secret"
+      }
+    }
+  }
+}
+```
+Setup: [taylorwilsdon/google_workspace_mcp](https://github.com/taylorwilsdon/google_workspace_mcp)
+
+</details>
+
+<details>
+<summary><strong>GitHub</strong> - Repos, issues, PRs, workflows</summary>
+
+```json
+{
+  "mcps": {
+    "github": {
+      "command": "npx",
+      "args": ["-y", "@modelcontextprotocol/server-github"],
+      "env": { "GITHUB_TOKEN": "your-token" }
+    }
+  }
+}
+```
+
+</details>
+
+<details>
+<summary><strong>Notion, Slack, Discord, Databases</strong></summary>
+
+**Notion:**
+```json
+{ "mcps": { "notion": { "command": "npx", "args": ["-y", "@makenotion/mcp-server-notion"], "env": { "NOTION_API_KEY": "your-token" } } } }
+```
+
+**Slack:**
+```json
+{ "mcps": { "slack": { "command": "npx", "args": ["-y", "slack-mcp-server"], "env": { "SLACK_BOT_TOKEN": "xoxb-...", "SLACK_USER_TOKEN": "xoxp-..." } } } }
+```
+
+**Discord:**
+```json
+{ "mcps": { "discord": { "command": "npx", "args": ["-y", "discord-mcp-server"], "env": { "DISCORD_BOT_TOKEN": "your-token" } } } }
+```
+
+**Databases:** [bytebase/dbhub](https://github.com/bytebase/dbhub) (SQL), [mongodb-mcp-server](https://github.com/mongodb-js/mongodb-mcp-server) (MongoDB)
+
+</details>
+
+<details>
+<summary><strong>Composio</strong> - 500+ apps in one integration</summary>
+
+```json
+{
+  "mcps": {
+    "composio": {
+      "command": "npx",
+      "args": ["-y", "@composiohq/rube"],
+      "env": { "COMPOSIO_API_KEY": "your-key" }
+    }
+  }
+}
+```
+Get API key at [composio.dev](https://composio.dev)
+
+</details>
+
+## Troubleshooting
+
+<details>
+<summary><strong>Common issues</strong></summary>
+
+**First startup takes a bit** - Jarvis pre-warms the Whisper, chat, and intent-judge models before announcing "Listening!" so the first engagement feels instant. This adds a few seconds on cold start and is bounded at 60 s — if Ollama is slow, Jarvis will start listening anyway and load the models on demand.
+
+**Jarvis doesn't hear me** - Check microphone permissions, speak clearly after "Jarvis"
+
+**Responses are slow** - Ensure you have enough VRAM (8GB+ for default model; see System Requirements for other models)
+
+**Windows: App won't start** - Extract full zip first, check Windows Defender
+
+**macOS: "App can't be opened"** - Right-click → Open, or System Settings → Privacy & Security → Allow
+
+**Linux: No tray icon** - `sudo apt install libayatana-appindicator3-1`
+
+**Jarvis keeps deflecting on questions it answered before** - small models can record their own past failures into the diary, which then primes future sessions to repeat them. New writes are scrubbed automatically; to clean historical entries, open the Memory Viewer, switch to the Diary tab, and click **Clean up deflection narration** in the sidebar Maintenance section. Only sentences that narrate the assistant's failures are removed; the rest of each entry stays.
+
+</details>
+
+## For Developers
+
+<details>
+<summary><strong>Running from source</strong></summary>
+
+```bash
+git clone https://github.com/isair/jarvis.git
+cd jarvis
+
+# macOS
+bash scripts/run_macos.sh
+
+# Windows (with Micromamba)
+pwsh -ExecutionPolicy Bypass -File scripts\run_windows.ps1
+
+# Linux
+bash scripts/run_linux.sh
+```
+
+Running from source enables Chatterbox TTS (AI voice with emotion/cloning). Piper TTS works in both bundled and source modes.
+
+</details>
+
+<details>
+<summary><strong>Privacy hardening</strong> (stay 100% offline)</summary>
+
+```json
+{
+  "web_search_enabled": false,
+  "wikipedia_fallback_enabled": false,
+  "brave_search_api_key": "",
+  "mcps": {},
+  "location_auto_detect": false,
+  "location_cgnat_resolve_public_ip": false,
+  "location_enabled": false
+}
+```
+
+Verify: `sudo lsof -i -n -P | grep jarvis` (should only show 127.0.0.1 to Ollama)
+
+</details>
+
+<details>
+<summary><strong>Web search fallback chain</strong></summary>
+
+When DuckDuckGo is rate-limited or returns nothing fetchable, Jarvis walks
+a small fallback chain before giving up rather than confabulating:
+
+1. **Brave Search** — opt-in, requires `brave_search_api_key`. Free tier:
+   2,000 queries/month. Get a key at
+   [api.search.brave.com](https://api.search.brave.com/app/keys).
+2. **Wikipedia** — zero-config, on by default, uses the Wikipedia host
+   matching the language Whisper auto-detected on the utterance (so a
+   Turkish question gets a Turkish answer). Disable with
+   `wikipedia_fallback_enabled: false`.
+3. **Honest failure** — if every provider fails, the reply tells you the
+   search was blocked rather than making something up.
+
+The whole chain is bounded by a ~20s wall-clock deadline so a stalled
+provider can't run out the voice-assistant latency budget.
+
+</details>
+
+## Privacy & Storage
+
+- **100% offline** - No cloud services required
+- **Auto-redaction** - Emails, tokens, passwords automatically removed
+- **Local storage** - Everything in `~/.local/share/jarvis`
+
+## License
+
+- **Personal use**: Free forever
+- **Commercial use**: [Contact us](mailto:baris@writeme.com)
+
+## Support
+
+[Report issues](https://github.com/isair/jarvis/issues) · [Discussions](https://github.com/isair/jarvis/discussions) · [Sponsor](https://github.com/sponsors/isair)
--- a/docs/img/dictation-history.png
+++ b/docs/img/dictation-history.png
--- a/docs/img/face.png
+++ b/docs/img/face.png
--- a/docs/img/logs.png
+++ b/docs/img/logs.png
--- a/docs/img/memory-viewer-diary.png
+++ b/docs/img/memory-viewer-diary.png
--- a/docs/img/memory-viewer-knowledge.png
+++ b/docs/img/memory-viewer-knowledge.png
--- a/docs/img/memory-viewer-meals.png
+++ b/docs/img/memory-viewer-meals.png
--- a/docs/img/settings-mcp.png
+++ b/docs/img/settings-mcp.png
--- a/docs/img/settings-window.png
+++ b/docs/img/settings-window.png
--- a/docs/img/setup-wizard-complete.png
+++ b/docs/img/setup-wizard-complete.png
--- a/docs/img/setup-wizard-dictation.png
+++ b/docs/img/setup-wizard-dictation.png
--- a/docs/img/setup-wizard-initial-check.png
+++ b/docs/img/setup-wizard-initial-check.png
--- a/docs/img/setup-wizard-mcp.png
+++ b/docs/img/setup-wizard-mcp.png
--- a/docs/img/setup-wizard-model.png
+++ b/docs/img/setup-wizard-model.png
--- a/docs/img/setup-wizard-whisper.png
+++ b/docs/img/setup-wizard-whisper.png
--- a/docs/language-comparison.md
+++ b/docs/language-comparison.md
@@ -0,0 +1,46 @@
+# 언어 선택: Python 유지 vs 재작성 — 장단점 비교
+
+요구사항을 만족시키기 위해 "언어를 바꿀지"를 먼저 따졌습니다. 결론은 **하이브리드(Python 두뇌 유지 + Node/bun Discord 레이어 신규)** 입니다. 근거를 정리합니다.
+
+## 결정을 좌우한 핵심 사실
+
+1. **디스코드 봇은 영상(Go Live)을 송출할 수 없다.** Discord가 봇 계정의 영상 전송을 정책적으로 막아둠 (2026년 현재도 동일, 공식 API 변화 없음).
+2. **봇 영상 송출이 되는 라이브러리는 Node 전용이고 셀프봇(유저 토큰)을 요구한다.** `@dank074/discord-video-stream`(v6, 2026-03 기준 유지보수 중) + `discord.js-selfbot-v13`. Python에는 동등한 동작 라이브러리가 없음.
+3. **기존 jarvis 두뇌는 Python 약 39,000줄**(메모리 그래프·벡터스토어·planner/evaluator 답변엔진·MCP 툴·redaction·STT(faster-whisper)·TTS(piper)). 검증된 자산.
+4. 음성 입출력/슬래시 명령/ephemeral/음성채널 접속은 Python(py-cord)·Node(discord.js) 모두 가능하지만, **Node 생태계가 더 성숙**.
+
+## 옵션별 비교
+
+| 항목 | A. Python 단일 유지 | B. 전면 Node/bun 재작성 | C. 하이브리드 (채택) |
+|---|---|---|---|
+| VNC 영상 송출(native) | ❌ 사실상 불가 | ✅ 가능 | ✅ 가능(Node 레이어) |
+| 음성 입출력 | ✅ | ✅ | ✅ |
+| 슬래시/ephemeral | ✅ | ✅(더 성숙) | ✅ |
+| 기존 두뇌 재사용 | ✅ 그대로 | ❌ 39k줄 재작성 | ✅ 그대로 |
+| 작업량/리스크 | 중(영상 막힘) | 매우 큼/높음 | 작음/낮음 |
+| 유지보수 | 단일 언어 | 단일 언어 | 2개 런타임(경계 단순) |
+
+- **A 탈락**: 핵심 요구(디스코드 화면 방송)를 만족 못 함.
+- **B 탈락**: 성숙한 두뇌를 버리고 수 주간 재작성. 회귀·버그 위험 큼. 이득(언어 통일)이 비용보다 작음.
+- **C 채택**: 영상이 가능한 Node로 "디스코드/음성/영상 인터페이스"만 새로 짜고, 두뇌는 Python 그대로 둔 뒤 얇은 HTTP 브릿지로 연결.
+
+## 하이브리드 경계 설계
+
+```
+Discord  ──voice/video/slash──▶  bot/ (Node + bun, discord.js)
+                                   │  HTTP (localhost)
+                                   ▼
+                               bridge/ (Python, Flask)
+                                   │  in-process import
+                                   ▼
+                               src/jarvis (기존 두뇌)
+```
+
+- 경계는 단 하나(HTTP localhost). 직렬화는 WAV(오디오) + JSON(텍스트)뿐이라 단순.
+- Node는 AI 로직을 일절 갖지 않음 → 두 런타임의 책임이 깨끗하게 분리.
+
+## Node 채택부의 bun 적극 활용
+
+- 패키지 매니저/런타임 모두 **bun** 사용 (`bun install`, `bun run`).
+- TypeScript를 트랜스파일 없이 직접 실행(`bun run src/index.ts`).
+- 네이티브 의존(`@discordjs/opus`, video-stream의 node-av/node-datachannel)은 bun에서 install 스크립트 허용 필요 → 본 레포는 무거운 네이티브 의존을 `optionalDependencies`로 분리해 기본 설치를 가볍게 유지.
--- a/docs/llm_contexts.md
+++ b/docs/llm_contexts.md
@@ -0,0 +1,266 @@
+# LLM Contexts Map
+
+Every distinct LLM call in Jarvis, what feeds it, what consumes it, and how it is gated. This is the reference for optimising the app's main bottleneck (LLM latency). Keep it in sync with the code — see the note at the bottom.
+
+---
+
+## 1. Main Reply Loop (agentic messages loop)
+
+- **File**: [src/jarvis/reply/engine.py](src/jarvis/reply/engine.py) — `reply()` and the loop at ~lines 1370-1650; native tool-call path in `chat_with_messages()` (~1424, 1455).
+- **Trigger**: every user message. Runs up to `agentic_max_turns` (default 8) iterations per reply.
+- **Model / gating**: `cfg.ollama_chat_model` (the big model). Not optional. No size branching on the loop itself — size branching affects the digests/evaluator around it.
+- **Inputs**:
+  - Redacted user query
+  - Recent dialogue (last 5 minutes), including in-loop tool-call + tool-role messages from prior replies within the active conversation (tool carryover, `DialogueMemory.record_tool_turn` / `get_recent_turns_with_tools` in [src/jarvis/memory/conversation.py](src/jarvis/memory/conversation.py); per-prompt cap via `cfg.tool_carryover_max_turns` / `tool_carryover_per_entry_chars`; storage cap `_tool_turns_max_storage = 16`; cleared on `stop` signal AND on new-conversation entry; UNTRUSTED WEB EXTRACT fence markers preserved on truncation; both `content` and `tool_calls[*].function.arguments` scrubbed on write)
+  - Unified system prompt from [src/jarvis/system_prompt.py](src/jarvis/system_prompt.py) + ASR note + tool-protocol guidance
+  - **Warm profile block** (query-agnostic User + Directives excerpt from the knowledge graph, composed by `build_warm_profile()` / `format_warm_profile_block()` in [src/jarvis/memory/graph_ops.py](src/jarvis/memory/graph_ops.py) at Step 3.5 of `reply()`; no LLM call, pure SQLite read; injected unconditionally so personalisation is the default; result cached in `DialogueMemory._hot_cache` under `DialogueMemory.WARM_PROFILE_CACHE_KEY` for the lifetime of the active conversation. Invalidated on `stop`, on new-conversation entry, AND on User/Directives graph mutations via the listener registered in [src/jarvis/daemon.py](src/jarvis/daemon.py) against `register_graph_mutation_listener` in [src/jarvis/memory/graph.py](src/jarvis/memory/graph.py); World-branch writes are ignored)
+  - Digested memory enrichment (optional, see #4)
+  - Time + location context (re-injected each turn)
+  - Tool schema: native via `generate_tools_json_schema()` ([src/jarvis/tools/registry.py](src/jarvis/tools/registry.py)) or text fallback via `_text_tool_call_guidance()` ([engine.py:68](src/jarvis/reply/engine.py:68))
+  - Tool results from prior turns (raw or digested — see #5)
+- **Output**: OpenAI-style `{content, tool_calls, thinking}`. Consumed by the tool orchestrator and TTS pipeline. Natural-language content is delivered immediately; no post-turn evaluator runs.
+- **Limits**: `num_ctx: 8192` (explicit). Timeout `llm_chat_timeout_sec` (45s). Auto-fallback from native to text tool-calls on HTTP 400 (`ToolsNotSupportedError`), sticky for the session. Risk: `fetch_web_page` truncates at 50,000 chars (~37k tokens) — mitigated for SMALL models by tool-result digest (#5) which compresses the payload before it enters the messages history. LARGE models receive the raw payload and may silently see a truncated context.
+
+## 2. Intent Judge
+
+- **File**: [src/jarvis/listening/intent_judge.py](src/jarvis/listening/intent_judge.py) — `IntentJudge.evaluate()`.
+- **Trigger**: on a speech segment *only if* there is an engagement signal (wake word detected, hot-window active, or TTS playing). Pure ambient speech skips it.
+- **Model / gating**: `cfg.intent_judge_model` (default `gemma4:e2b`, ~2B). Falls back to text-based wake detection if Ollama is unavailable.
+- **Inputs**:
+  - Rolling transcript buffer (last 120s, with timestamps)
+  - Wake-word timestamp (if any), normalised aliases
+  - Last TTS text + finish time (echo rejection)
+  - State flags (wake_word_mode, hot_window_mode, during_tts)
+- **System prompt**: `SYSTEM_PROMPT_TEMPLATE` at [intent_judge.py:135](src/jarvis/listening/intent_judge.py:135). Teaches query extraction, echo detection, stop commands, pronoun/topic disambiguation, imperative re-addressing, declaratives to the wake word.
+- **Output**: strict JSON `IntentJudgment{directed, query, stop, confidence, reasoning}` ([intent_judge.py:94](src/jarvis/listening/intent_judge.py:94)). Consumed by the listening state machine which dispatches to the reply engine.
+- **Limits**: `intent_judge_timeout_sec` (15s). `num_ctx: 8192` (explicit — system prompt is ~2k tokens after PR #362, and the rolling transcript buffer at default `transcript_buffer_duration_sec=120` can reach ~1.5k tokens in chatty multi-speaker scenes; 4096 left ~10% headroom and risked silent ollama truncation of the system prompt's tail, where the few-shot examples and TRANSCRIPT NOISE block live).
+
+## 3. Memory Enrichment Extractor
+
+- **File**: [src/jarvis/reply/enrichment.py](src/jarvis/reply/enrichment.py) — `extract_search_params_for_memory()` (~line 71).
+- **Trigger**: once per reply, **only when the pre-flight planner (#12) emitted a `searchMemory` directive or returned an empty plan (fail-open)**. Pure reply-only plans skip this entirely — saves one LLM call per greeting / small-talk turn.
+- **Model / gating**: resolved via `resolve_tool_router_model(cfg)` — `tool_router_model → intent_judge_model → ollama_chat_model`. Small classification task; rides the same small/warm model as the router. Silent empty-dict on failure.
+- **Inputs**: user query (with the planner's `topic` hint appended when present), optional context hint (live-context compact summary), UTC now.
+- **System prompt**: inline at [enrichment.py:35-63](src/jarvis/reply/enrichment.py:35).
+- **Output**: `{keywords, from?, to?, questions?}`. Consumed by memory search in the reply engine.
+- **Limits**: up to 2 retries; timeout from `llm_tools_timeout_sec`.
+- **Caching**: result cached in `DialogueMemory._hot_cache` under key `enrichment:{redacted_query[+topic_hint]}` for the lifetime of the active conversation. Identical follow-ups within the same conversation reuse the dict and skip the LLM hop. Cleared by `clear_hot_cache()` on the `stop` signal and on new-conversation entry.
+
+## 3b. Recall Gate (pre-enrichment short-circuit)
+
+- **File**: [src/jarvis/memory/recall_gate.py](src/jarvis/memory/recall_gate.py) — `should_recall()`.
+- **Trigger**: once per reply, before diary/graph/digest enrichment runs (after the planner has decided memory is potentially needed).
+- **Model / gating**: NO LLM — deterministic keyword-coverage heuristic. Cheap.
+- **Inputs**: query, recent dialogue (incl. tool carryover rows).
+- **Output**: `False` only if hot-window contains a fresh tool result AND ≥50% of the query's content words appear in the hot-window transcript → skips diary, graph, and memory digest for this reply. Else `True`. Fail-open on any exception. Content-word extraction uses `\w{3,}` with `re.UNICODE`, so the gate works for Latin, Cyrillic, CJK, Arabic, Hebrew, etc. (per CLAUDE.md "no hardcoded language patterns"). Overlap words are run through `redact()` before being written to debug logs.
+- **Planner precedence**: when the planner explicitly emitted a `searchMemory` step, the gate is bypassed — the planner has more signal than coverage and overriding it would silently drop intent. The gate only short-circuits the fail-open empty-plan path.
+- **Rationale**: prevents re-running diary/graph lookups when the hot window already grounds the follow-up (e.g. "his most famous song" after a Bieber webSearch).
+
+## 4. Memory Digest (optional, SMALL models)
+
+- **File**: [src/jarvis/reply/enrichment.py](src/jarvis/reply/enrichment.py) — `digest_memory_for_query()` + `_distil_batch()`.
+- **Trigger**: once per reply when enrichment returns hits AND `memory_digest_enabled` (default OFF; `null` = auto-ON for SMALL ≤7B / OFF for LARGE). Skipped if raw < `_DIGEST_MIN_CHARS` (400). Batched if raw > `_DIGEST_BATCH_MAX_CHARS` (2000).
+- **Model / gating**: `ollama_chat_model`. Gated by `memory_digest_enabled`.
+- **Inputs**: user query, raw diary entries, raw graph nodes.
+- **System prompt**: `_DIGEST_SYSTEM_PROMPT` at [enrichment.py:122](src/jarvis/reply/enrichment.py:122). Teaches relevance filtering, preference-signal detection, attribution preservation, `NONE` sentinel, identity queries.
+- **Output**: ≤400 chars text per batch (`_DIGEST_MAX_CHARS`) injected as reference-only memory context into the main loop's system message. Empty on failure.
+- **Limits**: `llm_digest_timeout_sec` (8s, shared).
+
+## 5. Tool-Result Digest (optional, opt-in)
+
+- **File**: [src/jarvis/reply/enrichment.py](src/jarvis/reply/enrichment.py) — `digest_tool_result_for_query()` + `_distil_tool_batch()`.
+- **Trigger**: after each tool result in the loop, if `tool_result_digest_enabled` (default `null` = auto-ON for SMALL ≤7B, OFF for LARGE). Primary motivation on small models: prevents `fetch_web_page`'s 50k-char payloads from filling the 8192 num_ctx window. Skipped if raw < 400 chars (`_TOOL_DIGEST_MIN_CHARS`); batched if > 2500 (`_TOOL_DIGEST_BATCH_MAX_CHARS`).
+- **Model / gating**: `ollama_chat_model`. Gated by `tool_result_digest_enabled`.
+- **Inputs**: user query, tool name, raw tool result (e.g. webSearch payload inside UNTRUSTED WEB EXTRACT fence).
+- **System prompt**: `_TOOL_DIGEST_SYSTEM_PROMPT`. Teaches attributed fact extraction, `NONE` sentinel, no inference.
+- **Output**: ≤600 chars per batch (`_TOOL_DIGEST_MAX_CHARS`) replacing the raw payload in the messages stream. Falls back to raw on `NONE`.
+- **Limits**: `llm_digest_timeout_sec` (8s, shared).
+
+## 6. Max-Turn Loop Digest
+
+- **File**: [src/jarvis/reply/enrichment.py](src/jarvis/reply/enrichment.py) — `digest_loop_for_max_turns()` (~line 847).
+- **Trigger**: when the loop exhausts `agentic_max_turns` without producing a natural-language reply (e.g. pure tool-call loop). The evaluator no longer drives this — termination on content is immediate.
+- **Model / gating**: `_resolve_loop_digest_model(cfg)` — prefers `intent_judge_model`, falls back to `ollama_chat_model`.
+- **Inputs**: user query + loop activity (tool calls, results summaries, any prose).
+- **System prompt**: `_LOOP_DIGEST_SYSTEM_PROMPT` — caveat-prefixed, user-language, concise.
+- **Output**: caveat-prefixed final reply. Fails open to the last raw candidate or generic error.
+- **Limits**: `llm_digest_timeout_sec` (8s, shared).
+
+## 7. Tool Router (pre-loop tool selection)
+
+- **File**: [src/jarvis/tools/selection.py](src/jarvis/tools/selection.py) — `select_tools_with_llm()` (~line 331).
+- **Trigger**: once per reply, **at the very front of the flow before the planner (#12)**. Always runs — the router is the authoritative tool picker, and its narrowed catalogue is what the planner sees. When the planner later references tools, those names are unioned into the router's allow-list but never replace it; small models tend to default to `webSearch` where a dedicated tool like `getWeather` should win, and the router is tuned for that classification. `tool_selection_strategy == "llm"` is the default; other strategies (`all`, `keyword`, `embedding`) also run here.
+- **Model / gating**: `resolve_tool_router_model(cfg)` chain — `tool_router_model → intent_judge_model → ollama_chat_model`.
+- **Inputs**: user query, tool catalogue (builtin + MCP with descriptions), optional narrow-down hint.
+- **System prompt**: inline (~lines 260-315). Teaches pick up-to-5 tools or `none`.
+- **Output**: comma-separated tool names or `none`. Capped at `_LLM_MAX_SELECTED` (5). Always-included tools (`stop`, `toolSearchTool`) are unioned in regardless.
+- **Limits**: `llm_timeout_sec`. On failure → all tools.
+- **Caching**: `routed_tools` cached in `DialogueMemory._hot_cache` under key `router:{redacted_query}|{strategy}|{builtin-names}|{mcp-names}` for the lifetime of the active conversation. The catalogue signature lets a mid-conversation MCP refresh invalidate the cache; `context_hint` is intentionally excluded so time/location drift inside one conversation doesn't bust it. Cleared by `clear_hot_cache()` on the `stop` signal and on new-conversation entry.
+- **Carry-over guard (engine-side overlay)**: after the cache lookup/write, the engine inspects the previous assistant turn's tool calls. When a previous tool reported `success=False` on its `ToolExecutionResult` (read via the `tool_failed` flag stamped onto each recorded tool result), that tool name is unioned back into the local `routed_tools` for this turn only. Compensates for small routers that misroute follow-ups where the user is supplying missing info (e.g. "I'm in London" routing to `webSearch` after a stalled `getWeather` chain). Successful chains do not carry over — a genuine new short ask after a completed chain keeps the router pick clean. The augmentation never touches the cache; replays of the same query in future turns get the raw router output. See `src/jarvis/reply/reply.spec.md` §6 (Tool allow-list per turn) for the full contract.
+
+## 8. Tool Searcher (mid-loop escape hatch)
+
+- **File**: [src/jarvis/tools/builtin/tool_search.py](src/jarvis/tools/builtin/tool_search.py) — `toolSearchTool`.
+- **Trigger**: when the model explicitly invokes `toolSearchTool` during the loop. Capped at `tool_search_max_calls` (3) per reply.
+- **Model**: reuses the tool router (#7) — no separate LLM call here.
+- **Inputs**: self-contained query from the model.
+- **Output**: newline-separated tool names + one-liners, merged into the allow-list for the next turn.
+
+## 9. Conversation Summariser
+
+- **File**: [src/jarvis/memory/conversation.py](src/jarvis/memory/conversation.py) — `generate_conversation_summary()` (~lines 350/355).
+- **Trigger**: background, periodic — when unsaved dialogue reaches `dialogue_memory_timeout`. One per day per `source_app`.
+- **Model / gating**: `ollama_chat_model`. Respects `llm_thinking_enabled`. Uses streaming when a token callback is provided, else direct.
+- **Inputs**: recent conversation chunks + prior same-day summary (for incremental update).
+- **System prompt**: inline (~lines 310-320). Hygiene rules per [src/jarvis/memory/summariser.spec.md](src/jarvis/memory/summariser.spec.md): no deflection narration, attribution preservation, topic separation. The deflection rule (rule 6) is enumerated with concrete BAD/GOOD pairs in English plus parallel pairs in Turkish and Spanish so small models don't assume the rule is keyed to English phrasing. ≤200 words + 3-5 topic keywords.
+- **Output**: `(summary_text, topics_text)` → `conversation_summaries` table, embedded for vector search, feeds enrichment (#3) and graph extraction (#10). No post-process scrub — the prompt is single-source-of-truth, language-agnostic, and improves automatically as the chat model upgrades.
+- **Deflection rewrite (separate bulk op)**: `rewrite_all_diary_summaries()` (`POST /api/diary/scrub-deflections`) — for cleaning historical rows written before the prompt was tightened. One `ollama_chat_model` call per row with `_REWRITE_DEFLECTION_SYSTEM_PROMPT`, asking the model to drop sentences that narrate the assistant's own failures while keeping everything else verbatim. Diary text is fenced as untrusted data (same fence used by the web tool). Preserves `ts_utc`; re-embeds updated rows best-effort. Empty-rewrite guard keeps the original if the model would have emptied the row. Fail-open at every layer (LLM call, write-back, embed). User-triggered from the Maintenance section in the diary sidebar.
+- **Topic optimisation (separate bulk op)**: `optimise_diary_topics()` (`POST /api/diary/optimise-topics`) — collects all unique tags from `conversation_summaries`, makes one `ollama_chat_model` call with `_TOPIC_OPTIMISE_SYSTEM_PROMPT` to propose a normalised taxonomy (merge synonyms, split compound tags), then applies the mapping to every row that needs updating. Preserves `ts_utc`; re-embeds updated rows best-effort. User-triggered from the Maintenance section in the diary sidebar.
+- **Limits**: `timeout_sec` (30s default).
+
+## 10. Knowledge Graph Fact Extraction + Branch Classification
+
+- **File**: [src/jarvis/memory/graph_ops.py](src/jarvis/memory/graph_ops.py) — `extract_graph_memories()`.
+- **Trigger**: after each daily summary (#9). Background.
+- **Model**: `ollama_chat_model`.
+- **Inputs**: summary text + optional date.
+- **System prompt**: inline — asks for JSON array of `{"branch": "USER|DIRECTIVES|WORLD", "fact": "..."}` objects, with a heuristic ("user telling the assistant how to behave → DIRECTIVES; user telling the assistant about themselves → USER; external facts → WORLD"). Unknown branches default to USER. The DO-NOT-EXTRACT block hardens two recurring traps: assistant-generated recommendations (would-a-different-assistant-give-the-same-answer? heuristic separates these from external lookups, which DO count as facts) and transient snapshots like the current weather / time of day (described as "moments not facts" so the model stops conflating ephemera with persistent climate / location knowledge).
+- **Output**: list of `(branch_id, fact_text)` tuples → routed into the tagged branch via branch-pinned descent (no cross-branch contamination).
+- **Limits**: `timeout_sec`. Failures → empty list.
+
+## 11. Knowledge Graph Best-Child Picker
+
+- **File**: [src/jarvis/memory/graph_ops.py](src/jarvis/memory/graph_ops.py) — `_llm_pick_best_child()` (~line 167).
+- **Trigger**: during graph insertion, per fact, to place it under the best existing category. Background.
+- **Model**: uses `picker_model` when passed through from `update_graph_from_dialogue` (daemon resolves it via `resolve_tool_router_model(cfg)` → small model when available). Falls back to `ollama_chat_model` when no small model is configured.
+- **Inputs**: fact text + numbered list of candidate child nodes (name + description).
+- **System prompt**: inline (~lines 156-161) — answer with number or `NONE`.
+- **Output**: child node id or `None` (fact still inserted, just not under an optimal parent).
+
+## 11b. Knowledge Graph Node Merge (rewrite-on-write consolidation)
+
+- **File**: [src/jarvis/memory/graph_ops.py](src/jarvis/memory/graph_ops.py) — `merge_node_data()` (system prompt at `_MERGE_SYSTEM_PROMPT`).
+- **Trigger**: **once per (node, flush)** during `update_graph_from_dialogue`. The orchestrator first applies the exact-match dedupe fast-path, then groups the remaining facts by their resolved `node_id` so a 5-fact flush hitting the User node fires one rewrite, not five. Cold-start writes (empty target node) skip straight to plain append. Also invoked with `new_facts=[]` by the `consolidate_all_populated_nodes` maintenance op (powering the memory viewer's 🧹 button) to re-apply current rules to historical data.
+- **Model**: same `picker_model` chain as #11 (small router model when configured, falls back to `ollama_chat_model`). Temperature 0 — the task is rule-following classification.
+- **Inputs**: existing node `data` + the batch of new facts (zero or more) routed to that node in this flush.
+- **System prompt**: defines an ordered rule set — contradiction/reversal drops the old version, near-duplicate phrasings collapse to one, repeated daily activities consolidate into patterns, independent attributes coexist (visible contradictions are NOT silently dropped), common-knowledge facts are pruned. Demands a bare `{"facts": [...]}` JSON object. Parser tries direct `json.loads` first, then a scoped regex (no greedy `\{.*\}`) before giving up.
+- **Output**: `MergeResult(success: bool, incorporated_indices: list[int])`. The revised fact list is written back as the node's full `data`; `incorporated_indices` tells the orchestrator which inputs survived as new lines (under NFKC + casefold matching) so consolidated-out facts aren't reported as "newly stored". Subsumes per-flush supersession, near-duplicate dedupe, and ongoing consolidation in a single call. Because the latest prompt rewrites the whole node, updated conventions propagate to old data without a separate migration step.
+- **Limits**: 20s timeout. **Hallucination guard**: rewrites with more than `len(existing) + len(new) + 2` lines are rejected as runaway output. Fail-open on any error, parse failure, oversized rewrite, or empty rewrite → caller falls back to plain `append_to_node` for each new fact so they still land (a contradiction is recoverable; a silent wipe or hallucinated bloat is not).
+
+## 12. Task-list Planner (pre-flight decomposition, gates the whole turn)
+
+- **File**: [src/jarvis/reply/planner.py](src/jarvis/reply/planner.py) — `plan_query()`.
+- **Trigger**: once per reply, **after the tool router and before memory search**. Skipped when `cfg.planner_enabled = False`, when the query is shorter than `MIN_QUERY_CHARS` (4), or when no model / base URL is available.
+- **Model / gating**: resolution chain `planner_model (override) → ollama_chat_model`. The planner tracks the chat model so upgrading the chat model (via setup wizard or config) automatically upgrades plan quality.
+- **Inputs**: user query, dialogue context, **router-narrowed** tool catalogue (names + one-line descriptions) — not the full 30+ list. When the carry-over guard from #7 fires, the previous turn's failed tool name is unioned into this catalogue before the planner sees it, so the planner can plan a re-call without `toolSearchTool` round-tripping. **No** memory context — the planner decides *whether* memory is needed.
+- **System prompt**: `_PROMPT_TEMPLATE` in `planner.py`. Teaches the `searchMemory topic='...'` directive for prior-conversation lookups, short imperative tool steps, angle-bracket entity placeholders, final synthesis step, same-language output, no numbering.
+- **Output**: list of plan steps (max `MAX_STEPS` = 5). Gates memory enrichment (#3 / #4) and augments the tool router (#7 — planner's picks are unioned in, not replacing). Single-step `["Reply to the user."]` plans are the planner's positive "no memory, no tools" signal. An empty list is fail-open — the engine reverts to running #3 unconditionally. Consumed further by the engine to build the `ACTION PLAN:` system-message block and drive the direct-exec loop (#13) for small models.
+- **Limits**: `planner_timeout_sec` (6s). Fail-open → `[]`.
+
+## 13. Plan Step Resolver (per direct-exec turn, small models)
+
+- **File**: [src/jarvis/reply/planner.py](src/jarvis/reply/planner.py) — `resolve_next_tool_call()`.
+- **Trigger**: top of each agentic-loop iteration when `use_text_tools` is True AND the plan from #12 still has unexecuted tool steps. Runs instead of the chat model for that turn. **Fast path skips the LLM entirely** when the step is fully concrete (tool name + `key='value'` args, no `<placeholder>`); the LLM call only fires when entity substitution or key remapping is needed.
+- **Model**: same chain as #12.
+- **Inputs**: next planned step text, prior tool calls (name + args + result excerpt), per-turn tool schema.
+- **System prompt**: `_STEP_RESOLVER_SYSTEM` at [planner.py:300](src/jarvis/reply/planner.py:300). Teaches one-JSON-object output, placeholder substitution from prior results, `null` for synthesis steps.
+- **Output**: `(tool_name, arguments)` tuple or `None`. Unknown tool names are rejected via the allow-list guard.
+- **Limits**: `planner_timeout_sec`. Fail-open → `None` (engine falls back to the chat-model turn).
+
+## 14. Tool-specific LLM calls
+
+- **Weather** ([src/jarvis/tools/builtin/weather.py](src/jarvis/tools/builtin/weather.py), ~line 60) — `ollama_chat_model`, parses location/time/unit from the query.
+- **Nutrition log_meal** ([src/jarvis/tools/builtin/nutrition/log_meal.py](src/jarvis/tools/builtin/nutrition/log_meal.py), lines 48 & 136) — `ollama_chat_model`, extracts nutrients, confirms logging.
+
+---
+
+## Frequency / Size Summary
+
+| # | Context | Per reply | Optional? | Model tier |
+|---|---------|-----------|-----------|------------|
+| 1 | Main chat loop | 1-8 | No | LARGE |
+| 2 | Intent judge | 1 (voice only) | fallback available | SMALL |
+| 3 | Memory enrichment extract | 0-1 | gated by planner | SMALL (via router chain) |
+| 4 | Memory digest | 0-N | auto by size | SMALL (uses chat model) |
+| 5 | Tool-result digest | 0-N | auto by size | SMALL (uses chat model) |
+| 6 | Max-turn digest | 0-1 | No | SMALL |
+| 7 | Tool router | 1 | always runs; planner picks unioned in | SMALL |
+| 8 | Tool searcher | 0-3 | model-initiated | SMALL (reuses #7) |
+| 9 | Summariser | ~1/session | No (background) | LARGE |
+| 10 | Graph extraction | ~1/session | No (background) | LARGE |
+| 11 | Graph best-child | 0-N | No (background) | SMALL (via router chain) |
+| 11b | Graph node merge | 0-N (per node, batched) | No (background) | SMALL (via router chain) |
+| 12 | Planner (plan_query) | 1 | yes (planner_enabled) | LARGE/SMALL (tracks chat model) |
+| 13 | Plan step resolver | 0-N (SMALL only) | auto by size + plan | SMALL (via router chain) |
+| 14 | Tool-specific | per-tool | n/a | LARGE |
+
+## Size-aware auto switches
+
+Driven by `detect_model_size(model_name) → SMALL (≤7B) | LARGE (8B+)`:
+
+| Feature | SMALL | LARGE |
+|---------|-------|-------|
+| Memory digest | ON | OFF |
+| Tool-result digest | ON | OFF |
+| Text-based tool calling | ON | OFF (native) |
+| Planner direct-exec | ON | OFF |
+
+## Config keys
+
+- Models: `ollama_chat_model`, `intent_judge_model`, `tool_router_model`
+- Flags: `memory_digest_enabled`, `tool_result_digest_enabled`, `llm_thinking_enabled`, `intent_judge_thinking_enabled`, `tool_selection_strategy`
+- Timeouts: `llm_chat_timeout_sec` (45s), `llm_digest_timeout_sec` (8s, shared across #4/#5/#6), `llm_tools_timeout_sec`, `intent_judge_timeout_sec` (15s)
+- Caps: `agentic_max_turns` (8), `tool_search_max_calls` (3), `_LLM_MAX_SELECTED` (5), `_DIGEST_MAX_CHARS` (400), `_TOOL_DIGEST_MAX_CHARS` (600)
+
+## Flow
+
+```
+user input
+  └─▶ [2] Intent Judge            (voice only, SMALL)
+        └─▶ [7] Tool router (narrows catalogue for the planner)
+              └─▶ [12] Planner (gates memory; advisory for the router allow-list)
+                    ├─ plan requests searchMemory  → [3] Enrichment extract → [4] Memory digest (optional)
+                    ├─ plan empty (fail-open)      → [3] Enrichment extract → [4] Memory digest
+                    └─ plan reply-only             → skip #3 and #4 entirely
+                    └─▶ AGENTIC LOOP  (≤ agentic_max_turns)
+                                      ├─ [13] Plan step resolver (SMALL, direct-exec)
+                                      ├─ [1] Main chat turn
+                                      ├─ tool execution
+                                      │    └─ [5] Tool-result digest (optional)
+                                      │    └─ [8] Tool searcher (model-initiated)
+                                      └─ content → deliver immediately
+                                      └─ if max turns → [6] Max-turn digest
+                          └─▶ TTS / output
+                          └─▶ background: [9] summariser → [10] graph extract → [11] best-child
+```
+
+## Optimisation ideas (seed list)
+
+1. Batch multi-chunk memory digests (#4) into a single call with explicit markers.
+2. Parallelise multiple tool-result digests (#5) when several results land at once.
+3. Pre-warm the intent-judge model before TTS finishes.
+4. Cache tool-router (#7) output by query hash.
+5. Give each digest its own timeout budget rather than sharing `llm_digest_timeout_sec` (today a slow memory digest can starve the max-turn digest).
+6. Consider single-model deployments: router+planner prefer `intent_judge_model`; loading a second model hurts cold-start latency on small hardware.
+7. Narrow `llm_thinking_enabled` to router/planner only, not every context.
+8. Reduce `intent_judge_timeout_sec` (15s) or race it against text-based wake detection to avoid blocking the audio loop.
+
+---
+
+## Measuring
+
+`tests/performance/test_pipeline_timings.py` times each context in this graph against a live Ollama. Run:
+
+```
+pytest tests/performance/ -v -m performance -s
+```
+
+It records per-context p50/p95 latencies using a monkey-patch recorder that infers the context from the caller's `__qualname__` (see `_CALLER_TO_CONTEXT` in `tests/performance/timing_recorder.py`). Dumps a JSON report to `tests/performance/reports/`. A micro-benchmark with a tiny fixed prompt runs alongside to give a per-call floor — if that floor moves, every context's total moves with it, so hardware/model drift is visible immediately.
+
+Baseline on a local gemma4:e2b (as of 2026-04-22, 3 queries × 3 runs): main chat turn p50 ~4.5s, enrichment extract p50 ~0.9s (small-model chain), micro-prompt floor ~0.15s. Sample sizes: main 25 calls, enrichment 9. Use these as rough reference points — the assertions in the test are relative-shape (router ≤ 1.5× main chat turn), not absolute.
+
+When you add or change a context, update `_CALLER_TO_CONTEXT` so it shows up in the report instead of landing in the `other:` bucket.
+
+## Keep this doc in sync
+
+This graph is the reference for LLM-latency optimisation. Treat it as authoritative: whenever code changes affect an LLM call — a new context, a removed one, a changed model/timeout/cap/gating/prompt source, or a new data-flow edge — update this file in the same PR. If the update would be more than a one-line tweak, reflect it in the relevant `*.spec.md` too.
--- a/docs/vnc-xfce-setup.md
+++ b/docs/vnc-xfce-setup.md
@@ -0,0 +1,98 @@
+# VM 106 (claude) — VNC + XFCE 원격 데스크톱 셋업 기록
+
+> Ubuntu 26.04 LTS / Proxmox VM 106 / RTX 5050 GPU 패스스루(연산 전용) 환경에서
+> 헤드리스(모니터 없음) 원격 데스크톱을 구성한 전체 과정과 함정 정리.
+> 용도: 크롬으로 웹 제어 + 디스코드 화면공유 (Javis 연동)
+
+---
+
+## 1. 최종 구성 요약
+
+| 항목 | 값 |
+|---|---|
+| VM | 106 (claude), IP `192.168.10.9` |
+| OS | Ubuntu 26.04 LTS (resolute) |
+| GPU | RTX 5050 패스스루, 연산 전용 (no x-vga), CUDA 13.2, driver 595.71.05 |
+| VNC 서버 | TigerVNC 1.15.0, 포트 `5901` |
+| 데스크톱 | XFCE |
+| 자동 시작 | `~/start-vnc.sh` + systemd user service + linger |
+| 접속 | VNC 뷰어로 `192.168.10.9:5901` (RDP 아님 / mstsc 안 됨) |
+
+---
+
+## 2. 접속 정보
+
+- **프로토콜**: VNC (RDP 아님 — 윈도우 mstsc로는 접속 불가)
+- **주소**: `192.168.10.9:5901`
+- **VNC 뷰어**: TigerVNC Viewer / RealVNC Viewer / MobaXterm 내장 VNC
+- **비밀번호**: `vncpasswd`로 설정한 8자 (VNC는 비번 8자 제한)
+
+---
+
+## 3. 핵심 함정 (이게 제일 중요)
+
+### 3-1. RDP(gnome-remote-desktop)는 포기 → VNC로 전환
+- 시스템 모드 `grdctl --system`에서 자격증명 키링 저장 실패 (TPM 없음 → GKeyFile 폴백 깨짐)
+- `Credentials are not set, denying client` 로 접속 거부 → TigerVNC로 전환
+
+### 3-2. GPU 패스스루 환경 → render/video 그룹 필수
+- `claude` 사용자가 `render`, `video` 그룹에 없으면 Xvnc가 `/dev/dri` 접근 실패로 X 서버 즉시 크래시
+- 증상: `libEGL warning: failed to open /dev/dri/card0: Permission denied`, `X connection to :1 broken`
+- 해결: `sudo usermod -aG render,video claude` (그룹 추가 후 재로그인/재부팅 필요)
+
+### 3-3. startxfce4 대신 xfce4-session 직접 호출
+- `startxfce4`는 X 서버가 이미 떠 있으면 그냥 종료됨 → xstartup에서 `xfce4-session` 직접 호출
+
+### 3-4. 메뉴/패널이 비면 → RENDER 확장 켜기 + XDG 환경변수
+- `-extension RENDER`를 넣으면 XFCE 메뉴/패널이 공백으로 나옴 → 이 환경에선 RENDER 켜는 게 정답
+- systemd 서비스 환경엔 `XDG_DATA_DIRS`, `XDG_CONFIG_DIRS`를 명시
+
+### 3-5. 설정 손상 시 초기화
+- `mv ~/.config/xfce4 ~/.config/xfce4.broken && mv ~/.cache/xfce4 ~/.cache/xfce4.broken` 후 재시작
+
+### 3-6. systemctl --user는 XDG_RUNTIME_DIR 필요
+- `export XDG_RUNTIME_DIR=/run/user/$(id -u)`
+
+---
+
+## 4. 설치 패키지
+
+```bash
+sudo apt install -y tigervnc-standalone-server tigervnc-common
+sudo apt install -y xfce4 xfce4-goodies dbus-x11
+sudo apt install -y fonts-noto-cjk fonts-noto-cjk-extra fonts-nanum
+cd /tmp && wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
+sudo apt install -y ./google-chrome-stable_current_amd64.deb
+```
+
+---
+
+## 5. 자동 시작 (`~/start-vnc.sh`)
+
+```bash
+#!/bin/bash
+export DISPLAY=:1
+export XDG_RUNTIME_DIR=/run/user/$(id -u)
+export HOME=/home/claude
+export XDG_DATA_DIRS=/usr/local/share:/usr/share:/var/lib/snapd/desktop
+export XDG_CONFIG_DIRS=/etc/xdg
+pkill -9 -u $(id -u) Xvnc 2>/dev/null
+sleep 2
+# 주의: -extension RENDER 넣지 말 것 (메뉴/패널이 안 그려짐)
+/usr/bin/Xvnc :1 -geometry 1920x1080 -depth 24 -rfbport 5901 \
+  -rfbauth $HOME/.config/tigervnc/passwd -SecurityTypes VncAuth -localhost no &
+sleep 5
+exec dbus-launch --exit-with-session xfce4-session
+```
+
+systemd user service + linger로 부팅 시 자동 시작.
+
+---
+
+## 6. Javis 연동 시 핵심 포인트
+
+- 봇/브릿지는 디스플레이 **:1** 에서 동작하는 X 화면을 사용합니다 (`VNC_DISPLAY=:1`).
+- 크롬 제어: `DISPLAY=:1 google-chrome --password-store=basic --no-first-run`
+- 화면 송출(셀프봇/스크린샷)은 ffmpeg `x11grab`으로 `:1`을 캡처합니다.
+- noVNC를 쓰려면: `websockify --web=/usr/share/novnc 6080 localhost:5901` 후
+  `.env`의 `NOVNC_URL=http://192.168.10.9:6080/vnc.html`.