Add Discord-native hybrid front-end for Jarvis (bot + bridge)

Transform isair/jarvis into a Discord-controlled voice assistant running on the Ubuntu VNC desktop, keeping the mature ~39k-line Python brain intact. - bot/ (Node + bun, discord.js): /자비스 slash commands (ephemeral), voice channel join + voice receive/playback, pluggable VNC screen broadcast (selfbot live / noVNC / screenshot) - bridge/ (Python, Flask): wraps jarvis STT + run_reply_engine + Piper TTS behind a thin localhost HTTP API - .env.example, scripts/ (start_bridge/start_bot/dev), README rewrite, docs/language-comparison.md and docs/vnc-xfce-setup.md Language decision: hybrid (Python brain + Node/bun Discord layer) because Discord blocks bot video; native screen broadcast only works via a Node selfbot library.
2026-06-09 14:51:05 +09:00
parent a5bf8d1826
commit c4abf63f38
308 changed files with 94135 additions and 1 deletions
--- a/.claude/launch.json
+++ b/.claude/launch.json
@@ -0,0 +1,29 @@
+{
+  "version": "0.0.1",
+  "configurations": [
+    {
+      "name": "Desktop App",
+      "runtimeExecutable": "python",
+      "runtimeArgs": ["scripts/launch.py", "run_desktop_app"],
+      "port": 5050
+    },
+    {
+      "name": "Desktop App (Voice Debug)",
+      "runtimeExecutable": "python",
+      "runtimeArgs": ["scripts/launch.py", "run_desktop_app", "--voice-debug"],
+      "port": 5050
+    },
+    {
+      "name": "Evals",
+      "runtimeExecutable": "python",
+      "runtimeArgs": ["scripts/launch.py", "run_evals"],
+      "port": null
+    },
+    {
+      "name": "Build Installer",
+      "runtimeExecutable": "python",
+      "runtimeArgs": ["scripts/launch.py", "build_installer"],
+      "port": null
+    }
+  ]
+}
--- a/.claude/skills/review-pr/SKILL.md
+++ b/.claude/skills/review-pr/SKILL.md
@@ -0,0 +1,178 @@
+---
+name: review-pr
+description: >
+  Multi-agent adversarial PR review. Spawns parallel specialist agents
+  (correctness, security, performance, maintainability, completeness) then
+  a verifier agent that challenges every finding. Only verified issues survive.
+  Accepts an optional PR number or URL; defaults to the current branch's open PR.
+argument-hint: "[PR number or URL]"
+---
+
+# Multi-Agent Adversarial PR Review
+
+You are an orchestrator for a thorough, multi-perspective pull request review.
+Your job is to gather PR context, spawn specialist review agents in parallel,
+then run a verification pass to filter out false positives.
+
+## Step 1 — Gather PR Context
+
+Determine the PR to review:
+- If `$ARGUMENTS` is provided, use it (a PR number, URL, or branch name).
+- Otherwise, detect the current branch and find its open PR.
+
+Use the GitHub MCP tools (or `gh` CLI if MCP is unavailable) to fetch:
+1. **PR metadata**: title, body, author, base branch, labels
+2. **Full diff**: the complete code diff
+3. **Changed file list**: just the filenames for targeted exploration
+4. **PR comments/reviews**: any existing review feedback
+5. **CI status**: check if CI is passing or failing
+
+Also read the project's `CLAUDE.md` for coding conventions the review should enforce.
+
+Store all this context — you will include it in each specialist agent's prompt.
+
+## Step 2 — Spawn Specialist Agents (Parallel)
+
+Launch **all five** specialist agents simultaneously using the Agent tool.
+Each agent receives the full diff, changed file list, PR description, and
+project conventions. Each must output a structured list of findings.
+
+### Agent 1: Correctness Reviewer
+Focus: Logic bugs, edge cases, regressions.
+- Off-by-one errors, null/undefined handling, race conditions
+- Broken invariants, incorrect control flow
+- State management issues (missing assignments, leaked state)
+- Regressions: does this change break existing behaviour?
+- Read surrounding code (not just the diff) to understand context
+
+### Agent 2: Security Reviewer
+Focus: Vulnerabilities and unsafe patterns.
+- Injection (SQL, command, XSS, path traversal)
+- Authentication/authorisation bypass
+- Secrets or credentials in code
+- Unsafe deserialisation, SSRF, open redirects
+- Cryptographic misuse, insecure randomness
+- Dependency vulnerabilities (if new deps added)
+
+### Agent 3: Performance Reviewer
+Focus: Efficiency and scalability.
+- N+1 queries, unnecessary allocations, missing caching
+- O(n²) or worse algorithms where linear is possible
+- Blocking calls in async/event-loop contexts
+- Memory leaks, unbounded growth (queues, buffers, caches)
+- Unnecessary I/O, redundant network calls
+
+### Agent 4: Maintainability Reviewer
+Focus: Design quality and readability.
+- SOLID principle violations, excessive coupling
+- Code duplication (DRY violations)
+- Naming clarity (variables, functions, classes)
+- Missing or misleading comments/docstrings
+- Overly complex logic that could be simplified
+- Inconsistency with project conventions (from CLAUDE.md)
+
+### Agent 5: Completeness Reviewer
+Focus: What's missing.
+- Missing test coverage for new/changed code paths
+- Missing error handling for failure modes
+- Undocumented behaviour changes (README, specs, CHANGELOG)
+- Spec drift: do changes contradict any spec files?
+- Missing migration steps or configuration updates
+- Edge cases not addressed in the implementation
+
+### Agent Prompt Template
+
+Each agent's prompt MUST include:
+1. The full diff
+2. The changed file list
+3. The PR description
+4. Relevant project conventions from CLAUDE.md
+5. Instruction to READ the surrounding code in changed files (not just the diff lines) for full context
+6. Instruction to output findings as a structured list:
+
+```
+For each finding, output:
+- **File**: path/to/file.py:LINE
+- **Severity**: critical / high / medium / low
+- **Category**: bug / security / performance / design / missing
+- **Confidence**: high / medium / low
+- **Description**: What the issue is and why it matters
+- **Suggestion**: Concrete fix or alternative approach
+```
+
+7. Instruction: if no issues found in your area, explicitly state "No issues found" — do not invent findings to appear thorough.
+8. Instruction: only report issues with confidence >= medium. Do not report style nits unless they violate project conventions.
+
+## Step 3 — Verification Phase (Adversarial)
+
+After ALL specialist agents complete, spawn a single **Verifier Agent** that
+receives every finding from all specialists. The verifier's job is to
+**challenge and disprove** each finding:
+
+### Verifier Agent Instructions
+
+You are a devil's advocate. For EACH finding from the specialist reviewers:
+
+1. **Read the actual code** (not just the diff) — the "bug" may be handled
+   elsewhere in the codebase.
+2. **Check if the concern is mitigated** by framework defaults, type system
+   guarantees, or existing validation.
+3. **Verify the severity** — is this really critical, or is it a cosmetic issue
+   dressed up as a bug?
+4. **Check for duplicates** — multiple specialists may report the same issue
+   in different words.
+5. **Assess confidence** — is the specialist making assumptions about runtime
+   behaviour without evidence?
+
+For each finding, output one of:
+- **VERIFIED** — the issue is real and correctly categorised
+- **DOWNGRADED** — the issue exists but severity/confidence should be lower (explain why)
+- **DISMISSED** — the issue is a false positive (explain why)
+- **DUPLICATE** — already covered by another finding (reference which one)
+
+## Step 4 — Synthesise Final Report
+
+Collect all VERIFIED and DOWNGRADED findings. Produce a final review report:
+
+### Report Format
+
+```markdown
+## PR Review: <PR title>
+
+### Summary
+<2-3 sentence overview of the PR and overall assessment>
+
+### Critical / High Issues
+<Only VERIFIED findings with severity critical or high>
+
+### Medium Issues
+<VERIFIED findings with severity medium>
+
+### Suggestions
+<DOWNGRADED findings and low-severity items, briefly>
+
+### What Looks Good
+<Positive observations — good patterns, thorough tests, clean design>
+
+### Verdict
+<One of: APPROVE / REQUEST_CHANGES / COMMENT>
+<Brief justification>
+```
+
+### Rules for the Final Report
+- Lead with the most important issues
+- Be specific: include file paths, line numbers, and code snippets
+- Be constructive: every criticism must include a concrete suggestion
+- Acknowledge what's done well — reviews should be balanced
+- If no critical/high issues exist, lean towards APPROVE
+- Use the project's conventions (British English, emojis for emphasis)
+
+## Important Guidelines
+
+- **Do NOT make changes to code** — this is a read-only review
+- **Do NOT post the review to GitHub** unless explicitly asked
+- **Be thorough but not noisy** — quality over quantity
+- **Respect the author's intent** — understand why before criticising what
+- Each specialist agent should use `subagent_type: "Explore"` for efficient codebase reading
+- The verifier agent should use `subagent_type: "general-purpose"` for deeper reasoning
+- When spawning agents, always include the full diff and context in the prompt — agents have no memory of this conversation
--- a/.claude/skills/triage/SKILL.md
+++ b/.claude/skills/triage/SKILL.md
@@ -0,0 +1,176 @@
+---
+name: triage
+description: >
+  Triage open GitHub issues and discussions on the Jarvis repo. Sweep for
+  untriaged reports, reply to awaiting-user threads when new info lands,
+  apply the right labels, close duplicates, and edit past owner comments
+  rather than stacking follow-ups. Use after a release or any time the user
+  says "triage issues", "triage discussions", or similar.
+---
+
+# Triage Skill
+
+You are triaging open issues and discussions on `isair/jarvis`. Work from data,
+not memory. Stay friendly, specific, and short.
+
+## Step 1. Pull the state
+
+Run these as parallel Bash tool calls (one message, two tool uses), not as chained shell commands:
+
+```bash
+gh issue list --state open --limit 50 --json number,title,author,createdAt,updatedAt,labels,comments \
+  --jq '[.[] | {number, title, author: .author.login, labels: [.labels[].name], commentCount: (.comments|length), updatedAt}]'
+```
+
+```bash
+gh api graphql -f query='{repository(owner:"isair",name:"jarvis"){discussions(first:30,states:OPEN,orderBy:{field:UPDATED_AT,direction:DESC}){nodes{id number title author{login} category{name} updatedAt comments(last:5){totalCount nodes{id author{login} createdAt body replies(last:10){nodes{id author{login} createdAt body}}}}}}}}' \
+  --jq '.data.repository.discussions.nodes'
+```
+
+**Important**: GitHub Discussions are threaded. The top-level `comments` list does
+not include sub-replies, so a fresh reporter question that lives under an owner
+comment will look like an unanswered top-level thread if you forget to fetch
+`replies`. The query above pulls both. When deciding "untriaged" vs "awaiting
+reporter", scan the **last reply across the whole tree**, not just the last
+top-level comment. A common shape: owner answers at the top level, reporter
+replies underneath, owner replies underneath that. The newest message is two
+levels deep, and you'll miss it if you only look at the top-level list.
+
+Classify each thread into one of:
+
+- **Untriaged**: no owner (`isair`) reply yet. Act now.
+- **Awaiting reporter**: labelled `question` or the last comment is from the owner asking for details. Leave it unless the reporter has replied with new info. Per repo policy, do not close for silence before 2 weeks of reporter inactivity.
+- **Owner tracking**: filed by `isair` as an internal task. Skip unless a non-owner has commented with a question or new information, in which case treat it like a normal untriaged thread.
+- **Resolved-pending-release**: fix is on `develop`. Never close manually. Release (`git merge --ff-only develop` → `main`) auto-closes via `Closes #NNN`. Detect this by scanning recent `develop` commits (`gh pr list --base develop --state merged --limit 20`) for references to the issue number before you reply, so you can tell the reporter "this is fixed in the next release" rather than asking for more info.
+
+## Step 2. Fetch details for the untriaged
+
+For issues:
+
+```bash
+gh issue view <N> --json title,body,author,labels,comments \
+  --jq '{title, author: .author.login, labels: [.labels[].name], body, comments: [.comments[] | {author: .author.login, createdAt, body}]}'
+```
+
+Read the **logs** and traceback carefully before replying. The vast majority of
+reports contain the answer in the log; the reporter just didn't know what to
+look for.
+
+## Step 3. Diagnose from the log
+
+Common Jarvis patterns and what they mean:
+
+| Symptom in log | Likely cause | Ask for |
+|----------------|--------------|---------|
+| Repeated `📝 Heard: "Thank you."`, `"you..."`, `"Thanks for watching!"` with no real commands | Whisper hallucinations on near-silent audio. Wrong default mic or broken mic/driver. | Ask them to check the input level bar (Windows Sound settings, or macOS System Settings → Sound → Input) actually moves when they speak, and confirm which mic they intend to use. |
+| `🧠 Intent judge: unavailable (timeout or error)` | Known; improved in v1.25.1 (bump this version as newer fixes ship). | Version they're on, and retry on latest. |
+| `huggingface_hub.snapshot_download` crash (thread pool / ssl.create_default_context) | Download-time crash, platform-specific. Not the same as 429 throttling. | Keep open as its own bug. Workaround: manual `ollama pull ...` and relaunch. |
+| `LLM connection error: ... RemoteDisconnected` | Ollama dropped. Upstream, not Jarvis. | `ollama run <model>` health check; Ollama version. |
+| `setup_wizard.py ... _install_next_model` fatal | Real bug on our side. | Which model had just finished, which was about to start; `ollama list` after crash; `~/Library/Logs/DiagnosticReports/Jarvis-*.ips` on macOS. |
+| `Low confidence` lines only, no `Heard:` ever | Mic is captured but utterances are under the confidence floor. Usually mic placement or wrong device. | Same as first row. |
+| `📍 Location features are not available` | Not an error. Location is optional and only affects weather / local-time context. | Reassure, don't diagnose. Point at the MaxMind GeoLite2 signup if they actually want it. |
+
+**Do not ask obviously-answered questions.** If the log shows the wizard was
+pulling models, Ollama is by definition installed and running. If the log shows
+Whisper loaded, Whisper is installed. Read before asking.
+
+Other recurring user-environment answers:
+
+- **Windows "Error 4551: Application Control policy has blocked this file"**: WDAC / AppLocker / corporate MDM, not Jarvis. Point at IT allow-listing, `secpol.msc`, or install-from-source.
+- **"missing AI models"**: `ollama pull gemma4:e2b` + `ollama pull nomic-embed-text`, or tray → 🔧 Setup Wizard.
+- **Setup wizard was closed early, nothing works**: tray → 🔧 Setup Wizard reopens it. Fallback: `rm -rf ~/.config/jarvis ~/.local/share/jarvis/config`.
+- **`gemma4:e2b` quality complaints**: it is a very small model. Suggest 7B+ if hardware allows, note that capability scales with model size.
+- **"Can Jarvis speak <language>?"**: yes if the chat model supports it; for voice, Whisper handles most languages. Point at README.
+
+## Step 4. Label, retitle, reply
+
+Available labels: `bug`, `question`, `duplicate`, `enhancement`, `documentation`, `good first issue`, `help wanted`, `invalid`, `wontfix`, `voice`, `spike`.
+
+Conventions:
+
+- Empty-body or needs-info bug reports: label `bug,question`, retitle to `"<one-line symptom> (awaiting details)"` or similar so the backlog is scannable.
+- Duplicates: label `duplicate`, leave one short comment pointing at the canonical issue, close with `--reason "not planned"`.
+- Real confirmed crashes: label `bug` (and `voice` if audio-related), retitle to pin the failure site from the traceback (e.g. `"Crash on first-run setup wizard during model install (macOS, v1.26.0)"`).
+
+Reply tone:
+
+- Open with `Hi @user, thanks for filing this! 👋`
+- State the diagnosis (what the log shows) before the asks.
+- Use bullet lists with **bold labels** for asks. Keep to 3 to 5 asks max.
+- Friendly emojis: 👋 🙏 🚀 🧠 🎤 🔊 📝.
+- **No em dashes (—) anywhere in user-facing writing.** Use commas, full stops, colons, or parentheses.
+- **British English** (colour, behaviour, initialise).
+- Do not promise fixes or ETAs.
+
+## Step 5. Post the reply
+
+Issue comment:
+
+```bash
+gh issue comment <N> --body "..."
+gh issue edit <N> --add-label "bug,question" --title "..."
+gh issue close <N> --reason "not planned"   # duplicates / wontfix only
+```
+
+Discussion comment (GraphQL, and **use `-f body=` not `-F body=`** if the body
+starts with `@`, because `gh` treats `-F` values starting with `@` as file
+paths):
+
+```bash
+gh api graphql -f query='mutation($id:ID!,$body:String!){addDiscussionComment(input:{discussionId:$id,body:$body}){comment{url}}}' \
+  -F id=<discussion node id> -f body="@user, ..."
+```
+
+Get the discussion `id` field from the Step 1 GraphQL output. It's the outer `id` on the discussion node, not the inner `id` inside `comments.nodes` (that one is the comment's node id, used in Step 6 for edits).
+
+**Verify the node id before posting.** Discussion node ids look like `D_kwDOPgt_k84Albb5` and a single-character typo will silently route the comment to a completely unrelated repo's discussion (the prefix encodes the repo, but neighbouring ids belong to other repos). Two safeguards:
+
+1. Copy the id straight from the Step 1 output, never retype it.
+2. The mutation response returns the comment URL: `addDiscussionComment.comment.url`. Inspect it. If the host path is anything other than `github.com/isair/jarvis/discussions/<N>`, you posted to the wrong repo. Delete the comment immediately:
+   ```bash
+   gh api graphql -f query='mutation($id:ID!){deleteDiscussionComment(input:{id:$id}){comment{id}}}' -F id=<comment node id>
+   ```
+   Then repost with the correct discussion id.
+
+To reply to a specific comment (threaded sub-reply) rather than at the top level, pass `replyToId` in the mutation input. Otherwise the reply goes to the root.
+
+If a `body` you want to post starts with `@`, use `-f body="..."`, not `-F body="..."`. `gh` interprets `-F` values starting with `@` as file paths.
+
+## Step 6. Clean up your own past comments
+
+If a previous owner comment was premature, wrong, or asked an
+obviously-answered question, **edit it in place**. A clean thread beats a trail
+of self-corrections.
+
+Issue comment edit:
+
+```bash
+gh api -X PATCH repos/isair/jarvis/issues/comments/<commentId> -f body="..."
+```
+
+Discussion comment edit. First grab the comment node id (the `last:5` window usually covers recent owner replies):
+
+```bash
+gh api graphql -f query='{repository(owner:"isair",name:"jarvis"){discussion(number:N){comments(last:5){nodes{id author{login} createdAt body}}}}}'
+```
+
+Then update it:
+
+```bash
+gh api graphql -f query='mutation($id:ID!,$body:String!){updateDiscussionComment(input:{commentId:$id,body:$body}){comment{url}}}' \
+  -F id=<comment node id> -f body="..."
+```
+
+## Step 7. Summarise to the user
+
+At the end, list what you touched per thread: labels changed, titles changed,
+comments posted, closures. Use markdown links like `[#241](https://github.com/isair/jarvis/issues/241)`. Keep it short.
+
+## Hard rules
+
+- Never close an issue because its fix landed on `develop`. Let the release auto-close.
+- Never close for reporter silence under 2 weeks after a clarifying question.
+- Never ask a question the log already answers.
+- Never use em dashes in user-facing text.
+- Never invent facts about a reporter's environment. Ask, or infer only from the log.
+- When in doubt, label `question` and ask rather than guess.