Add Discord-native hybrid front-end for Jarvis (bot + bridge)
Some checks failed
Release / semantic-release (push) Successful in 59s
tests / Unit tests (Linux, Python 3.11) (push) Successful in 13m45s
Release / build-linux (push) Failing after 7m47s
Release / build-windows (push) Has been cancelled
Release / build-macos (arm64, macos-latest) (push) Has been cancelled
Release / build-macos (x64, macos-15-intel) (push) Has been cancelled
Release / release-main (push) Has been cancelled
Release / release-develop (push) Has been cancelled

Transform isair/jarvis into a Discord-controlled voice assistant running on
the Ubuntu VNC desktop, keeping the mature ~39k-line Python brain intact.

- bot/ (Node + bun, discord.js): /자비스 slash commands (ephemeral),
  voice channel join + voice receive/playback, pluggable VNC screen broadcast
  (selfbot live / noVNC / screenshot)
- bridge/ (Python, Flask): wraps jarvis STT + run_reply_engine + Piper TTS
  behind a thin localhost HTTP API
- .env.example, scripts/ (start_bridge/start_bot/dev), README rewrite,
  docs/language-comparison.md and docs/vnc-xfce-setup.md

Language decision: hybrid (Python brain + Node/bun Discord layer) because
Discord blocks bot video; native screen broadcast only works via a Node
selfbot library.
This commit is contained in:
javis-bot
2026-06-09 14:51:05 +09:00
parent a5bf8d1826
commit c4abf63f38
308 changed files with 94135 additions and 1 deletions

29
.claude/launch.json Normal file
View File

@@ -0,0 +1,29 @@
{
"version": "0.0.1",
"configurations": [
{
"name": "Desktop App",
"runtimeExecutable": "python",
"runtimeArgs": ["scripts/launch.py", "run_desktop_app"],
"port": 5050
},
{
"name": "Desktop App (Voice Debug)",
"runtimeExecutable": "python",
"runtimeArgs": ["scripts/launch.py", "run_desktop_app", "--voice-debug"],
"port": 5050
},
{
"name": "Evals",
"runtimeExecutable": "python",
"runtimeArgs": ["scripts/launch.py", "run_evals"],
"port": null
},
{
"name": "Build Installer",
"runtimeExecutable": "python",
"runtimeArgs": ["scripts/launch.py", "build_installer"],
"port": null
}
]
}

View File

@@ -0,0 +1,178 @@
---
name: review-pr
description: >
Multi-agent adversarial PR review. Spawns parallel specialist agents
(correctness, security, performance, maintainability, completeness) then
a verifier agent that challenges every finding. Only verified issues survive.
Accepts an optional PR number or URL; defaults to the current branch's open PR.
argument-hint: "[PR number or URL]"
---
# Multi-Agent Adversarial PR Review
You are an orchestrator for a thorough, multi-perspective pull request review.
Your job is to gather PR context, spawn specialist review agents in parallel,
then run a verification pass to filter out false positives.
## Step 1 — Gather PR Context
Determine the PR to review:
- If `$ARGUMENTS` is provided, use it (a PR number, URL, or branch name).
- Otherwise, detect the current branch and find its open PR.
Use the GitHub MCP tools (or `gh` CLI if MCP is unavailable) to fetch:
1. **PR metadata**: title, body, author, base branch, labels
2. **Full diff**: the complete code diff
3. **Changed file list**: just the filenames for targeted exploration
4. **PR comments/reviews**: any existing review feedback
5. **CI status**: check if CI is passing or failing
Also read the project's `CLAUDE.md` for coding conventions the review should enforce.
Store all this context — you will include it in each specialist agent's prompt.
## Step 2 — Spawn Specialist Agents (Parallel)
Launch **all five** specialist agents simultaneously using the Agent tool.
Each agent receives the full diff, changed file list, PR description, and
project conventions. Each must output a structured list of findings.
### Agent 1: Correctness Reviewer
Focus: Logic bugs, edge cases, regressions.
- Off-by-one errors, null/undefined handling, race conditions
- Broken invariants, incorrect control flow
- State management issues (missing assignments, leaked state)
- Regressions: does this change break existing behaviour?
- Read surrounding code (not just the diff) to understand context
### Agent 2: Security Reviewer
Focus: Vulnerabilities and unsafe patterns.
- Injection (SQL, command, XSS, path traversal)
- Authentication/authorisation bypass
- Secrets or credentials in code
- Unsafe deserialisation, SSRF, open redirects
- Cryptographic misuse, insecure randomness
- Dependency vulnerabilities (if new deps added)
### Agent 3: Performance Reviewer
Focus: Efficiency and scalability.
- N+1 queries, unnecessary allocations, missing caching
- O(n²) or worse algorithms where linear is possible
- Blocking calls in async/event-loop contexts
- Memory leaks, unbounded growth (queues, buffers, caches)
- Unnecessary I/O, redundant network calls
### Agent 4: Maintainability Reviewer
Focus: Design quality and readability.
- SOLID principle violations, excessive coupling
- Code duplication (DRY violations)
- Naming clarity (variables, functions, classes)
- Missing or misleading comments/docstrings
- Overly complex logic that could be simplified
- Inconsistency with project conventions (from CLAUDE.md)
### Agent 5: Completeness Reviewer
Focus: What's missing.
- Missing test coverage for new/changed code paths
- Missing error handling for failure modes
- Undocumented behaviour changes (README, specs, CHANGELOG)
- Spec drift: do changes contradict any spec files?
- Missing migration steps or configuration updates
- Edge cases not addressed in the implementation
### Agent Prompt Template
Each agent's prompt MUST include:
1. The full diff
2. The changed file list
3. The PR description
4. Relevant project conventions from CLAUDE.md
5. Instruction to READ the surrounding code in changed files (not just the diff lines) for full context
6. Instruction to output findings as a structured list:
```
For each finding, output:
- **File**: path/to/file.py:LINE
- **Severity**: critical / high / medium / low
- **Category**: bug / security / performance / design / missing
- **Confidence**: high / medium / low
- **Description**: What the issue is and why it matters
- **Suggestion**: Concrete fix or alternative approach
```
7. Instruction: if no issues found in your area, explicitly state "No issues found" — do not invent findings to appear thorough.
8. Instruction: only report issues with confidence >= medium. Do not report style nits unless they violate project conventions.
## Step 3 — Verification Phase (Adversarial)
After ALL specialist agents complete, spawn a single **Verifier Agent** that
receives every finding from all specialists. The verifier's job is to
**challenge and disprove** each finding:
### Verifier Agent Instructions
You are a devil's advocate. For EACH finding from the specialist reviewers:
1. **Read the actual code** (not just the diff) — the "bug" may be handled
elsewhere in the codebase.
2. **Check if the concern is mitigated** by framework defaults, type system
guarantees, or existing validation.
3. **Verify the severity** — is this really critical, or is it a cosmetic issue
dressed up as a bug?
4. **Check for duplicates** — multiple specialists may report the same issue
in different words.
5. **Assess confidence** — is the specialist making assumptions about runtime
behaviour without evidence?
For each finding, output one of:
- **VERIFIED** — the issue is real and correctly categorised
- **DOWNGRADED** — the issue exists but severity/confidence should be lower (explain why)
- **DISMISSED** — the issue is a false positive (explain why)
- **DUPLICATE** — already covered by another finding (reference which one)
## Step 4 — Synthesise Final Report
Collect all VERIFIED and DOWNGRADED findings. Produce a final review report:
### Report Format
```markdown
## PR Review: <PR title>
### Summary
<2-3 sentence overview of the PR and overall assessment>
### Critical / High Issues
<Only VERIFIED findings with severity critical or high>
### Medium Issues
<VERIFIED findings with severity medium>
### Suggestions
<DOWNGRADED findings and low-severity items, briefly>
### What Looks Good
<Positive observations — good patterns, thorough tests, clean design>
### Verdict
<One of: APPROVE / REQUEST_CHANGES / COMMENT>
<Brief justification>
```
### Rules for the Final Report
- Lead with the most important issues
- Be specific: include file paths, line numbers, and code snippets
- Be constructive: every criticism must include a concrete suggestion
- Acknowledge what's done well — reviews should be balanced
- If no critical/high issues exist, lean towards APPROVE
- Use the project's conventions (British English, emojis for emphasis)
## Important Guidelines
- **Do NOT make changes to code** — this is a read-only review
- **Do NOT post the review to GitHub** unless explicitly asked
- **Be thorough but not noisy** — quality over quantity
- **Respect the author's intent** — understand why before criticising what
- Each specialist agent should use `subagent_type: "Explore"` for efficient codebase reading
- The verifier agent should use `subagent_type: "general-purpose"` for deeper reasoning
- When spawning agents, always include the full diff and context in the prompt — agents have no memory of this conversation

View File

@@ -0,0 +1,176 @@
---
name: triage
description: >
Triage open GitHub issues and discussions on the Jarvis repo. Sweep for
untriaged reports, reply to awaiting-user threads when new info lands,
apply the right labels, close duplicates, and edit past owner comments
rather than stacking follow-ups. Use after a release or any time the user
says "triage issues", "triage discussions", or similar.
---
# Triage Skill
You are triaging open issues and discussions on `isair/jarvis`. Work from data,
not memory. Stay friendly, specific, and short.
## Step 1. Pull the state
Run these as parallel Bash tool calls (one message, two tool uses), not as chained shell commands:
```bash
gh issue list --state open --limit 50 --json number,title,author,createdAt,updatedAt,labels,comments \
--jq '[.[] | {number, title, author: .author.login, labels: [.labels[].name], commentCount: (.comments|length), updatedAt}]'
```
```bash
gh api graphql -f query='{repository(owner:"isair",name:"jarvis"){discussions(first:30,states:OPEN,orderBy:{field:UPDATED_AT,direction:DESC}){nodes{id number title author{login} category{name} updatedAt comments(last:5){totalCount nodes{id author{login} createdAt body replies(last:10){nodes{id author{login} createdAt body}}}}}}}}' \
--jq '.data.repository.discussions.nodes'
```
**Important**: GitHub Discussions are threaded. The top-level `comments` list does
not include sub-replies, so a fresh reporter question that lives under an owner
comment will look like an unanswered top-level thread if you forget to fetch
`replies`. The query above pulls both. When deciding "untriaged" vs "awaiting
reporter", scan the **last reply across the whole tree**, not just the last
top-level comment. A common shape: owner answers at the top level, reporter
replies underneath, owner replies underneath that. The newest message is two
levels deep, and you'll miss it if you only look at the top-level list.
Classify each thread into one of:
- **Untriaged**: no owner (`isair`) reply yet. Act now.
- **Awaiting reporter**: labelled `question` or the last comment is from the owner asking for details. Leave it unless the reporter has replied with new info. Per repo policy, do not close for silence before 2 weeks of reporter inactivity.
- **Owner tracking**: filed by `isair` as an internal task. Skip unless a non-owner has commented with a question or new information, in which case treat it like a normal untriaged thread.
- **Resolved-pending-release**: fix is on `develop`. Never close manually. Release (`git merge --ff-only develop``main`) auto-closes via `Closes #NNN`. Detect this by scanning recent `develop` commits (`gh pr list --base develop --state merged --limit 20`) for references to the issue number before you reply, so you can tell the reporter "this is fixed in the next release" rather than asking for more info.
## Step 2. Fetch details for the untriaged
For issues:
```bash
gh issue view <N> --json title,body,author,labels,comments \
--jq '{title, author: .author.login, labels: [.labels[].name], body, comments: [.comments[] | {author: .author.login, createdAt, body}]}'
```
Read the **logs** and traceback carefully before replying. The vast majority of
reports contain the answer in the log; the reporter just didn't know what to
look for.
## Step 3. Diagnose from the log
Common Jarvis patterns and what they mean:
| Symptom in log | Likely cause | Ask for |
|----------------|--------------|---------|
| Repeated `📝 Heard: "Thank you."`, `"you..."`, `"Thanks for watching!"` with no real commands | Whisper hallucinations on near-silent audio. Wrong default mic or broken mic/driver. | Ask them to check the input level bar (Windows Sound settings, or macOS System Settings → Sound → Input) actually moves when they speak, and confirm which mic they intend to use. |
| `🧠 Intent judge: unavailable (timeout or error)` | Known; improved in v1.25.1 (bump this version as newer fixes ship). | Version they're on, and retry on latest. |
| `huggingface_hub.snapshot_download` crash (thread pool / ssl.create_default_context) | Download-time crash, platform-specific. Not the same as 429 throttling. | Keep open as its own bug. Workaround: manual `ollama pull ...` and relaunch. |
| `LLM connection error: ... RemoteDisconnected` | Ollama dropped. Upstream, not Jarvis. | `ollama run <model>` health check; Ollama version. |
| `setup_wizard.py ... _install_next_model` fatal | Real bug on our side. | Which model had just finished, which was about to start; `ollama list` after crash; `~/Library/Logs/DiagnosticReports/Jarvis-*.ips` on macOS. |
| `Low confidence` lines only, no `Heard:` ever | Mic is captured but utterances are under the confidence floor. Usually mic placement or wrong device. | Same as first row. |
| `📍 Location features are not available` | Not an error. Location is optional and only affects weather / local-time context. | Reassure, don't diagnose. Point at the MaxMind GeoLite2 signup if they actually want it. |
**Do not ask obviously-answered questions.** If the log shows the wizard was
pulling models, Ollama is by definition installed and running. If the log shows
Whisper loaded, Whisper is installed. Read before asking.
Other recurring user-environment answers:
- **Windows "Error 4551: Application Control policy has blocked this file"**: WDAC / AppLocker / corporate MDM, not Jarvis. Point at IT allow-listing, `secpol.msc`, or install-from-source.
- **"missing AI models"**: `ollama pull gemma4:e2b` + `ollama pull nomic-embed-text`, or tray → 🔧 Setup Wizard.
- **Setup wizard was closed early, nothing works**: tray → 🔧 Setup Wizard reopens it. Fallback: `rm -rf ~/.config/jarvis ~/.local/share/jarvis/config`.
- **`gemma4:e2b` quality complaints**: it is a very small model. Suggest 7B+ if hardware allows, note that capability scales with model size.
- **"Can Jarvis speak <language>?"**: yes if the chat model supports it; for voice, Whisper handles most languages. Point at README.
## Step 4. Label, retitle, reply
Available labels: `bug`, `question`, `duplicate`, `enhancement`, `documentation`, `good first issue`, `help wanted`, `invalid`, `wontfix`, `voice`, `spike`.
Conventions:
- Empty-body or needs-info bug reports: label `bug,question`, retitle to `"<one-line symptom> (awaiting details)"` or similar so the backlog is scannable.
- Duplicates: label `duplicate`, leave one short comment pointing at the canonical issue, close with `--reason "not planned"`.
- Real confirmed crashes: label `bug` (and `voice` if audio-related), retitle to pin the failure site from the traceback (e.g. `"Crash on first-run setup wizard during model install (macOS, v1.26.0)"`).
Reply tone:
- Open with `Hi @user, thanks for filing this! 👋`
- State the diagnosis (what the log shows) before the asks.
- Use bullet lists with **bold labels** for asks. Keep to 3 to 5 asks max.
- Friendly emojis: 👋 🙏 🚀 🧠 🎤 🔊 📝.
- **No em dashes (—) anywhere in user-facing writing.** Use commas, full stops, colons, or parentheses.
- **British English** (colour, behaviour, initialise).
- Do not promise fixes or ETAs.
## Step 5. Post the reply
Issue comment:
```bash
gh issue comment <N> --body "..."
gh issue edit <N> --add-label "bug,question" --title "..."
gh issue close <N> --reason "not planned" # duplicates / wontfix only
```
Discussion comment (GraphQL, and **use `-f body=` not `-F body=`** if the body
starts with `@`, because `gh` treats `-F` values starting with `@` as file
paths):
```bash
gh api graphql -f query='mutation($id:ID!,$body:String!){addDiscussionComment(input:{discussionId:$id,body:$body}){comment{url}}}' \
-F id=<discussion node id> -f body="@user, ..."
```
Get the discussion `id` field from the Step 1 GraphQL output. It's the outer `id` on the discussion node, not the inner `id` inside `comments.nodes` (that one is the comment's node id, used in Step 6 for edits).
**Verify the node id before posting.** Discussion node ids look like `D_kwDOPgt_k84Albb5` and a single-character typo will silently route the comment to a completely unrelated repo's discussion (the prefix encodes the repo, but neighbouring ids belong to other repos). Two safeguards:
1. Copy the id straight from the Step 1 output, never retype it.
2. The mutation response returns the comment URL: `addDiscussionComment.comment.url`. Inspect it. If the host path is anything other than `github.com/isair/jarvis/discussions/<N>`, you posted to the wrong repo. Delete the comment immediately:
```bash
gh api graphql -f query='mutation($id:ID!){deleteDiscussionComment(input:{id:$id}){comment{id}}}' -F id=<comment node id>
```
Then repost with the correct discussion id.
To reply to a specific comment (threaded sub-reply) rather than at the top level, pass `replyToId` in the mutation input. Otherwise the reply goes to the root.
If a `body` you want to post starts with `@`, use `-f body="..."`, not `-F body="..."`. `gh` interprets `-F` values starting with `@` as file paths.
## Step 6. Clean up your own past comments
If a previous owner comment was premature, wrong, or asked an
obviously-answered question, **edit it in place**. A clean thread beats a trail
of self-corrections.
Issue comment edit:
```bash
gh api -X PATCH repos/isair/jarvis/issues/comments/<commentId> -f body="..."
```
Discussion comment edit. First grab the comment node id (the `last:5` window usually covers recent owner replies):
```bash
gh api graphql -f query='{repository(owner:"isair",name:"jarvis"){discussion(number:N){comments(last:5){nodes{id author{login} createdAt body}}}}}'
```
Then update it:
```bash
gh api graphql -f query='mutation($id:ID!,$body:String!){updateDiscussionComment(input:{commentId:$id,body:$body}){comment{url}}}' \
-F id=<comment node id> -f body="..."
```
## Step 7. Summarise to the user
At the end, list what you touched per thread: labels changed, titles changed,
comments posted, closures. Use markdown links like `[#241](https://github.com/isair/jarvis/issues/241)`. Keep it short.
## Hard rules
- Never close an issue because its fix landed on `develop`. Let the release auto-close.
- Never close for reporter silence under 2 weeks after a clarifying question.
- Never ask a question the log already answers.
- Never use em dashes in user-facing text.
- Never invent facts about a reporter's environment. Ask, or infer only from the log.
- When in doubt, label `question` and ask rather than guess.