Add Discord-native hybrid front-end for Jarvis (bot + bridge)
Some checks failed
Release / semantic-release (push) Successful in 59s
tests / Unit tests (Linux, Python 3.11) (push) Successful in 13m45s
Release / build-linux (push) Failing after 7m47s
Release / build-windows (push) Has been cancelled
Release / build-macos (arm64, macos-latest) (push) Has been cancelled
Release / build-macos (x64, macos-15-intel) (push) Has been cancelled
Release / release-main (push) Has been cancelled
Release / release-develop (push) Has been cancelled
Some checks failed
Release / semantic-release (push) Successful in 59s
tests / Unit tests (Linux, Python 3.11) (push) Successful in 13m45s
Release / build-linux (push) Failing after 7m47s
Release / build-windows (push) Has been cancelled
Release / build-macos (arm64, macos-latest) (push) Has been cancelled
Release / build-macos (x64, macos-15-intel) (push) Has been cancelled
Release / release-main (push) Has been cancelled
Release / release-develop (push) Has been cancelled
Transform isair/jarvis into a Discord-controlled voice assistant running on the Ubuntu VNC desktop, keeping the mature ~39k-line Python brain intact. - bot/ (Node + bun, discord.js): /자비스 slash commands (ephemeral), voice channel join + voice receive/playback, pluggable VNC screen broadcast (selfbot live / noVNC / screenshot) - bridge/ (Python, Flask): wraps jarvis STT + run_reply_engine + Piper TTS behind a thin localhost HTTP API - .env.example, scripts/ (start_bridge/start_bot/dev), README rewrite, docs/language-comparison.md and docs/vnc-xfce-setup.md Language decision: hybrid (Python brain + Node/bun Discord layer) because Discord blocks bot video; native screen broadcast only works via a Node selfbot library.
This commit is contained in:
90
src/desktop_app/setup_wizard.spec.md
Normal file
90
src/desktop_app/setup_wizard.spec.md
Normal file
@@ -0,0 +1,90 @@
|
||||
# Setup Wizard Specification
|
||||
|
||||
First-run wizard that ensures Ollama, required models, and Whisper are ready before Jarvis starts.
|
||||
|
||||
## Overview
|
||||
|
||||
The setup wizard is shown only when **user action is required** — it is not shown merely because the Ollama server isn't running (Jarvis can auto-start it). The two triggers are:
|
||||
|
||||
1. Ollama CLI is not installed.
|
||||
2. Ollama server is running but required models are missing.
|
||||
|
||||
## Design Principles
|
||||
|
||||
1. **Minimal friction**: Skip pages whose requirements are already met. Auto-detect as much as possible.
|
||||
2. **Guided, not blocking**: The wizard resolves prerequisites; it does not configure every setting. Fine-tuning happens in the Settings Window.
|
||||
3. **Platform-aware**: Apple Silicon gets MLX Whisper options. Windows gets hidden-console Ollama serve. macOS opens the Ollama app.
|
||||
4. **Safe re-entry**: Running the wizard again never destroys existing config — it only fills in missing values.
|
||||
|
||||
## Page Flow
|
||||
|
||||
```
|
||||
Welcome → [Ollama Install] → [Ollama Server] → Models → [Whisper] → Dictation → MCP Servers → Search Providers → [Location] → Complete
|
||||
```
|
||||
|
||||
Pages in brackets are conditional — skipped when their prerequisite is already satisfied.
|
||||
|
||||
### Pages
|
||||
|
||||
| # | Page | Condition to show | Config written |
|
||||
|---|------|-------------------|----------------|
|
||||
| 1 | **Welcome** | Always | — |
|
||||
| 2 | **Ollama Install** | CLI not found | — |
|
||||
| 3 | **Ollama Server** | Server not running | — |
|
||||
| 4 | **Models** | Always (user selects chat model) | `ollama_chat_model` |
|
||||
| 5 | **Whisper Setup** | Always (user selects Whisper model) | `whisper_model` |
|
||||
| 6 | **Dictation** | Always | `dictation_enabled`, `dictation_hotkey`, `dictation_filler_removal` |
|
||||
| 7 | **MCP Servers** | Always | `mcps` |
|
||||
| 8 | **Search Providers** | Always | `brave_search_api_key`, `wikipedia_fallback_enabled` |
|
||||
| 9 | **Location** | Location enabled but detection failing | `location_ip_address` |
|
||||
| 10 | **Complete** | Always | — |
|
||||
|
||||
### Page Details
|
||||
|
||||
**WelcomePage** — Status dashboard showing CLI, server, models, location, and MLX Whisper (Apple Silicon) readiness. Refresh button triggers a background `StatusCheckWorker`.
|
||||
|
||||
**OllamaInstallPage** — Platform-specific download instructions. Opens official download page. Verify button re-checks `check_ollama_cli()`.
|
||||
|
||||
**OllamaServerPage** — Start button auto-starts Ollama (macOS: `open -a Ollama`, Windows: hidden `ollama serve`, Linux: terminal `ollama serve`). Verify button re-checks `check_ollama_server()`.
|
||||
|
||||
**ModelsPage** — Displays `SUPPORTED_CHAT_MODELS` as selectable cards with VRAM requirements (including always-loaded intent judge overhead). Installs: selected chat model + embedding model (`nomic-embed-text`) + intent judge (`gemma4:e2b`). Progress bar and log output during `ollama pull`. User can skip if models are already present.
|
||||
|
||||
**WhisperSetupPage** — Language mode toggle (multilingual vs English-only), then model size selection from hardcoded options. Apple Silicon: additional FFmpeg and MLX Whisper installation buttons.
|
||||
|
||||
**DictationPage** — Enable/disable dictation, hotkey selection dropdown (4 presets), filler word removal toggle with delay warning. Reads current config values on open so re-running the wizard preserves user choices.
|
||||
|
||||
**MCPPage** — Shows wizard-featured entries from `mcp_catalogue.py` as selectable cards (checkbox + name + description). Already-configured servers start checked. On validate, selected servers are added to `config.mcps` and deselected wizard entries are removed. Includes a tip pointing users to Settings → MCP Servers for the full catalogue and custom servers.
|
||||
|
||||
**SearchProvidersPage** — Explains and configures the web-search fallback chain (DDG → Brave → Wikipedia → honest block). Always shown: the explainer is the point, not the configuration. Brave card takes an optional API key (password-masked) with a link to the Brave key portal. Wikipedia card is a toggle that defaults to on. Only non-default values are written to `config.json` (empty Brave key and enabled Wikipedia are both omitted), matching the settings window's minimal-diff invariant.
|
||||
|
||||
**LocationPage** — Tests location auto-detection. If it fails (private/CGNAT IP), offers manual IP input with OpenDNS resolution and GeoLite2 validation.
|
||||
|
||||
**CompletePage** — Success summary with tips. Hides Cancel button.
|
||||
|
||||
## Detection Functions
|
||||
|
||||
| Function | Returns | Purpose |
|
||||
|----------|---------|---------|
|
||||
| `should_show_setup_wizard()` | `bool` | Gate: only `True` when user action needed |
|
||||
| `check_ollama_cli()` | `(bool, path)` | CLI installed + path |
|
||||
| `check_ollama_server()` | `(bool, version)` | Server reachable + version |
|
||||
| `get_required_models()` | `list[str]` | Models needed per config |
|
||||
| `check_installed_models()` | `list[str]` | Models already pulled |
|
||||
| `check_ollama_status()` | `OllamaStatus` | Combined CLI + server + models |
|
||||
| `check_mlx_whisper_status()` | `MLXWhisperStatus` | Apple Silicon Whisper readiness |
|
||||
|
||||
## Threading
|
||||
|
||||
- `StatusCheckWorker(QThread)` — runs `check_ollama_status()` off the UI thread, emits result via signal.
|
||||
- `CommandWorker(QThread)` — runs shell commands (e.g. `ollama pull`), emits stdout line-by-line and completion status.
|
||||
|
||||
## Settings NOT Configured by Wizard
|
||||
|
||||
The wizard is deliberately limited to prerequisites. These are configured via the Settings Window:
|
||||
|
||||
- TTS settings (engine, voice, rate)
|
||||
- VAD / timing parameters
|
||||
- Wake word customisation
|
||||
- Dictation hotkey
|
||||
- Full MCP catalogue and custom MCP servers (wizard only shows featured entries)
|
||||
- All advanced parameters
|
||||
Reference in New Issue
Block a user