tkrmagid/javis_bot

javis-bot c4abf63f38

Add Discord-native hybrid front-end for Jarvis (bot + bridge)

Transform isair/jarvis into a Discord-controlled voice assistant running on
the Ubuntu VNC desktop, keeping the mature ~39k-line Python brain intact.

- bot/ (Node + bun, discord.js): /자비스 slash commands (ephemeral),
  voice channel join + voice receive/playback, pluggable VNC screen broadcast
  (selfbot live / noVNC / screenshot)
- bridge/ (Python, Flask): wraps jarvis STT + run_reply_engine + Piper TTS
  behind a thin localhost HTTP API
- .env.example, scripts/ (start_bridge/start_bot/dev), README rewrite,
  docs/language-comparison.md and docs/vnc-xfce-setup.md

Language decision: hybrid (Python brain + Node/bun Discord layer) because
Discord blocks bot video; native screen broadcast only works via a Node
selfbot library.

2026-06-09 14:51:05 +09:00


Setup Wizard Specification

First-run wizard that ensures Ollama, required models, and Whisper are ready before Jarvis starts.

Overview

The setup wizard is shown only when user action is required — it is not shown merely because the Ollama server isn't running (Jarvis can auto-start it). The two triggers are:

Ollama CLI is not installed.
Ollama server is running but required models are missing.
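
The gate above can be sketched as a small predicate. This is an illustration under stated assumptions, not the project's implementation: the `OllamaStatus` field names (`cli_installed`, `server_running`, `missing_models`) are hypothetical, and the real `should_show_setup_wizard()` may gather its own status instead of taking it as an argument.

```python
from dataclasses import dataclass, field


@dataclass
class OllamaStatus:
    # Field names are hypothetical; the spec only names the type.
    cli_installed: bool
    server_running: bool
    missing_models: list[str] = field(default_factory=list)


def should_show_setup_wizard(status: OllamaStatus) -> bool:
    """Return True only when the user has to act."""
    if not status.cli_installed:
        return True  # Trigger 1: the user must install the CLI.
    if status.server_running and status.missing_models:
        return True  # Trigger 2: server is up but models are missing.
    # A stopped server alone is not a trigger: Jarvis can auto-start it.
    return False
```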

Design Principles

Minimal friction: Skip pages whose requirements are already met. Auto-detect as much as possible.
Guided, not blocking: The wizard resolves prerequisites; it does not configure every setting. Fine-tuning happens in the Settings Window.
Platform-aware: Apple Silicon gets MLX Whisper options, Windows runs ollama serve in a hidden console, and macOS opens the Ollama app.
Safe re-entry: Running the wizard again never destroys existing config — it only fills in missing values.
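
One way to read the safe re-entry principle is as a "fill missing keys only" merge. The helper name `fill_missing` and the sample values are assumptions made for this sketch; the spec does not say how the merge is implemented.

```python
def fill_missing(existing: dict, defaults: dict) -> dict:
    """Return a copy of `existing` with absent keys filled from `defaults`."""
    merged = dict(existing)
    for key, value in defaults.items():
        merged.setdefault(key, value)  # never overwrites a stored value
    return merged


# Hypothetical values: an existing choice survives, a missing key is added.
config = {"whisper_model": "small"}
wizard_values = {"whisper_model": "base", "dictation_enabled": True}
merged = fill_missing(config, wizard_values)
```

Returning a copy rather than mutating `existing` keeps the stored config untouched if the user cancels the wizard.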

Page Flow

Welcome → [Ollama Install] → [Ollama Server] → Models → Whisper → Dictation → MCP Servers → Search Providers → [Location] → Complete

Pages in brackets are conditional — skipped when their prerequisite is already satisfied.

Pages

#	Page	Condition to show	Config written
1	Welcome	Always	—
2	Ollama Install	CLI not found	—
3	Ollama Server	Server not running	—
4	Models	Always (user selects chat model)	`ollama_chat_model`
5	Whisper Setup	Always (user selects Whisper model)	`whisper_model`
6	Dictation	Always	`dictation_enabled`, `dictation_hotkey`, `dictation_filler_removal`
7	MCP Servers	Always	`mcps`
8	Search Providers	Always	`brave_search_api_key`, `wikipedia_fallback_enabled`
9	Location	Location enabled but detection failing	`location_ip_address`
10	Complete	Always	—
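
The "Condition to show" column can be sketched as a function that assembles the page sequence from detection results. The function name and its boolean parameters are hypothetical; only the page names and conditions come from the table.

```python
def build_page_sequence(cli_found: bool, server_running: bool,
                        location_enabled: bool, location_ok: bool) -> list[str]:
    """Assemble wizard pages, skipping those whose prerequisite is met."""
    pages = ["Welcome"]
    if not cli_found:
        pages.append("Ollama Install")
    if not server_running:
        pages.append("Ollama Server")
    # These pages are always shown according to the table.
    pages += ["Models", "Whisper Setup", "Dictation",
              "MCP Servers", "Search Providers"]
    if location_enabled and not location_ok:
        pages.append("Location")
    pages.append("Complete")
    return pages
```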

Page Details

WelcomePage — Status dashboard showing CLI, server, models, location, and MLX Whisper (Apple Silicon) readiness. Refresh button triggers a background StatusCheckWorker.

OllamaInstallPage — Platform-specific download instructions. Opens official download page. Verify button re-checks check_ollama_cli().

OllamaServerPage — Start button auto-starts Ollama (macOS: open -a Ollama, Windows: hidden ollama serve, Linux: terminal ollama serve). Verify button re-checks check_ollama_server().
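
A minimal sketch of the per-platform start command follows. The function name is hypothetical, and the Linux branch simply runs `ollama serve` directly because the spec does not say how the terminal is launched.

```python
import subprocess


def ollama_start_command(platform: str) -> tuple[list[str], dict]:
    """Map a sys.platform value to (argv, extra Popen keyword arguments)."""
    if platform == "darwin":
        return ["open", "-a", "Ollama"], {}
    if platform == "win32":
        # CREATE_NO_WINDOW only exists on Windows builds of Python,
        # so fall back to 0 elsewhere to keep this importable everywhere.
        flags = getattr(subprocess, "CREATE_NO_WINDOW", 0)
        return ["ollama", "serve"], {"creationflags": flags}
    # Assumption: on Linux the command is run as-is.
    return ["ollama", "serve"], {}
```

A caller would typically unpack the result as `argv, kwargs = ollama_start_command(sys.platform)` and pass it to `subprocess.Popen(argv, **kwargs)`.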

ModelsPage — Displays SUPPORTED_CHAT_MODELS as selectable cards with VRAM requirements (including always-loaded intent judge overhead). Installs: selected chat model + embedding model (nomic-embed-text) + intent judge (gemma4:e2b). Progress bar and log output during ollama pull. User can skip if models are already present.
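
The install set described for ModelsPage can be sketched as below. The helper names are hypothetical, and comparing model names by exact string match is an assumption: a real check may need to normalise tags such as `:latest`.

```python
EMBEDDING_MODEL = "nomic-embed-text"
INTENT_JUDGE_MODEL = "gemma4:e2b"


def models_to_pull(chat_model: str, installed: list[str]) -> list[str]:
    """Return the required models that are not installed yet, in order."""
    # dict.fromkeys removes duplicates while preserving order.
    wanted = list(dict.fromkeys([chat_model, EMBEDDING_MODEL, INTENT_JUDGE_MODEL]))
    have = set(installed)
    return [model for model in wanted if model not in have]


def pull_commands(models: list[str]) -> list[list[str]]:
    """Build one `ollama pull` argv per model."""
    return [["ollama", "pull", model] for model in models]
```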

WhisperSetupPage — Language mode toggle (multilingual vs English-only), then model size selection from hardcoded options. Apple Silicon: additional FFmpeg and MLX Whisper installation buttons.

DictationPage — Enable/disable dictation, hotkey selection dropdown (4 presets), filler word removal toggle with delay warning. Reads current config values on open so re-running the wizard preserves user choices.

MCPPage — Shows wizard-featured entries from mcp_catalogue.py as selectable cards (checkbox + name + description). Already-configured servers start checked. On validate, selected servers are added to config.mcps and deselected wizard entries are removed. Includes a tip pointing users to Settings → MCP Servers for the full catalogue and custom servers.
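
The validate step for MCPPage might look like the sketch below. It assumes `config.mcps` is a dict keyed by server name, which the spec does not confirm, and the function name is hypothetical.

```python
def apply_mcp_selection(current: dict, featured: dict, selected: set[str]) -> dict:
    """Add selected featured servers, drop deselected ones, keep the rest."""
    updated = dict(current)
    for name, entry in featured.items():
        if name in selected:
            # Keep an existing (possibly customised) entry if present.
            updated.setdefault(name, entry)
        else:
            updated.pop(name, None)
    return updated
```

Servers that are not wizard-featured are never touched, so custom entries added through the Settings Window survive a wizard re-run.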

SearchProvidersPage — Explains and configures the web-search fallback chain (DDG → Brave → Wikipedia → honest block). Always shown: the explainer is the point, not the configuration. Brave card takes an optional API key (password-masked) with a link to the Brave key portal. Wikipedia card is a toggle that defaults to on. Only non-default values are written to config.json (empty Brave key and enabled Wikipedia are both omitted), matching the settings window's minimal-diff invariant.
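
The minimal-diff rule for the search settings can be sketched as follows. The function name is hypothetical; the two config keys and their defaults (empty Brave key, Wikipedia enabled) come from the spec.

```python
def search_provider_overrides(brave_api_key: str, wikipedia_enabled: bool) -> dict:
    """Return only the values that differ from the defaults."""
    overrides = {}
    key = brave_api_key.strip()
    if key:  # an empty key is the default and is omitted
        overrides["brave_search_api_key"] = key
    if not wikipedia_enabled:  # default is on, so only "off" is recorded
        overrides["wikipedia_fallback_enabled"] = False
    return overrides
```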

LocationPage — Tests location auto-detection. If it fails (private/CGNAT IP), offers manual IP input with OpenDNS resolution and GeoLite2 validation.
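
A hedged sketch of how a private or CGNAT address could be recognised with the standard library follows. The function name is hypothetical and the project may detect the failure differently. The CGNAT range is checked explicitly because `ipaddress` does not report 100.64.0.0/10 as private.

```python
import ipaddress

# RFC 6598 shared address space used for carrier-grade NAT.
CGNAT_RANGE = ipaddress.ip_network("100.64.0.0/10")


def needs_manual_ip(ip_text: str) -> bool:
    """True when the address cannot be geolocated as a public IP."""
    ip = ipaddress.ip_address(ip_text)
    if ip.version == 4 and ip in CGNAT_RANGE:
        return True
    return ip.is_private or ip.is_loopback or ip.is_link_local
```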

CompletePage — Success summary with tips. Hides Cancel button.

Detection Functions

Function	Returns	Purpose
`should_show_setup_wizard()`	`bool`	Gate: only `True` when user action needed
`check_ollama_cli()`	`(bool, path)`	CLI installed + path
`check_ollama_server()`	`(bool, version)`	Server reachable + version
`get_required_models()`	`list[str]`	Models needed per config
`check_installed_models()`	`list[str]`	Models already pulled
`check_ollama_status()`	`OllamaStatus`	Combined CLI + server + models
`check_mlx_whisper_status()`	`MLXWhisperStatus`	Apple Silicon Whisper readiness
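
The first two checks could be implemented roughly as below. This is a sketch, not the project's code: it assumes the default Ollama address `http://localhost:11434` and uses Ollama's `/api/version` endpoint; the `base_url` and `timeout` parameters are additions for illustration.

```python
import json
import shutil
import urllib.request
from typing import Optional, Tuple


def check_ollama_cli() -> Tuple[bool, Optional[str]]:
    """Report whether the `ollama` executable is on PATH, and where."""
    path = shutil.which("ollama")
    return (path is not None, path)


def check_ollama_server(base_url: str = "http://localhost:11434",
                        timeout: float = 2.0) -> Tuple[bool, Optional[str]]:
    """Report whether the server answers, and the version it returns."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/version", timeout=timeout) as resp:
            return (True, json.load(resp).get("version"))
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts;
        # ValueError covers a malformed URL or a non-JSON body.
        return (False, None)
```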

Threading

StatusCheckWorker(QThread) — runs check_ollama_status() off the UI thread, emits result via signal.
CommandWorker(QThread) — runs shell commands (e.g. ollama pull), emits stdout line-by-line and completion status.
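
The core of a command worker, reading output line by line and reporting the exit status, can be sketched without Qt. In the wizard this loop would presumably live in a `QThread.run()` override, with `on_line` replaced by a signal emit; that wiring is omitted here and the function name is hypothetical.

```python
import subprocess
import sys
from typing import Callable


def stream_command(argv: list[str], on_line: Callable[[str], None]) -> int:
    """Run a command, pass each output line to `on_line`, return the exit code."""
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # interleave stderr into the same log
        text=True,
        bufsize=1,  # line-buffered so progress appears promptly
    )
    assert proc.stdout is not None
    for line in proc.stdout:
        on_line(line.rstrip("\n"))
    return proc.wait()


# Demonstration with a harmless command instead of `ollama pull`.
lines: list[str] = []
exit_code = stream_command([sys.executable, "-c", "print('a'); print('b')"], lines.append)
```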

Settings NOT Configured by Wizard

The wizard is deliberately limited to prerequisites and the handful of first-run choices listed above. The following are configured via the Settings Window:

TTS settings (engine, voice, rate)
VAD / timing parameters
Wake word customisation
Dictation hotkey beyond the wizard's 4 presets
Full MCP catalogue and custom MCP servers (wizard only shows featured entries)
All advanced parameters
