Add Discord-native hybrid front-end for Jarvis (bot + bridge)

Transform isair/jarvis into a Discord-controlled voice assistant running on
the Ubuntu VNC desktop, keeping the mature ~39k-line Python brain intact.

- bot/ (Node + bun, discord.js): /자비스 ("Jarvis" in Korean) slash commands
  with ephemeral replies, voice channel join + voice receive/playback, and
  pluggable VNC screen broadcast (selfbot live / noVNC / screenshot)
- bridge/ (Python, Flask): wraps jarvis STT + run_reply_engine + Piper TTS
  behind a thin localhost HTTP API
- .env.example, scripts/ (start_bridge/start_bot/dev), README rewrite,
  docs/language-comparison.md and docs/vnc-xfce-setup.md

Language decision: hybrid (Python brain + Node/bun Discord layer), because
Discord does not let bot accounts send video, so native screen broadcast is
only possible through a Node selfbot library.
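
As a rough illustration of how the Python bridge might consume the settings that .env.example introduces, here is a minimal sketch using only the standard library. The names `BridgeSettings`, `load_settings` and `_flag` are hypothetical and do not come from the repository; the variable names and the list of stream backends are taken from .env.example. Falling back to `none` when STREAM_BACKEND is unset is an assumption made here for safety, not the file's own default.

```python
import os
from dataclasses import dataclass

# Backend names as listed in the .env.example comments.
STREAM_BACKENDS = ("selfbot", "novnc", "screenshot", "none")


@dataclass(frozen=True)
class BridgeSettings:  # hypothetical container, not part of the commit
    host: str
    port: int
    brain_enabled: bool
    tts_enabled: bool
    stream_backend: str


def _flag(env, key, default="1"):
    # Treat "1", "true", "yes", "on" (any case) as enabled; anything else is off.
    value = env.get(key) or default
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env=None):
    """Build settings from a mapping; defaults to the process environment."""
    env = os.environ if env is None else env
    # Assumption: an unset or empty STREAM_BACKEND means "no broadcast".
    backend = (env.get("STREAM_BACKEND") or "none").strip().lower()
    if backend not in STREAM_BACKENDS:
        raise ValueError(
            f"STREAM_BACKEND must be one of {STREAM_BACKENDS}, got {backend!r}"
        )
    return BridgeSettings(
        host=env.get("BRIDGE_HOST") or "127.0.0.1",
        port=int(env.get("BRIDGE_PORT") or "8765"),
        brain_enabled=_flag(env, "JARVIS_BRAIN_ENABLED"),
        tts_enabled=_flag(env, "JARVIS_TTS_ENABLED"),
        stream_backend=backend,
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to exercise, for example `load_settings({"STREAM_BACKEND": "novnc"})`. Rejecting an unknown backend at startup surfaces a typo in `.env` immediately rather than when a broadcast is first requested.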
javis-bot
2026-06-09 14:51:05 +09:00
parent a5bf8d1826
commit c4abf63f38
308 changed files with 94135 additions and 1 deletions

.env.example Normal file

@@ -0,0 +1,67 @@
# ============================================================================
# Javis Bot — environment configuration
# Copy to `.env` and fill in. Never commit your real `.env`.
# ============================================================================
# ---------------------------------------------------------------------------
# Discord bot (normal bot account) — voice I/O + slash commands
# ---------------------------------------------------------------------------
# From https://discord.com/developers/applications → your app
DISCORD_BOT_TOKEN=
DISCORD_APP_ID=
# The (single) server this bot serves. Guild-scoped commands appear instantly.
DISCORD_GUILD_ID=
# ---------------------------------------------------------------------------
# Brain bridge (Python service in bridge/) — STT + reply engine + TTS
# ---------------------------------------------------------------------------
BRIDGE_URL=http://127.0.0.1:8765
BRIDGE_HOST=127.0.0.1
BRIDGE_PORT=8765
JARVIS_BRAIN_ENABLED=1
JARVIS_TTS_ENABLED=1
# faster-whisper device/compute. On this RTX 5050 box: cuda / float16.
WHISPER_DEVICE=auto
WHISPER_COMPUTE_TYPE=auto
# Optional explicit Piper voice model (.onnx). If empty, the jarvis default is used.
TTS_PIPER_MODEL_PATH=
# ---------------------------------------------------------------------------
# Jarvis brain (Ollama-backed). See src/jarvis/config.py for the full list.
# ---------------------------------------------------------------------------
OLLAMA_BASE_URL=http://127.0.0.1:11434
# OLLAMA_CHAT_MODEL=...
# WHISPER_MODEL=...
# ---------------------------------------------------------------------------
# VNC screen broadcast
# selfbot = real live "Go Live" stream (needs a USER/burner token; ToS risk)
# novnc = share a noVNC browser link (safe, real-time, not native)
# screenshot = periodic screenshots to the channel (safe, low fps)
# none = disabled
# ---------------------------------------------------------------------------
STREAM_BACKEND=selfbot
# The VNC desktop runs on X display :1 (see docs/vnc-xfce-setup.md)
VNC_DISPLAY=:1
VNC_RESOLUTION=1920x1080
VNC_FRAMERATE=30
VNC_BITRATE_KBPS=4000
# --- selfbot backend ---
# A THROWAWAY/burner Discord user account token. NEVER your main account.
# Using a selfbot violates Discord ToS and can get the account banned.
DISCORD_SELFBOT_TOKEN=
# --- novnc backend ---
# e.g. http://192.168.10.9:6080/vnc.html (websockify --web=/usr/share/novnc 6080 localhost:5901)
NOVNC_URL=
# --- screenshot backend ---
SCREENSHOT_INTERVAL_SEC=5
# ---------------------------------------------------------------------------
# Voice behaviour
# ---------------------------------------------------------------------------
# Silence (ms) that marks the end of an utterance before sending to the brain.
VOICE_SILENCE_MS=800
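
The VOICE_SILENCE_MS setting above can be pictured as a counter over audio frames. The sketch below is one possible reading, not the bot's actual implementation: it assumes the audio has already been split into fixed-length frames (20 ms is assumed here) and that some upstream step has labelled each frame as speech or silence. The function name `utterance_end_index` and its parameters are hypothetical.

```python
def utterance_end_index(frames_are_speech, frame_ms=20, silence_ms=800):
    """Return the index of the frame that completes the trailing silence.

    frames_are_speech: iterable of booleans, True where a frame holds speech.
    Returns None if no utterance has both started and ended yet.
    """
    # Number of consecutive silent frames required (ceiling division).
    needed = -(-silence_ms // frame_ms)
    heard_speech = False
    silent_run = 0
    for i, is_speech in enumerate(frames_are_speech):
        if is_speech:
            # Any speech resets the silence counter.
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            # Only count silence after the speaker has started talking.
            silent_run += 1
            if silent_run >= needed:
                return i
    return None
```

With the assumed 20 ms frames and the 800 ms value from this file, 40 consecutive silent frames after speech mark the end of an utterance. A shorter pause resets the count when speech resumes, so brief hesitations do not split one utterance into two.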