Dockerize: one-command stack with auto Ollama model pull

`docker compose up -d --build` now brings up the whole thing automatically, with no host setup needed:

- All-in-one javis image: TigerVNC+XFCE desktop, Chrome, Python brain bridge, Node/bun bot, managed by supervisord (verified: all 6 programs RUNNING).
- ollama service + one-shot ollama-init that auto-pulls chat+embed models (verified end-to-end; `ollama list` shows pulled models).
- Discord token deferred: without DISCORD_BOT_TOKEN the desktop, bridge, Ollama and models all run; only the bot waits (no crash loop).
- Slim container deps (bridge/requirements-bridge.txt) drop the unused PyQt6/torch/chatterbox/sounddevice stack. Piper voice + Whisper models auto-download into named volumes.
- Configurable host ports (VNC_PORT/NOVNC_PORT/BRIDGE_PORT) to avoid clashing with a host VNC already on 5901. Bridge binds 0.0.0.0 in-container.

Verified: image builds; brain imports; bridge /health 200; noVNC 200; X display :1 @1920x1080; auto-pull completes; supervisorctl status all RUNNING.
2026-06-09 15:27:41 +09:00
parent c4abf63f38
commit 25c77ac794
14 changed files with 448 additions and 4 deletions
--- a/.dockerignore
+++ b/.dockerignore
@@ -0,0 +1,18 @@
+.git
+.github
+**/node_modules
+bot/node_modules
+.venv
+**/__pycache__
+**/*.pyc
+.pytest_cache
+*.db
+*.sqlite
+.env
+.env.local
+release_output.log
+build
+dist
+tests
+evals
+docs/img
--- a/.env.example
+++ b/.env.example
@@ -27,11 +27,22 @@ WHISPER_COMPUTE_TYPE=auto
 TTS_PIPER_MODEL_PATH=

 # ---------------------------------------------------------------------------
-# Jarvis brain (Ollama-backed). See src/jarvis/config.py for the full list.
+# Jarvis brain (Ollama-backed). In Docker these populate the rendered
+# config (docker/jarvis-config.template.json). See src/jarvis/config.py.
 # ---------------------------------------------------------------------------
+# In docker-compose this is overridden to http://ollama:11434 automatically.
 OLLAMA_BASE_URL=http://127.0.0.1:11434
-# OLLAMA_CHAT_MODEL=...
-# WHISPER_MODEL=...
+OLLAMA_CHAT_MODEL=llama3.1:8b
+OLLAMA_EMBED_MODEL=nomic-embed-text
+WHISPER_MODEL=small
+
+# ---------------------------------------------------------------------------
+# Docker desktop (VNC) — used only by the container image
+# ---------------------------------------------------------------------------
+# VNC viewer password (max 8 chars effective). Watch the screen at localhost:5901.
+VNC_PASSWORD=javis123
+# Auto-opened page in the in-container Chrome.
+CHROME_START_URL=about:blank

 # ---------------------------------------------------------------------------
 # VNC screen broadcast
--- a/Dockerfile
+++ b/Dockerfile
@@ -0,0 +1,51 @@
+# ============================================================================
+# Javis Bot — all-in-one container
+# VNC + XFCE desktop + Chrome + Python brain bridge + Node/bun Discord bot.
+# Ollama (the LLM backend) runs as a separate service (see docker-compose.yml).
+# ============================================================================
+FROM ubuntu:24.04
+
+ENV DEBIAN_FRONTEND=noninteractive \
+    LANG=C.UTF-8 \
+    DISPLAY=:1 \
+    PATH=/opt/venv/bin:/root/.bun/bin:/usr/local/bin:/usr/bin:/bin
+
+# --- System packages: desktop, VNC, Chrome deps, ffmpeg, python, ocr ---
+RUN apt-get update && apt-get install -y --no-install-recommends \
+      ca-certificates curl wget gnupg unzip procps \
+      tigervnc-standalone-server tigervnc-common tigervnc-tools \
+      xfce4 xfce4-goodies dbus-x11 x11-utils xfonts-base \
+      fonts-noto-cjk fonts-noto-cjk-extra fonts-nanum \
+      ffmpeg tesseract-ocr \
+      python3 python3-venv python3-pip \
+      novnc websockify supervisor gettext-base \
+    && rm -rf /var/lib/apt/lists/*
+
+# --- Google Chrome (stable) ---
+RUN wget -q -O /tmp/chrome.deb https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb \
+    && (apt-get update && apt-get install -y --no-install-recommends /tmp/chrome.deb || (apt-get -f install -y)) \
+    && rm -f /tmp/chrome.deb && rm -rf /var/lib/apt/lists/*
+
+# --- bun (Discord bot runtime/package manager) ---
+RUN curl -fsSL https://bun.sh/install | bash
+
+# --- Python brain/bridge deps (slim set) ---
+COPY bridge/requirements-bridge.txt /app/bridge/requirements-bridge.txt
+RUN python3 -m venv /opt/venv \
+    && /opt/venv/bin/pip install --no-cache-dir --upgrade pip \
+    && /opt/venv/bin/pip install --no-cache-dir -r /app/bridge/requirements-bridge.txt
+
+# --- Discord bot deps (cache layer on lockfile) ---
+COPY bot/package.json bot/bun.lock /app/bot/
+RUN cd /app/bot && bun install --frozen-lockfile || bun install
+
+# --- App source ---
+COPY . /app
+WORKDIR /app
+
+# --- Default Piper voice (best-effort at build; entrypoint retries if absent) ---
+RUN bash docker/download-piper.sh || true
+
+EXPOSE 5901 6080 8765
+
+ENTRYPOINT ["/app/docker/entrypoint.sh"]
--- a/README.md
+++ b/README.md
@@ -47,7 +47,41 @@ Discord  ──voice / video / slash──▶  bot/      (Node + bun, discord.js

 ---

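-## Install & run
+## Run: Docker (recommended)
+
+Runs the whole stack in containers with no host setup. The VNC desktop + Chrome + Python bridge + Node bot run in one container (`javis`), and the LLM backend (Ollama) runs in a separate container. **Just bring it up and even the Ollama models are pulled automatically.**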
+
+```bash
+# Build & start. That's all there is to it.
+docker compose up -d --build
+```
+
+A single `docker compose up` automatically:
+- Starts the Ollama server, and `ollama-init` **auto-pulls** the chat/embedding models
+- Starts the VNC+XFCE desktop + Chrome + Python bridge
+- Auto-downloads the Whisper STT model / Piper TTS voice (cached in volumes)
+
+View the screen: VNC viewer → `localhost:5901` (password = `VNC_PASSWORD` in `.env`, default `javis123`) or browser → `http://localhost:6080/vnc.html`.
+Logs: `docker compose logs -f javis`.
+
+### Add the Discord token last
+
+Everything above works without a token (only the bot waits). When you're ready:
+
+```bash
+cp .env.example .env          # fill in DISCORD_BOT_TOKEN / DISCORD_APP_ID / DISCORD_GUILD_ID
+docker compose up -d          # the bot starts and registers the /자비스 command
+```
+
+Invoke it in Discord with `/자비스 join`. (To change models such as `OLLAMA_CHAT_MODEL`, set them in `.env` and run `docker compose up -d`.)
+
+- GPU (RTX 5050) acceleration: install nvidia-container-toolkit on the host, uncomment the GPU block in `docker-compose.yml`, and set `WHISPER_DEVICE=cuda` / `WHISPER_COMPUTE_TYPE=float16` in `.env`.
+- Data (memory DB), the Whisper cache, and Piper voices persist in named volumes.
+- Selfbot video-streaming dependencies are not included in the image by default. To use them, run `cd /app/bot && bun add discord.js-selfbot-v13 @dank074/discord-video-stream` in the container and restart (or add them to the Dockerfile).
+
+---
+
+## Run: manual (without Docker)

 ```bash
 # 1) Environment variables
--- a/bridge/requirements-bridge.txt
+++ b/bridge/requirements-bridge.txt
@@ -0,0 +1,27 @@
+# Slim dependency set for the containerized brain bridge.
+# Excludes the upstream desktop GUI / dictation / packaging / alternate-TTS
+# stack (PyQt6, pyinstaller, sounddevice, webrtcvad, pynput, pygame,
+# chatterbox-tts/torch, mlx) which are unused in the Discord+VNC deployment.
+
+# --- Brain runtime (imported when the reply engine loads) ---
+python-dotenv==1.0.1
+faster-whisper==1.0.3
+mcp==1.13.1
+numpy<2.0.0
+rapidfuzz==3.6.1
+requests==2.32.3
+
+# --- Bridge HTTP service ---
+flask>=3.0.0
+
+# --- Text-to-speech (Piper) ---
+piper-tts>=1.3.0
+
+# --- Built-in tools (lazily imported; needed for full functionality) ---
+beautifulsoup4>=4.12.0
+lxml>=4.9.0
+html2text>=2020.1.16
+geoip2==4.8.0
+Pillow==10.4.0
+pytesseract==0.3.13
+faiss-cpu>=1.7.4
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -0,0 +1,84 @@
+# ============================================================================
+# Javis Bot — Docker Compose
+#   ollama      : the LLM backend for the jarvis brain
+#   ollama-init : one-shot, auto-pulls the chat + embed models on startup
+#   javis       : all-in-one container (VNC desktop + Chrome + bridge + bot)
+#
+# Just bring it up — everything (incl. Ollama models) comes up automatically:
+#   docker compose up -d --build
+#
+# The Discord token can be added LAST: without it the desktop, brain bridge,
+# Ollama and models all run; only the bot waits. Then put DISCORD_BOT_TOKEN in
+# .env and re-run `docker compose up -d`.
+#
+# Watch the desktop:  VNC viewer -> localhost:5901  (or browser -> localhost:6080)
+# ============================================================================
+services:
+  ollama:
+    image: ollama/ollama:latest
+    restart: unless-stopped
+    volumes:
+      - ollama_models:/root/.ollama
+    # --- GPU (optional): needs nvidia-container-toolkit on the host ---
+    # deploy:
+    #   resources:
+    #     reservations:
+    #       devices:
+    #         - driver: nvidia
+    #           count: all
+    #           capabilities: [gpu]
+
+  # Auto-pull the models the brain needs, then exit. Idempotent (re-runnable).
+  ollama-init:
+    image: ollama/ollama:latest
+    depends_on:
+      - ollama
+    restart: "no"
+    environment:
+      OLLAMA_HOST: http://ollama:11434
+      CHAT_MODEL: ${OLLAMA_CHAT_MODEL:-llama3.1:8b}
+      EMBED_MODEL: ${OLLAMA_EMBED_MODEL:-nomic-embed-text}
+    entrypoint: ["/bin/sh", "-c"]
+    command:
+      - |
+        echo "[ollama-init] waiting for ollama server...";
+        until ollama list >/dev/null 2>&1; do sleep 2; done;
+        echo "[ollama-init] pulling $$CHAT_MODEL";
+        ollama pull "$$CHAT_MODEL";
+        echo "[ollama-init] pulling $$EMBED_MODEL";
+        ollama pull "$$EMBED_MODEL";
+        echo "[ollama-init] models ready.";
+
+  javis:
+    build: .
+    restart: unless-stopped
+    env_file:
+      - path: .env
+        required: false
+    environment:
+      # Point the brain at the ollama service and the bot at the in-container bridge.
+      OLLAMA_BASE_URL: http://ollama:11434
+      OLLAMA_CHAT_MODEL: ${OLLAMA_CHAT_MODEL:-llama3.1:8b}
+      OLLAMA_EMBED_MODEL: ${OLLAMA_EMBED_MODEL:-nomic-embed-text}
+      WHISPER_MODEL: ${WHISPER_MODEL:-small}
+      BRIDGE_URL: http://127.0.0.1:8765
+    depends_on:
+      - ollama
+    shm_size: "1gb"          # Chrome needs a larger /dev/shm
+    ports:
+      # Host ports are overridable. If the HOST already runs VNC on 5901
+      # (see docs/vnc-xfce-setup.md), set VNC_PORT=5902 in .env.
+      - "${VNC_PORT:-5901}:5901"      # VNC
+      - "${NOVNC_PORT:-6080}:6080"    # noVNC (open in a browser)
+      - "${BRIDGE_PORT:-8765}:8765"   # brain bridge (usually internal-only)
+    volumes:
+      - javis_data:/data                         # jarvis db + memory
+      - whisper_cache:/root/.cache/huggingface   # cached Whisper models
+      - piper_voices:/opt/piper-voices           # TTS voices
+    # --- GPU (optional): mirror the ollama GPU block above to accelerate Whisper ---
+
+volumes:
+  ollama_models:
+  javis_data:
+  whisper_cache:
+  piper_voices:
--- a/docker/download-piper.sh
+++ b/docker/download-piper.sh
@@ -0,0 +1,30 @@
+#!/usr/bin/env bash
+# Download the default Piper voice model if it is not already present.
+# Used both at image build time and (as a fallback) at container start.
+set -euo pipefail
+
+VOICE="${PIPER_VOICE:-en_GB-alan-medium}"
+DEST_DIR="${PIPER_VOICE_DIR:-/opt/piper-voices}"
+BASE="https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0"
+
+# en_GB-alan-medium -> en/en_GB/alan/medium
+lang2="${VOICE%%-*}"          # en_GB
+lang1="${lang2%%_*}"          # en
+rest="${VOICE#*-}"            # alan-medium
+name="${rest%%-*}"            # alan
+quality="${rest#*-}"          # medium
+path="${lang1}/${lang2}/${name}/${quality}"
+
+mkdir -p "$DEST_DIR"
+onnx="$DEST_DIR/${VOICE}.onnx"
+json="$DEST_DIR/${VOICE}.onnx.json"
+
+if [ -f "$onnx" ] && [ -f "$json" ]; then
+  echo "[piper] voice already present: $onnx"
+  exit 0
+fi
+
+echo "[piper] downloading voice $VOICE ..."
+wget -q -O "$onnx" "${BASE}/${path}/${VOICE}.onnx"
+wget -q -O "$json" "${BASE}/${path}/${VOICE}.onnx.json"
+echo "[piper] saved to $onnx"
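The parameter expansions in `docker/download-piper.sh` turn a voice ID into the directory layout used under the `rhasspy/piper-voices` base URL. A minimal sketch of that mapping as a standalone function follows. The function name `piper_voice_path` is hypothetical and not part of the commit, and the sketch assumes voice IDs shaped like `<lang>_<REGION>-<name>-<quality>`.

```bash
#!/usr/bin/env bash
# Sketch only: mirrors the expansions used in docker/download-piper.sh.
# piper_voice_path is a hypothetical helper name, not part of the commit.
piper_voice_path() {
  local voice="$1"
  local lang2="${voice%%-*}"   # en_GB       (text before the first '-')
  local lang1="${lang2%%_*}"   # en          (text before the first '_')
  local rest="${voice#*-}"     # alan-medium (text after the first '-')
  local name="${rest%%-*}"     # alan
  local quality="${rest#*-}"   # medium
  printf '%s/%s/%s/%s\n' "$lang1" "$lang2" "$name" "$quality"
}

piper_voice_path "en_GB-alan-medium"   # prints en/en_GB/alan/medium
```

Setting `PIPER_VOICE` to another ID of the same shape should resolve the same way, but whether a given voice exists at that path in the upstream repository is not checked here.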
--- a/docker/entrypoint.sh
+++ b/docker/entrypoint.sh
@@ -0,0 +1,42 @@
+#!/usr/bin/env bash
+# Container entrypoint: render config from env, set the VNC password, ensure the
+# Piper voice exists, then hand off to supervisord (which runs the desktop,
+# bridge, and bot).
+set -euo pipefail
+
+# --- Defaults (override via .env / compose) ---
+: "${VNC_PASSWORD:=javis123}"
+: "${VNC_RESOLUTION:=1920x1080}"
+: "${OLLAMA_BASE_URL:=http://ollama:11434}"
+: "${OLLAMA_CHAT_MODEL:=llama3.1:8b}"
+: "${OLLAMA_EMBED_MODEL:=nomic-embed-text}"
+: "${WHISPER_MODEL:=small}"
+: "${WHISPER_DEVICE:=cpu}"
+: "${WHISPER_COMPUTE_TYPE:=int8}"
+: "${JARVIS_DB_PATH:=/data/jarvis.db}"
+: "${BRIDGE_HOST:=0.0.0.0}"
+: "${BRIDGE_PORT:=8765}"
+: "${PIPER_VOICE:=en_GB-alan-medium}"
+: "${PIPER_VOICE_DIR:=/opt/piper-voices}"
+: "${TTS_PIPER_MODEL_PATH:=${PIPER_VOICE_DIR}/${PIPER_VOICE}.onnx}"
+
+export VNC_RESOLUTION OLLAMA_BASE_URL OLLAMA_CHAT_MODEL OLLAMA_EMBED_MODEL \
+       WHISPER_MODEL WHISPER_DEVICE WHISPER_COMPUTE_TYPE JARVIS_DB_PATH \
+       PIPER_VOICE PIPER_VOICE_DIR TTS_PIPER_MODEL_PATH BRIDGE_HOST BRIDGE_PORT
+
+mkdir -p /data /app/config "$(dirname "$JARVIS_DB_PATH")"
+
+# --- VNC password file ---
+mkdir -p /root/.vnc
+echo "$VNC_PASSWORD" | tigervncpasswd -f > /root/.vnc/passwd
+chmod 600 /root/.vnc/passwd
+
+# --- Render jarvis brain config from template ---
+envsubst < /app/docker/jarvis-config.template.json > /app/config/jarvis.json
+export JARVIS_CONFIG_PATH=/app/config/jarvis.json
+
+# --- Ensure the Piper voice exists (best effort) ---
+bash /app/docker/download-piper.sh || echo "[entrypoint] piper download failed; TTS may be unavailable"
+
+echo "[entrypoint] display=$DISPLAY ollama=$OLLAMA_BASE_URL whisper=$WHISPER_MODEL/$WHISPER_DEVICE"
+exec supervisord -c /app/docker/supervisord.conf
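The defaults block in `docker/entrypoint.sh` relies on the `: "${VAR:=default}"` idiom, which assigns the default only when the variable is unset or empty, so values coming from `.env` or compose take precedence. A small sketch of that behaviour, using a hypothetical variable `DEMO_PORT` that is not part of the commit:

```bash
#!/usr/bin/env bash
# Sketch only: DEMO_PORT is a hypothetical variable used for illustration.

unset DEMO_PORT
: "${DEMO_PORT:=8765}"        # unset -> the default is assigned
first="$DEMO_PORT"

DEMO_PORT=9000
: "${DEMO_PORT:=8765}"        # already set -> the existing value is kept
second="$DEMO_PORT"

DEMO_PORT=""
: "${DEMO_PORT:=8765}"        # empty is treated like unset by ':='
third="$DEMO_PORT"

echo "$first $second $third"  # prints 8765 9000 8765
```

The third case matters for `.env` files: a line such as `TTS_PIPER_MODEL_PATH=` with an empty value still falls back to the entrypoint default.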
--- a/docker/jarvis-config.template.json
+++ b/docker/jarvis-config.template.json
@@ -0,0 +1,18 @@
+{
+  "db_path": "${JARVIS_DB_PATH}",
+  "sqlite_vss_path": null,
+  "ollama_base_url": "${OLLAMA_BASE_URL}",
+  "ollama_embed_model": "${OLLAMA_EMBED_MODEL}",
+  "ollama_chat_model": "${OLLAMA_CHAT_MODEL}",
+  "tts_enabled": true,
+  "tts_engine": "piper",
+  "tts_piper_model_path": "${TTS_PIPER_MODEL_PATH}",
+  "whisper_model": "${WHISPER_MODEL}",
+  "whisper_backend": "faster-whisper",
+  "whisper_device": "${WHISPER_DEVICE}",
+  "whisper_compute_type": "${WHISPER_COMPUTE_TYPE}",
+  "location_enabled": true,
+  "web_search_enabled": true,
+  "wikipedia_fallback_enabled": true,
+  "mcps": {}
+}
--- a/docker/run-bot.sh
+++ b/docker/run-bot.sh
@@ -0,0 +1,22 @@
+#!/usr/bin/env bash
+# Wait for the brain bridge, then run the Discord bot.
+#
+# The Discord token is intentionally deferred: if DISCORD_BOT_TOKEN is not set
+# yet, the rest of the stack (desktop, bridge, ollama) still runs fully. The bot
+# just waits. Add the token to .env and `docker compose up -d` to start it.
+set -e
+cd /app/bot
+
+if [ -z "${DISCORD_BOT_TOKEN:-}" ]; then
+  echo "[bot] DISCORD_BOT_TOKEN is not set; the bot is waiting. Add the token to .env and run 'docker compose up -d' to start it."
+  echo "[bot] (Meanwhile the VNC desktop / bridge / Ollama keep running normally.)"
+  exec sleep infinity
+fi
+
+BRIDGE="${BRIDGE_URL:-http://127.0.0.1:8765}"
+for i in $(seq 1 60); do
+  curl -fsS "$BRIDGE/health" >/dev/null 2>&1 && break
+  sleep 1
+done
+bun run register || echo "[bot] slash command registration failed (continuing)"
+exec bun run start
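The readiness loop in `docker/run-bot.sh` polls the bridge `/health` endpoint up to 60 times and then carries on whether or not the bridge answered. The same polling pattern can be sketched as a reusable function that also reports a timeout. The name `wait_for`, its arguments, and the commented usage line are assumptions for illustration, not part of the commit.

```bash
#!/usr/bin/env bash
# Sketch only: wait_for is a hypothetical helper, not part of the commit.
# Usage: wait_for <max_tries> <delay_seconds> <command> [args...]
# Returns 0 as soon as the command succeeds, 1 if every try fails.
wait_for() {
  local tries="$1" delay="$2"
  shift 2
  local i
  for i in $(seq 1 "$tries"); do
    "$@" >/dev/null 2>&1 && return 0
    sleep "$delay"
  done
  return 1
}

# In run-bot.sh this could stand in for the for-loop, for example:
#   wait_for 60 1 curl -fsS "$BRIDGE/health" || echo "[bot] bridge not ready (continuing)"
```

Returning a status leaves the decision to the caller: the script can log and continue, as the commit does, or exit non-zero and let supervisord's `autorestart` retry.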
--- a/docker/run-chrome.sh
+++ b/docker/run-chrome.sh
@@ -0,0 +1,14 @@
+#!/usr/bin/env bash
+# Wait for the desktop, then launch Chrome on :1 so the VNC screen shows a
+# controllable browser (jarvis can also drive it). Runs as root -> --no-sandbox.
+set -e
+for i in $(seq 1 40); do
+  xdpyinfo -display :1 >/dev/null 2>&1 && break
+  sleep 1
+done
+sleep 3
+export DISPLAY=:1
+exec google-chrome \
+  --no-sandbox --no-first-run --disable-dev-shm-usage \
+  --password-store=basic --start-maximized \
+  "${CHROME_START_URL:-about:blank}"
--- a/docker/run-xfce.sh
+++ b/docker/run-xfce.sh
@@ -0,0 +1,12 @@
+#!/usr/bin/env bash
+# Wait for the X server, then start the XFCE session (with a dbus session).
+set -e
+for i in $(seq 1 30); do
+  xdpyinfo -display :1 >/dev/null 2>&1 && break
+  sleep 1
+done
+export DISPLAY=:1
+export XDG_DATA_DIRS=/usr/local/share:/usr/share
+export XDG_CONFIG_DIRS=/etc/xdg
+# startxfce4 bails when X is already up; call the session directly.
+exec dbus-launch --exit-with-session xfce4-session
--- a/docker/run-xvnc.sh
+++ b/docker/run-xvnc.sh
@@ -0,0 +1,10 @@
+#!/usr/bin/env bash
+# Start the TigerVNC X server on display :1.
+# NOTE: do NOT pass `-extension RENDER` — it blanks XFCE menus/panels
+# (see docs/vnc-xfce-setup.md §3-4).
+set -e
+: "${VNC_RESOLUTION:=1920x1080}"
+exec /usr/bin/Xvnc :1 \
+  -geometry "$VNC_RESOLUTION" -depth 24 \
+  -rfbport 5901 -rfbauth /root/.vnc/passwd \
+  -SecurityTypes VncAuth -localhost no -AlwaysShared
--- a/docker/supervisord.conf
+++ b/docker/supervisord.conf
@@ -0,0 +1,71 @@
+[supervisord]
+nodaemon=true
+user=root
+logfile=/var/log/supervisord.log
+pidfile=/run/supervisord.pid
+
+[unix_http_server]
+file=/run/supervisor.sock
+
+[supervisorctl]
+serverurl=unix:///run/supervisor.sock
+
+[rpcinterface:supervisor]
+supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface
+
+[program:xvnc]
+command=/app/docker/run-xvnc.sh
+priority=100
+autorestart=true
+stdout_logfile=/dev/stdout
+stdout_logfile_maxbytes=0
+stderr_logfile=/dev/stderr
+stderr_logfile_maxbytes=0
+
+[program:xfce]
+command=/app/docker/run-xfce.sh
+priority=200
+autorestart=true
+stdout_logfile=/dev/stdout
+stdout_logfile_maxbytes=0
+stderr_logfile=/dev/stderr
+stderr_logfile_maxbytes=0
+
+[program:novnc]
+command=websockify --web=/usr/share/novnc 6080 localhost:5901
+priority=250
+autorestart=true
+stdout_logfile=/dev/stdout
+stdout_logfile_maxbytes=0
+stderr_logfile=/dev/stderr
+stderr_logfile_maxbytes=0
+
+[program:bridge]
+command=/opt/venv/bin/python -m bridge.server
+directory=/app
+priority=300
+autorestart=true
+stdout_logfile=/dev/stdout
+stdout_logfile_maxbytes=0
+stderr_logfile=/dev/stderr
+stderr_logfile_maxbytes=0
+
+[program:chrome]
+command=/app/docker/run-chrome.sh
+priority=350
+autorestart=true
+stdout_logfile=/dev/stdout
+stdout_logfile_maxbytes=0
+stderr_logfile=/dev/stderr
+stderr_logfile_maxbytes=0
+
+[program:bot]
+command=/app/docker/run-bot.sh
+directory=/app/bot
+priority=400
+autorestart=true
+startretries=999
+stdout_logfile=/dev/stdout
+stdout_logfile_maxbytes=0
+stderr_logfile=/dev/stderr
+stderr_logfile_maxbytes=0