End-to-end verified with a real burner token + voice channel: login OK, posts
to the text channel, joins voice, and Go-Live streams the host :1 desktop.
- selfbot.ts now captures the X display with the SYSTEM ffmpeg (reliable
x11grab) and pipes it into prepareStream, instead of relying on the lib's
bundled libav input devices (not portable). Capture process is killed on stop.
- package.json: trustedDependencies (node-av, @lng2004/node-datachannel) so the
native streaming deps build automatically on bun install (incl. Docker).
- Dropped the unused nvenc path (the lib's exported `nvenc` is undefined at
runtime); software H264 encode for now.
get-token.ts now writes the Remote Auth URL as a 512x512 QR image
(/tmp/javis_qr.png, override via QR_OUT) in addition to printing the link, so
it can be sent to the user and scanned from a second screen with the Discord
mobile app. Adds the qrcode dependency.
bot/src/get-token.ts uses discord.js-selfbot-v13 DiscordAuthWebsocket: it
prints the Discord Remote Auth URL (https://discord.com/ra/<code> — the same
thing a login QR encodes). Open it on a phone with the Discord app, approve the
"New login" prompt, and the user token is written to .env as
DISCORD_SELFBOT_TOKEN. Works from a single mobile device (no second screen, no
password, no browser devtools). `bun run token`.
- voice.ts: reply playback is now a FIFO queue (AudioPlayerStatus.Idle drains
it) so concurrent speakers no longer cut each other's replies off.
- selfbot.ts: rewritten against the REAL @dank074/discord-video-stream v6 API
(verified from its d.ts): prepareStream(input, opts, signal)->{command,output},
playStream(output, streamer, {type:"go-live"}, signal), Streamer.joinVoice.
x11grab via customInputOptions; optional NVENC encode (RTX 5050) via exported
`nvenc`. package.json pinned to ^6.0.0 (was a wrong ^4.2.1).
- Dockerfile: dropped the hardcoded python3.12 LD_LIBRARY_PATH. faster-whisper
>=1.1 self-locates the pip CUDA libs; ldconfig (full path, glob) registers
them as a robust fallback. Verified: ld.so cache lists libcublas/libcudnn and
GPU whisper works with LD_LIBRARY_PATH empty.
- bridge: STT resample 48k->16k upgraded from nearest-neighbor to linear
(np.interp).
Verified: tsc clean, image builds, GPU whisper OK via ldconfig, compose valid.
Code review of the bridge/bot/docker work found:
- TTS bug: bridge called PiperVoice.synthesize(text, wav) but that method
returns AudioChunks and takes a SynthesisConfig as its 2nd arg, not a wav
file -> TTS would fail. Switched to synthesize_wav(text, wav_file).
Verified: produces a valid 22050Hz mono WAV.
- run-bot.sh now waits if ANY of DISCORD_BOT_TOKEN/APP_ID/GUILD_ID is missing
(config.ts throws on a missing one), preventing a supervisor crash-loop.
Verified clean: discord.js Events.ClientReady == 'clientReady' (existing
handler correct); image rebuilds.
GPU acceleration is now on by default and verified end-to-end on the
Blackwell RTX 5050 (sm_120):
- Ollama offloads 100% to GPU (log: library=CUDA compute=12.0,
BLACKWELL_NATIVE_FP4=1). compose passes GPU via CDI
(devices: nvidia.com/gpu=all) to both ollama and javis.
- Whisper STT on GPU: faster-whisper>=1.1.0 + nvidia-cublas/cudnn cu12,
LD_LIBRARY_PATH baked into the image. Verified float16 transcribe on
sm_120; bridge auto-falls back to CPU when no GPU is present.
- Model: default chat model -> qwen3:8b (best 8GB-VRAM tool-calling,
~5GB Q4). Embed stays nomic-embed-text.
- README documents the host one-time setup (nvidia-container-toolkit +
`nvidia-ctk cdi generate`) and GPU on/off.
Verified: image builds; GPU visible in both containers via compose;
ollama ps = 100% GPU; faster-whisper cuda OK + CPU fallback OK;
bridge /health 200.
`docker compose up -d --build` now brings up the whole thing automatically —
no host setup needed:
- All-in-one javis image: TigerVNC+XFCE desktop, Chrome, Python brain bridge,
Node/bun bot, managed by supervisord (verified: all 6 programs RUNNING).
- ollama service + one-shot ollama-init that auto-pulls chat+embed models
(verified end-to-end; `ollama list` shows pulled models).
- Discord token deferred: without DISCORD_BOT_TOKEN the desktop, bridge,
Ollama and models all run; only the bot waits (no crash loop).
- Slim container deps (bridge/requirements-bridge.txt) drop the unused
PyQt6/torch/chatterbox/sounddevice stack. Piper voice + Whisper models
auto-download into named volumes.
- Configurable host ports (VNC_PORT/NOVNC_PORT/BRIDGE_PORT) to avoid clashing
with a host VNC already on 5901. Bridge binds 0.0.0.0 in-container.
Verified: image builds; brain imports; bridge /health 200; noVNC 200;
X display :1 @1920x1080; auto-pull completes; supervisorctl status all RUNNING.