Commit Graph

11 Commits

Author SHA1 Message Date
javis-bot
40fd7dbb59 fix: single-pass NVENC encode for selfbot stream (no double encode)
Address review: the capture ffmpeg had no -b:v, so it encoded at nvenc's
low default (~2.47 Mbps) and the library then re-encoded to 8 Mbps, which
spent extra bits without recovering detail the first pass had already lost.
The double encode also kept CPU decode + scale + re-encode in the library,
contradicting the "GPU handles it" claim.

Now the system ffmpeg produces the final Discord-ready H264 in one pass
(-b:v/-maxrate at the configured bitrate, -bf 0, 1s keyframes, yuv420p,
-forced-idr) and prepareStream uses noTranscoding:true to remux only. One
GPU encode, no library decode/scale/re-encode.

Verified locally: high-motion source fills 8.7 Mbps at these args (vs the
~2.47 Mbps no-bitrate default), real :1 desktop holds 60fps at realtime,
and the capture -> copy/remux chain yields h264 1920x1080 yuv420p 60fps
has_b_frames=0. tsc --noEmit passes. Live Discord test pending reboot.
2026-06-10 11:23:52 +09:00
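A minimal sketch of the single-pass argument list that commit 40fd7dbb59 describes, written in Python for brevity (the real code lives in selfbot.ts). The function name `build_capture_args`, the matroska output container, and writing to `pipe:1` are assumptions for illustration; the commit only names the encoder flags. A 1-second keyframe interval is expressed here as `-g` equal to the frame rate.

```python
from typing import List


def build_capture_args(
    display: str = ":1",
    width: int = 1920,
    height: int = 1080,
    fps: int = 60,
    bitrate_kbps: int = 8000,
) -> List[str]:
    """Build an ffmpeg command for a one-pass x11grab -> h264_nvenc capture."""
    rate = f"{bitrate_kbps}k"
    return [
        "ffmpeg",
        "-f", "x11grab",                  # capture the X display
        "-framerate", str(fps),
        "-video_size", f"{width}x{height}",
        "-i", display,
        "-c:v", "h264_nvenc",             # single GPU encode
        "-b:v", rate,                     # explicit bitrate instead of the encoder default
        "-maxrate", rate,
        "-bf", "0",                       # no B-frames
        "-g", str(fps),                   # one keyframe per second
        "-pix_fmt", "yuv420p",
        "-forced-idr", "1",               # keyframes are emitted as IDR frames
        "-f", "matroska",                 # assumption: container is not named in the commit
        "pipe:1",                         # assumption: output is piped to the library
    ]
```

With the defaults, the list carries `-b:v 8000k` and `-g 60`, so the downstream library only has to remux the stream.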
javis-bot
ad0caa8142 feat: 1080p60 NVENC selfbot broadcast (8 Mbps default)
Bump the default broadcast to 1080p 60fps at 8 Mbps and route both encode
stages through the GPU (RTX 5050, h264_nvenc) so 60fps stays smooth without
loading the 4-core host.

- selfbot.ts: capture ffmpeg uses h264_nvenc when streamHw is on (falls back
  to software x264 otherwise), and prepareStream now passes Encoders.nvenc()
  so the library's transcode runs on the GPU too. Guard loadLib for Encoders.
- config.ts: VNC_FRAMERATE default 30 -> 60, VNC_BITRATE_KBPS 4000 -> 8000.
- .env.example: document the new 1080p60/8 Mbps defaults and STREAM_HW.

Verified locally: h264_nvenc x11grab holds a steady 60fps with headroom,
Encoders.nvenc() returns valid h264_nvenc settings, and tsc --noEmit passes.
Live Discord voice-channel verification pending a host reboot.
2026-06-10 11:17:44 +09:00
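The defaults and the encoder fallback from commit ad0caa8142 could be sketched as follows, in Python rather than the TypeScript of config.ts. `StreamConfig` and `load_stream_config` are hypothetical names, and the set of values that count as "on" for STREAM_HW (and treating an unset value as off) is an assumption; the commit does not say how the flag is parsed.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class StreamConfig:
    framerate: int
    bitrate_kbps: int
    codec: str


def load_stream_config(env: Mapping[str, str] = os.environ) -> StreamConfig:
    """Read broadcast settings, using the new 60 fps / 8000 kbps defaults."""
    framerate = int(env.get("VNC_FRAMERATE", "60"))
    bitrate_kbps = int(env.get("VNC_BITRATE_KBPS", "8000"))
    # Assumption: these spellings mean "on"; anything else (or unset) means off.
    hw_enabled = env.get("STREAM_HW", "").strip().lower() in {"1", "true", "yes", "on"}
    # GPU encoder when hardware streaming is on, software x264 otherwise.
    codec = "h264_nvenc" if hw_enabled else "libx264"
    return StreamConfig(framerate, bitrate_kbps, codec)
```

Passing a plain dict instead of `os.environ` keeps the function easy to exercise, for example `load_stream_config({"STREAM_HW": "1"})`.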
javis-bot
5137fdeaf7 selfbot streaming: verified live; capture via system ffmpeg x11grab
End-to-end verified with a real burner token + voice channel: login OK, posts
to the text channel, joins voice, and Go-Live streams the host :1 desktop.

- selfbot.ts now captures the X display with the SYSTEM ffmpeg (reliable
  x11grab) and pipes it into prepareStream, instead of relying on the lib's
  bundled libav input devices (not portable). Capture process is killed on stop.
- package.json: trustedDependencies (node-av, @lng2004/node-datachannel) so the
  native streaming deps build automatically on bun install (incl. Docker).
- Dropped the unused nvenc path (the lib's exported `nvenc` is undefined at
  runtime); software H264 encode for now.
2026-06-10 10:38:28 +09:00
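The capture lifecycle in commit 5137fdeaf7 (spawn a system process, pipe its stdout onward, kill it on stop) might look roughly like this in Python. `CaptureProcess` and `stand_in_argv` are hypothetical; the stand-in command replaces ffmpeg only so the sketch runs on a machine without ffmpeg or an X display.

```python
import subprocess
import sys
from typing import IO, List, Optional


def stand_in_argv() -> List[str]:
    """A placeholder child that writes 16 bytes and then idles (not ffmpeg)."""
    code = (
        "import sys, time; sys.stdout.buffer.write(b'x' * 16); "
        "sys.stdout.flush(); time.sleep(30)"
    )
    return [sys.executable, "-c", code]


class CaptureProcess:
    """Own a capture child process and make sure it is gone after stop()."""

    def __init__(self, argv: List[str]) -> None:
        self._argv = argv
        self._proc: Optional[subprocess.Popen] = None

    @property
    def running(self) -> bool:
        return self._proc is not None and self._proc.poll() is None

    def start(self) -> IO[bytes]:
        """Spawn the child and return its stdout for the consumer to read."""
        self._proc = subprocess.Popen(
            self._argv, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL
        )
        assert self._proc.stdout is not None
        return self._proc.stdout

    def stop(self, timeout: float = 5.0) -> None:
        """Terminate the child, escalating to kill if it ignores the request."""
        proc, self._proc = self._proc, None
        if proc is None:
            return
        if proc.poll() is None:
            proc.terminate()
            try:
                proc.wait(timeout=timeout)
            except subprocess.TimeoutExpired:
                proc.kill()
                proc.wait()
        if proc.stdout is not None:
            proc.stdout.close()
```

Calling `stop()` twice is harmless, which matters when both a user command and a disconnect handler try to end the stream.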
javis-bot
7aac92fc2c token helper: render auth link as a scannable QR PNG
get-token.ts now writes the Remote Auth URL as a 512x512 QR image
(/tmp/javis_qr.png, override via QR_OUT) in addition to printing the link, so
it can be sent to the user and scanned from a second screen with the Discord
mobile app. Adds the qrcode dependency.
2026-06-09 21:03:31 +09:00
javis-bot
f80a6fa0ba Add remote-auth token helper (get selfbot token via a link, no devtools)
bot/src/get-token.ts uses discord.js-selfbot-v13 DiscordAuthWebsocket: it
prints the Discord Remote Auth URL (https://discord.com/ra/<code> — the same
thing a login QR encodes). Open it on a phone with the Discord app, approve the
"New login" prompt, and the user token is written to .env as
DISCORD_SELFBOT_TOKEN. Works from a single mobile device (no second screen, no
password, no browser devtools). `bun run token`.
2026-06-09 20:42:24 +09:00
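Commit f80a6fa0ba writes the token to .env as DISCORD_SELFBOT_TOKEN. How get-token.ts updates the file is not shown, so the following is only a sketch of one way to set a key in place without duplicating it; `upsert_env_var` is a hypothetical helper, shown in Python.

```python
from pathlib import Path
from typing import List


def upsert_env_var(env_path: Path, key: str, value: str) -> None:
    """Set KEY=value in a .env file, replacing an existing entry if present."""
    lines: List[str] = []
    if env_path.exists():
        lines = env_path.read_text(encoding="utf-8").splitlines()

    new_line = f"{key}={value}"
    result: List[str] = []
    replaced = False
    for line in lines:
        if line.startswith(f"{key}="):
            # Keep only one entry for the key; later duplicates are dropped.
            if not replaced:
                result.append(new_line)
                replaced = True
        else:
            result.append(line)
    if not replaced:
        result.append(new_line)

    env_path.write_text("\n".join(result) + "\n", encoding="utf-8")
```

Other entries and comments in the file are left untouched, so rerunning the helper after a token rotation changes a single line.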
javis-bot
b56c9c7721 Address remaining review items (queue, selfbot v6 API, ldconfig, resample)
- voice.ts: reply playback is now a FIFO queue (AudioPlayerStatus.Idle drains
  it) so concurrent speakers no longer cut each other's replies off.
- selfbot.ts: rewritten against the REAL @dank074/discord-video-stream v6 API
  (verified from its d.ts): prepareStream(input, opts, signal)->{command,output},
  playStream(output, streamer, {type:"go-live"}, signal), Streamer.joinVoice.
  x11grab via customInputOptions; optional NVENC encode (RTX 5050) via exported
  `nvenc`. package.json now pins ^6.0.0 (the earlier ^4.2.1 was wrong).
- Dockerfile: dropped the hardcoded python3.12 LD_LIBRARY_PATH. faster-whisper
  >=1.1 self-locates the pip CUDA libs; ldconfig (full path, glob) registers
  them as a robust fallback. Verified: ld.so cache lists libcublas/libcudnn and
  GPU whisper works with LD_LIBRARY_PATH empty.
- bridge: STT resample 48k->16k upgraded from nearest-neighbor to linear
  (np.interp).

Verified: tsc clean, image builds, GPU whisper OK via ldconfig, compose valid.
2026-06-09 18:47:25 +09:00
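The linear resample mentioned in commit b56c9c7721 can be illustrated without NumPy. This pure-Python sketch places output sample i at source position i * src_rate / dst_rate and blends the two neighbouring samples; the commit uses np.interp, and the sample grid the bridge actually passes to it is not stated, so the grid here is an assumption.

```python
from typing import List, Sequence


def resample_linear(
    samples: Sequence[float], src_rate: int = 48_000, dst_rate: int = 16_000
) -> List[float]:
    """Resample by linear interpolation between neighbouring source samples."""
    n_src = len(samples)
    if n_src == 0:
        return []
    n_dst = n_src * dst_rate // src_rate
    step = src_rate / dst_rate  # source samples advanced per output sample
    out: List[float] = []
    for i in range(n_dst):
        pos = i * step
        left = int(pos)
        right = min(left + 1, n_src - 1)  # clamp at the final sample
        frac = pos - left
        # Weighted blend; nearest-neighbour would take samples[round(pos)] instead.
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out
```

For example, `resample_linear([0, 10, 20], 3, 2)` yields `[0.0, 15.0]`: the second output falls halfway between 10 and 20. With an exact integer ratio and a grid aligned at sample 0, every position lands on a source sample, so the fractional weights only come into play for non-integer ratios or offset grids.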
javis-bot
964123682f Review fixes: correct Piper TTS API + bot env gating
Code review of the bridge/bot/docker work found:
- TTS bug: bridge called PiperVoice.synthesize(text, wav) but that method
  returns AudioChunks and takes a SynthesisConfig as its 2nd arg, not a wav
  file -> TTS would fail. Switched to synthesize_wav(text, wav_file).
  Verified: produces a valid 22050Hz mono WAV.
- run-bot.sh now waits if ANY of DISCORD_BOT_TOKEN/APP_ID/GUILD_ID is missing
  (config.ts throws on a missing one), preventing a supervisor crash-loop.

Verified clean: discord.js Events.ClientReady == 'clientReady' (existing
handler correct); image rebuilds.
2026-06-09 16:16:55 +09:00
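The gating rule in commit 964123682f (wait while any required variable is missing, instead of letting the bot crash and be restarted) is implemented in run-bot.sh; the same logic is sketched here in Python. The variable names are copied literally from the commit's shorthand "DISCORD_BOT_TOKEN/APP_ID/GUILD_ID", so the last two may carry a prefix in the real config. `missing_vars` and `wait_until_configured` are hypothetical names.

```python
import time
from typing import Callable, List, Mapping, Optional, Sequence

# Names as abbreviated in the commit message; the real keys may differ.
REQUIRED_VARS: Sequence[str] = ("DISCORD_BOT_TOKEN", "APP_ID", "GUILD_ID")


def missing_vars(
    env: Mapping[str, str], required: Sequence[str] = REQUIRED_VARS
) -> List[str]:
    """Return the required variables that are unset or blank."""
    return [name for name in required if not env.get(name, "").strip()]


def wait_until_configured(
    read_env: Callable[[], Mapping[str, str]],
    sleep: Callable[[float], None] = time.sleep,
    interval_s: float = 5.0,
    max_checks: Optional[int] = None,
) -> bool:
    """Poll until every required variable is present.

    Returns True once configured, or False if max_checks polls all failed.
    """
    checks = 0
    while True:
        if not missing_vars(read_env()):
            return True
        checks += 1
        if max_checks is not None and checks >= max_checks:
            return False
        sleep(interval_s)
```

Because the process keeps running while it waits, a supervisor sees a healthy program rather than a rapid exit, which is what prevents the crash loop.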
javis-bot
0dbc0300d7 Enable GPU: LLM + Whisper on the RTX 5050, pick qwen3:8b
GPU acceleration is now on by default and verified end-to-end on the
Blackwell RTX 5050 (sm_120):

- Ollama offloads 100% to GPU (log: library=CUDA compute=12.0,
  BLACKWELL_NATIVE_FP4=1). compose passes GPU via CDI
  (devices: nvidia.com/gpu=all) to both ollama and javis.
- Whisper STT on GPU: faster-whisper>=1.1.0 + nvidia-cublas/cudnn cu12,
  LD_LIBRARY_PATH baked into the image. Verified float16 transcribe on
  sm_120; bridge auto-falls back to CPU when no GPU is present.
- Model: default chat model -> qwen3:8b (chosen as the best tool-calling fit
  for 8 GB of VRAM; ~5 GB at Q4). Embed stays nomic-embed-text.
- README documents the host one-time setup (nvidia-container-toolkit +
  `nvidia-ctk cdi generate`) and GPU on/off.

Verified: image builds; GPU visible in both containers via compose;
ollama ps = 100% GPU; faster-whisper cuda OK + CPU fallback OK;
bridge /health 200.
2026-06-09 15:49:21 +09:00
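The GPU-first, CPU-fallback behaviour from commit 0dbc0300d7 can be sketched as a small loader loop. `load_with_fallback` is a hypothetical helper that takes the model constructor as a callable, so it runs without faster-whisper installed; the "int8" compute type for the CPU attempt is an assumption, since the commit only confirms float16 on the GPU.

```python
from typing import Callable, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")

# (device, compute_type) pairs tried in order. The CPU compute type is assumed.
DEFAULT_ATTEMPTS: Sequence[Tuple[str, str]] = (("cuda", "float16"), ("cpu", "int8"))


def load_with_fallback(
    load: Callable[[str, str], T],
    attempts: Sequence[Tuple[str, str]] = DEFAULT_ATTEMPTS,
) -> Tuple[T, str]:
    """Try each device in order and return (model, device) for the first success."""
    last_error: Optional[Exception] = None
    for device, compute_type in attempts:
        try:
            return load(device, compute_type), device
        except Exception as exc:  # e.g. no GPU present or CUDA libraries missing
            last_error = exc
    raise RuntimeError("no usable device for the STT model") from last_error


# With faster-whisper installed, the loader could be, for example:
#   from faster_whisper import WhisperModel
#   model, device = load_with_fallback(
#       lambda dev, ct: WhisperModel("small", device=dev, compute_type=ct)
#   )
# The model size "small" is a placeholder, not taken from the commit.
```

Returning the chosen device alongside the model lets a health endpoint report whether transcription is running on the GPU or has fallen back.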
javis-bot
25c77ac794 Dockerize: one-command stack with auto Ollama model pull
`docker compose up -d --build` now brings up the whole stack automatically,
with no host setup needed:

- All-in-one javis image: TigerVNC+XFCE desktop, Chrome, Python brain bridge,
  Node/bun bot, managed by supervisord (verified: all 6 programs RUNNING).
- ollama service + one-shot ollama-init that auto-pulls chat+embed models
  (verified end-to-end; `ollama list` shows pulled models).
- Discord token deferred: without DISCORD_BOT_TOKEN the desktop, bridge,
  Ollama and models all run; only the bot waits (no crash loop).
- Slim container deps (bridge/requirements-bridge.txt) drop the unused
  PyQt6/torch/chatterbox/sounddevice stack. Piper voice + Whisper models
  auto-download into named volumes.
- Configurable host ports (VNC_PORT/NOVNC_PORT/BRIDGE_PORT) to avoid clashing
  with a host VNC already on 5901. Bridge binds 0.0.0.0 in-container.

Verified: image builds; brain imports; bridge /health 200; noVNC 200;
X display :1 @1920x1080; auto-pull completes; supervisorctl status all RUNNING.
2026-06-09 15:27:41 +09:00
javis-bot
c4abf63f38 Add Discord-native hybrid front-end for Jarvis (bot + bridge)
Transform isair/jarvis into a Discord-controlled voice assistant running on
the Ubuntu VNC desktop, keeping the mature ~39k-line Python brain intact.

- bot/ (Node + bun, discord.js): /자비스 slash commands (ephemeral),
  voice channel join + voice receive/playback, pluggable VNC screen broadcast
  (selfbot live / noVNC / screenshot)
- bridge/ (Python, Flask): wraps jarvis STT + run_reply_engine + Piper TTS
  behind a thin localhost HTTP API
- .env.example, scripts/ (start_bridge/start_bot/dev), README rewrite,
  docs/language-comparison.md and docs/vnc-xfce-setup.md

Language decision: hybrid (Python brain + Node/bun Discord layer) because
Discord blocks bot video; native screen broadcast only works via a Node
selfbot library.
2026-06-09 14:51:05 +09:00
a5bf8d1826 Initial commit
2026-06-09 13:58:41 +09:00