Commit Graph

15 Commits

Author SHA1 Message Date
javis-bot
ef6f6ff57d feat(stream): STREAM_BROWSER flag + make toolbar-hide/subtitles broadcast-wide
- Add STREAM_BROWSER (.env) gating screen-share/browser mode. false => the
  /자비스 stream command stays voice + API/MCP only (no Go-Live); true (default)
  => screen share as before. (Browser-driven info retrieval in true mode is a
  follow-up build; the bot has no browser-control tools yet.)
- Make the two test-time fixes broadcast-wide defaults via broadcast-helper.mjs:
  it now also watches every tab for HTML5 fullscreen and toggles Chrome window
  fullscreen so the address bar is hidden for ANY video (xfwm4 won't hide it on
  'f' alone), restoring on exit. Subtitles were already enforced per video.
  scenario.mjs drops its own fullscreen toggle and relies on the helper.
- Revert the test-settings env vars from .env.example (not wanted).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-10 16:17:29 +09:00
javis-bot
208fbbc851 feat(selfbot): broadcast desktop audio + smart subtitles in the browse scenario
Two broadcast-experience improvements:

- Audio: the Go-Live stream was video-only. Capture the desktop sound (the
  default PipeWire/Pulse sink monitor, @DEFAULT_MONITOR@) as a second ffmpeg
  input and mux AAC into the mpegts; the library re-encodes it to Opus for
  Discord. Controlled by STREAM_AUDIO / STREAM_AUDIO_SOURCE (default on). ffmpeg
  inherits XDG_RUNTIME_DIR to reach the pulse socket. Verified: the streamer now
  reports "Found audio stream" and the monitor carries Chrome audio (~-11 dB).
- Subtitles: in the browse scenario, default captions OFF, but auto-enable a
  Korean track when the video offers one (getOption captions tracklist ->
  setOption / unloadModule).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-10 15:50:32 +09:00
javis-bot
4176a68873 fix(selfbot): smooth VNC capture via keepalive + stop ffmpeg leak on stream end
The Go-Live broadcast looked badly choppy: video and scrolling stuttered while
the cursor stayed smooth. Root cause is TigerVNC: it only refreshes its
framebuffer while a VNC client is attached, but the broadcast reads that
framebuffer with x11grab (not as a VNC client). With no viewer attached the
captured screen idled at ~1.5 fps (measured 3/30 distinct frames); the cursor
looked smooth only because x11grab overlays the live cursor on every frame.

- Add a headless RFB keepalive (vnc-keepalive.ts) that stays connected for the
  life of the stream and requests incremental framebuffer updates at the stream
  framerate. SelfbotStreamer starts it on broadcast start and tears it down on
  stop/self-end. Measured 3/30 -> 57/60 distinct frames at 60 fps. Fail-open;
  authenticates with VNC_PASSWORD or the ~/.config/tigervnc/passwd file.
- Fix a resource leak: when the Go-Live ended on its own, only the active flag
  was cleared, leaving the x11grab->nvenc ffmpeg running forever (pinning a CPU
  core while no media was transmitted, with only the gateway TCP left and no UDP
  media). The self-end path now tears down capture, keepalive and voice like
  stop() does.
- Tests for both paths (self-end teardown; keepalive DES auth, port mapping,
  password resolution). Add @types/bun so bun:test typechecks; document the
  keepalive and recommended Chrome flags in README and .env.example.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-10 15:21:44 +09:00
javis-bot
1e30a49562 fix: cap selfbot stream -maxrate at lib's 10 Mbps ceiling; add stream-test tooling
- selfbot.ts: the @dank074 lib advertises a hardcoded max_bitrate of 10 Mbps to
  Discord (BaseMediaConnection: `max_bitrate: 10000 * 1000`). Our encoder used
  -maxrate = 1.5x target (12 Mbps at 8 Mbps target), so high-motion bursts
  exceeded the negotiated ceiling and WebRTC dropped packets (viewer stutter).
  Cap -maxrate at 10 Mbps.
- Add bot/scripts/stream-test/: env-driven stream-hold.ts (persistent Go-Live
  holder), human.mjs (real xdotool mouse/keyboard + char-by-char typing), and
  scenario.mjs (YouTube/Naver browse). Channel/guild/video are env-parametrised.
- .env.example: document DISCORD_VOICE_CHANNEL_ID for the stream-test scripts.
2026-06-10 12:50:24 +09:00
javis-bot
7a148f8caa fix: don't unlock active in startup catch when a newer attempt owns it
The startup catch cleared this.active unconditionally. In a stop()+restart
race during the slow login/pauses, the first attempt's catch would fire after
the second start() had already taken the lock, unlocking it mid-startup and
letting a third start() race in. Guard the active/state reset with
`this.controller === controller`, matching the field-null and playStream
.finally guards.

Verified live: stop during login then restart keeps the restart's lock
(active stays true), and it clears to false only once truly stopped; no crash.
2026-06-10 12:10:01 +09:00
javis-bot
2fd5e0fe9e chore: lengthen humanised selfbot startup delays
Join/go-live still felt a touch fast. Widen the pauses: ~2.5-4.5s after
coming online before joining voice, ~6-10s after joining before Go Live.
2026-06-10 11:47:31 +09:00
javis-bot
2c7f0a95b5 fix: make humanised selfbot startup abort- and concurrency-safe
The human-pause delays leave start() in-flight for several seconds, which
exposed two races:
- stop() during a pause only ended the pause; start() continued and called
  joinVoice on the streamer stop() had already nulled (null deref).
- `active` was set only just before go-live, so a second /stream during the
  delay passed the guard and both calls raced on the same overwritten streamer.

Now start() locks `active` before any await, keeps controller/streamer/capture
as local refs, and calls signal.throwIfAborted() after each await so an
interleaved stop() unwinds into a catch that tears down via the local refs and
clears instance state only if it still points at this attempt. isActive() now
reflects "starting" during the delay too.

Verified live: concurrent start is rejected ("이미 송출 중입니다"), stop() mid-
startup returns a cancel message with isActive=false and no uncaught error, and
the happy path still goes live and tears down cleanly. tsc --noEmit passes.
2026-06-10 11:42:57 +09:00
javis-bot
b6cf05f6cf feat: humanise selfbot voice-join and go-live pacing
Joining voice and starting the broadcast instantly looks like a bot. Add
randomised, human-plausible pauses (~0.9-2.2s after coming online before
joining the channel, ~2.5-5s after joining before hitting Go Live) so the
cadence isn't machine-instant or fingerprintable. The pause resolves
immediately on stop() so teardown never hangs mid-wait.

Verified live: end-to-end join -> settle -> Go Live took ~8s before the
stream went live, held for 15s, and tore down cleanly. tsc --noEmit passes.
2026-06-10 11:37:39 +09:00
javis-bot
40fd7dbb59 fix: single-pass NVENC encode for selfbot stream (no double encode)
Address review: the capture ffmpeg had no -b:v, so it encoded at nvenc's
low default (~2.47 Mbps) and the library then re-encoded to 8 Mbps, which
only upscaled already-lost detail. The double encode also kept CPU decode
+ scale + re-encode in the library, contradicting the "GPU handles it"
claim.

Now the system ffmpeg produces the final Discord-ready H264 in one pass
(-b:v/-maxrate at the configured bitrate, -bf 0, 1s keyframes, yuv420p,
-forced-idr) and prepareStream uses noTranscoding:true to remux only. One
GPU encode, no library decode/scale/re-encode.

Verified locally: high-motion source fills 8.7 Mbps at these args (vs the
~2.47 Mbps no-bitrate default), real :1 desktop holds 60fps at realtime,
and the capture -> copy/remux chain yields h264 1920x1080 yuv420p 60fps
has_b_frames=0. tsc --noEmit passes. Live Discord test pending reboot.
2026-06-10 11:23:52 +09:00
javis-bot
ad0caa8142 feat: 1080p60 NVENC selfbot broadcast (8 Mbps default)
Bump the default broadcast to 1080p 60fps at 8 Mbps and route both encode
stages through the GPU (RTX 5050, h264_nvenc) so 60fps stays smooth without
loading the 4-core host.

- selfbot.ts: capture ffmpeg uses h264_nvenc when streamHw is on (falls back
  to software x264 otherwise), and prepareStream now passes Encoders.nvenc()
  so the library's transcode runs on the GPU too. Guard loadLib for Encoders.
- config.ts: VNC_FRAMERATE default 30 -> 60, VNC_BITRATE_KBPS 4000 -> 8000.
- .env.example: document the new 1080p60/8 Mbps defaults and STREAM_HW.

Verified locally: h264_nvenc x11grab holds a steady 60fps with headroom,
Encoders.nvenc() returns valid h264_nvenc settings, and tsc --noEmit passes.
Live Discord voice-channel verification pending a host reboot.
2026-06-10 11:17:44 +09:00
javis-bot
5137fdeaf7 selfbot streaming: verified live; capture via system ffmpeg x11grab
Some checks failed
Release / build-windows (push) Blocked by required conditions
Release / build-macos (arm64, macos-latest) (push) Blocked by required conditions
Release / build-macos (x64, macos-15-intel) (push) Blocked by required conditions
Release / release-main (push) Blocked by required conditions
Release / release-develop (push) Blocked by required conditions
Release / semantic-release (push) Successful in 24s
tests / Unit tests (Linux, Python 3.11) (push) Successful in 10m1s
Release / build-linux (push) Failing after 7m35s
End-to-end verified with a real burner token + voice channel: login OK, posts
to the text channel, joins voice, and Go-Live streams the host :1 desktop.

- selfbot.ts now captures the X display with the SYSTEM ffmpeg (reliable
  x11grab) and pipes it into prepareStream, instead of relying on the lib's
  bundled libav input devices (not portable). Capture process is killed on stop.
- package.json: trustedDependencies (node-av, @lng2004/node-datachannel) so the
  native streaming deps build automatically on bun install (incl. Docker).
- Dropped the unused nvenc path (the lib's exported `nvenc` is undefined at
  runtime); software H264 encode for now.
2026-06-10 10:38:28 +09:00
javis-bot
7aac92fc2c token helper: render auth link as a scannable QR PNG
Some checks failed
Release / semantic-release (push) Successful in 26s
tests / Unit tests (Linux, Python 3.11) (push) Successful in 9m54s
Release / build-linux (push) Failing after 7m13s
Release / build-windows (push) Has been cancelled
Release / build-macos (arm64, macos-latest) (push) Has been cancelled
Release / build-macos (x64, macos-15-intel) (push) Has been cancelled
Release / release-main (push) Has been cancelled
Release / release-develop (push) Has been cancelled
get-token.ts now writes the Remote Auth URL as a 512x512 QR image
(/tmp/javis_qr.png, override via QR_OUT) in addition to printing the link, so
it can be sent to the user and scanned from a second screen with the Discord
mobile app. Adds the qrcode dependency.
2026-06-09 21:03:31 +09:00
javis-bot
f80a6fa0ba Add remote-auth token helper (get selfbot token via a link, no devtools)
Some checks failed
Release / semantic-release (push) Successful in 31s
tests / Unit tests (Linux, Python 3.11) (push) Successful in 9m54s
Release / build-linux (push) Failing after 7m14s
Release / build-windows (push) Has been cancelled
Release / build-macos (arm64, macos-latest) (push) Has been cancelled
Release / build-macos (x64, macos-15-intel) (push) Has been cancelled
Release / release-main (push) Has been cancelled
Release / release-develop (push) Has been cancelled
bot/src/get-token.ts uses discord.js-selfbot-v13 DiscordAuthWebsocket: it
prints the Discord Remote Auth URL (https://discord.com/ra/<code> — the same
thing a login QR encodes). Open it on a phone with the Discord app, approve the
"New login" prompt, and the user token is written to .env as
DISCORD_SELFBOT_TOKEN. Works from a single mobile device (no second screen, no
password, no browser devtools). `bun run token`.
2026-06-09 20:42:24 +09:00
javis-bot
b56c9c7721 Address remaining review items (queue, selfbot v6 API, ldconfig, resample)
Some checks failed
Release / semantic-release (push) Successful in 22s
tests / Unit tests (Linux, Python 3.11) (push) Successful in 9m56s
Release / build-linux (push) Failing after 7m15s
Release / build-windows (push) Has been cancelled
Release / build-macos (arm64, macos-latest) (push) Has been cancelled
Release / build-macos (x64, macos-15-intel) (push) Has been cancelled
Release / release-main (push) Has been cancelled
Release / release-develop (push) Has been cancelled
- voice.ts: reply playback is now a FIFO queue (AudioPlayerStatus.Idle drains
  it) so concurrent speakers no longer cut each other's replies off.
- selfbot.ts: rewritten against the REAL @dank074/discord-video-stream v6 API
  (verified from its d.ts): prepareStream(input, opts, signal)->{command,output},
  playStream(output, streamer, {type:"go-live"}, signal), Streamer.joinVoice.
  x11grab via customInputOptions; optional NVENC encode (RTX 5050) via exported
  `nvenc`. package.json pinned to ^6.0.0 (was a wrong ^4.2.1).
- Dockerfile: dropped the hardcoded python3.12 LD_LIBRARY_PATH. faster-whisper
  >=1.1 self-locates the pip CUDA libs; ldconfig (full path, glob) registers
  them as a robust fallback. Verified: ld.so cache lists libcublas/libcudnn and
  GPU whisper works with LD_LIBRARY_PATH empty.
- bridge: STT resample 48k->16k upgraded from nearest-neighbor to linear
  (np.interp).

Verified: tsc clean, image builds, GPU whisper OK via ldconfig, compose valid.
2026-06-09 18:47:25 +09:00
javis-bot
c4abf63f38 Add Discord-native hybrid front-end for Jarvis (bot + bridge)
Some checks failed
Release / semantic-release (push) Successful in 59s
tests / Unit tests (Linux, Python 3.11) (push) Successful in 13m45s
Release / build-linux (push) Failing after 7m47s
Release / build-windows (push) Has been cancelled
Release / build-macos (arm64, macos-latest) (push) Has been cancelled
Release / build-macos (x64, macos-15-intel) (push) Has been cancelled
Release / release-main (push) Has been cancelled
Release / release-develop (push) Has been cancelled
Transform isair/jarvis into a Discord-controlled voice assistant running on
the Ubuntu VNC desktop, keeping the mature ~39k-line Python brain intact.

- bot/ (Node + bun, discord.js): /자비스 slash commands (ephemeral),
  voice channel join + voice receive/playback, pluggable VNC screen broadcast
  (selfbot live / noVNC / screenshot)
- bridge/ (Python, Flask): wraps jarvis STT + run_reply_engine + Piper TTS
  behind a thin localhost HTTP API
- .env.example, scripts/ (start_bridge/start_bot/dev), README rewrite,
  docs/language-comparison.md and docs/vnc-xfce-setup.md

Language decision: hybrid (Python brain + Node/bun Discord layer) because
Discord blocks bot video; native screen broadcast only works via a Node
selfbot library.
2026-06-09 14:51:05 +09:00