javis_bot

Author	SHA1	Message	Date
javis-bot	e154404baf	feat(stream-test): persistent YouTube ad auto-skipper for the broadcast Adds ad-skip.mjs: connects over CDP and injects a watcher into every tab (current and future) that clicks "Skip ad" the moment it appears, closes overlay ads, and fast-forwards unskippable ads (seek-to-end + 16x + mute) so they clear in ~1s. Self-contained (no extension, no hosts/network changes) and reconnects across Chrome restarts. Documented in the README. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-10 15:54:17 +09:00
javis-bot	208fbbc851	feat(selfbot): broadcast desktop audio + smart subtitles in the browse scenario Two broadcast-experience improvements: - Audio: the Go-Live stream was video-only. Capture the desktop sound (the default PipeWire/Pulse sink monitor, @DEFAULT_MONITOR@) as a second ffmpeg input and mux AAC into the mpegts; the library re-encodes it to Opus for Discord. Controlled by STREAM_AUDIO / STREAM_AUDIO_SOURCE (default on). ffmpeg inherits XDG_RUNTIME_DIR to reach the pulse socket. Verified: the streamer now reports "Found audio stream" and the monitor carries Chrome audio (~-11 dB). - Subtitles: in the browse scenario, default captions OFF, but auto-enable a Korean track when the video offers one (getOption captions tracklist -> setOption / unloadModule). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-10 15:50:32 +09:00
javis-bot	c6a0ca4572	fix(stream-test): hide Chrome toolbar in fullscreen so the address bar stays off the broadcast On the streamed VNC desktop (xfwm4), Chrome did not hide its toolbar when a video entered HTML5 fullscreen via 'f' - the window was full-screen (outerHeight 1080) but the tab/address bar stayed, leaving only 988px of content, so the address bar bled into the Go-Live broadcast. Toggle Chrome-initiated browser fullscreen via CDP (Browser.setWindowBounds windowState fullscreen) around the 'f' step. That reliably hides the toolbar (innerHeight 1080 vs 988); the toolbar is restored on exit, so normal browsing still shows it. Verified live: clean full-screen video, no toolbar. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-10 15:39:08 +09:00
javis-bot	4176a68873	fix(selfbot): smooth VNC capture via keepalive + stop ffmpeg leak on stream end The Go-Live broadcast looked badly choppy: video and scrolling stuttered while the cursor stayed smooth. Root cause is TigerVNC: it only refreshes its framebuffer while a VNC client is attached, but the broadcast reads that framebuffer with x11grab (not as a VNC client). With no viewer attached the captured screen idled at ~1.5 fps (measured 3/30 distinct frames); the cursor looked smooth only because x11grab overlays the live cursor on every frame. - Add a headless RFB keepalive (vnc-keepalive.ts) that stays connected for the life of the stream and requests incremental framebuffer updates at the stream framerate. SelfbotStreamer starts it on broadcast start and tears it down on stop/self-end. Measured 3/30 -> 57/60 distinct frames at 60 fps. Fail-open; authenticates with VNC_PASSWORD or the ~/.config/tigervnc/passwd file. - Fix a resource leak: when the Go-Live ended on its own, only the active flag was cleared, leaving the x11grab->nvenc ffmpeg running forever (pinning a CPU core while no media was transmitted, with only the gateway TCP left and no UDP media). The self-end path now tears down capture, keepalive and voice like stop() does. - Tests for both paths (self-end teardown; keepalive DES auth, port mapping, password resolution). Add @types/bun so bun:test typechecks; document the keepalive and recommended Chrome flags in README and .env.example. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-10 15:21:44 +09:00
javis-bot	8709f40fd6	fix(stream-test): refuse final box when element stays off-screen bringIntoView returned the last boundingBox() unconditionally after the scroll loop exhausted, so an element still outside the viewport would be clicked anyway. Validate the final box against the actual viewport bounds on both axes (innerWidth/innerHeight) and return null otherwise, so humanClick fails instead of clicking an off-screen coordinate.	2026-06-10 14:18:43 +09:00
javis-bot	bbc2fa3f7a	refactor(stream-test): real-wheel into view, no synthetic-click fallback Address review accuracy: humanClick used DOM scrollIntoViewIfNeeded and fell back to Playwright locator.click() when an element had no box - neither is real input. Now it brings elements into view with a real wheel scroll and throws if there is no on-screen box (no synthetic click). Header comment and README corrected: xdotool injects synthetic X input (not a physical HID device), and all actions are real input while the CDP/DOM API is used only to read state.	2026-06-10 14:15:26 +09:00
javis-bot	2cdd159fc1	feat(stream-test): drive the whole browse scenario with real input Make every action real keyboard/mouse via xdotool, not just the visible browsing: address-bar navigation (Ctrl+L + char-by-char typing), the YouTube settings gear -> 화질 -> 1080p menu (real clicks, verified hd1080), the autoplay toggle, the play button, and fullscreen via the real 'f' key (F11 isn't honored by this WM; 'f' yields true 1080p fullscreen without pausing). CDP/DOM API is now used only to read state for verification.	2026-06-10 14:11:58 +09:00
javis-bot	1e30a49562	fix: cap selfbot stream -maxrate at lib's 10 Mbps ceiling; add stream-test tooling - selfbot.ts: the @dank074 lib advertises a hardcoded max_bitrate of 10 Mbps to Discord (BaseMediaConnection: `max_bitrate: 10000 * 1000`). Our encoder used -maxrate = 1.5x target (12 Mbps at 8 Mbps target), so high-motion bursts exceeded the negotiated ceiling and WebRTC dropped packets (viewer stutter). Cap -maxrate at 10 Mbps. - Add bot/scripts/stream-test/: env-driven stream-hold.ts (persistent Go-Live holder), human.mjs (real xdotool mouse/keyboard + char-by-char typing), and scenario.mjs (YouTube/Naver browse). Channel/guild/video are env-parametrised. - .env.example: document DISCORD_VOICE_CHANNEL_ID for the stream-test scripts.	2026-06-10 12:50:24 +09:00
javis-bot	7a148f8caa	fix: don't unlock active in startup catch when a newer attempt owns it The startup catch cleared this.active unconditionally. In a stop()+restart race during the slow login/pauses, the first attempt's catch would fire after the second start() had already taken the lock, unlocking it mid-startup and letting a third start() race in. Guard the active/state reset with `this.controller === controller`, matching the field-null and playStream .finally guards. Verified live: stop during login then restart keeps the restart's lock (active stays true), and it clears to false only once truly stopped; no crash.	2026-06-10 12:10:01 +09:00
javis-bot	2fd5e0fe9e	chore: lengthen humanised selfbot startup delays Join/go-live still felt a touch fast. Widen the pauses: ~2.5-4.5s after coming online before joining voice, ~6-10s after joining before Go Live.	2026-06-10 11:47:31 +09:00
javis-bot	2c7f0a95b5	fix: make humanised selfbot startup abort- and concurrency-safe The human-pause delays leave start() in-flight for several seconds, which exposed two races: - stop() during a pause only ended the pause; start() continued and called joinVoice on the streamer stop() had already nulled (null deref). - `active` was set only just before go-live, so a second /stream during the delay passed the guard and both calls raced on the same overwritten streamer. Now start() locks `active` before any await, keeps controller/streamer/capture as local refs, and calls signal.throwIfAborted() after each await so an interleaved stop() unwinds into a catch that tears down via the local refs and clears instance state only if it still points at this attempt. isActive() now reflects "starting" during the delay too. Verified live: concurrent start is rejected ("이미 송출 중입니다"), stop() mid-startup returns a cancel message with isActive=false and no uncaught error, and the happy path still goes live and tears down cleanly. tsc --noEmit passes.	2026-06-10 11:42:57 +09:00
javis-bot	b6cf05f6cf	feat: humanise selfbot voice-join and go-live pacing Joining voice and starting the broadcast instantly looks like a bot. Add randomised, human-plausible pauses (~0.9-2.2s after coming online before joining the channel, ~2.5-5s after joining before hitting Go Live) so the cadence isn't machine-instant or fingerprintable. The pause resolves immediately on stop() so teardown never hangs mid-wait. Verified live: end-to-end join -> settle -> Go Live took ~8s before the stream went live, held for 15s, and tore down cleanly. tsc --noEmit passes.	2026-06-10 11:37:39 +09:00
javis-bot	40fd7dbb59	fix: single-pass NVENC encode for selfbot stream (no double encode) Address review: the capture ffmpeg had no -b:v, so it encoded at nvenc's low default (~2.47 Mbps) and the library then re-encoded to 8 Mbps, which only upscaled already-lost detail. The double encode also kept CPU decode + scale + re-encode in the library, contradicting the "GPU handles it" claim. Now the system ffmpeg produces the final Discord-ready H264 in one pass (-b:v/-maxrate at the configured bitrate, -bf 0, 1s keyframes, yuv420p, -forced-idr) and prepareStream uses noTranscoding:true to remux only. One GPU encode, no library decode/scale/re-encode. Verified locally: high-motion source fills 8.7 Mbps at these args (vs the ~2.47 Mbps no-bitrate default), real :1 desktop holds 60fps at realtime, and the capture -> copy/remux chain yields h264 1920x1080 yuv420p 60fps has_b_frames=0. tsc --noEmit passes. Live Discord test pending reboot.	2026-06-10 11:23:52 +09:00
javis-bot	ad0caa8142	feat: 1080p60 NVENC selfbot broadcast (8 Mbps default) Bump the default broadcast to 1080p 60fps at 8 Mbps and route both encode stages through the GPU (RTX 5050, h264_nvenc) so 60fps stays smooth without loading the 4-core host. - selfbot.ts: capture ffmpeg uses h264_nvenc when streamHw is on (falls back to software x264 otherwise), and prepareStream now passes Encoders.nvenc() so the library's transcode runs on the GPU too. Guard loadLib for Encoders. - config.ts: VNC_FRAMERATE default 30 -> 60, VNC_BITRATE_KBPS 4000 -> 8000. - .env.example: document the new 1080p60/8 Mbps defaults and STREAM_HW. Verified locally: h264_nvenc x11grab holds a steady 60fps with headroom, Encoders.nvenc() returns valid h264_nvenc settings, and tsc --noEmit passes. Live Discord voice-channel verification pending a host reboot.	2026-06-10 11:17:44 +09:00
javis-bot	5137fdeaf7	selfbot streaming: verified live; capture via system ffmpeg x11grab End-to-end verified with a real burner token + voice channel: login OK, posts to the text channel, joins voice, and Go-Live streams the host :1 desktop. - selfbot.ts now captures the X display with the SYSTEM ffmpeg (reliable x11grab) and pipes it into prepareStream, instead of relying on the lib's bundled libav input devices (not portable). Capture process is killed on stop. - package.json: trustedDependencies (node-av, @lng2004/node-datachannel) so the native streaming deps build automatically on bun install (incl. Docker). - Dropped the unused nvenc path (the lib's exported `nvenc` is undefined at runtime); software H264 encode for now.	2026-06-10 10:38:28 +09:00
javis-bot	7aac92fc2c	token helper: render auth link as a scannable QR PNG get-token.ts now writes the Remote Auth URL as a 512x512 QR image (/tmp/javis_qr.png, override via QR_OUT) in addition to printing the link, so it can be sent to the user and scanned from a second screen with the Discord mobile app. Adds the qrcode dependency.	2026-06-09 21:03:31 +09:00
javis-bot	f80a6fa0ba	Add remote-auth token helper (get selfbot token via a link, no devtools) bot/src/get-token.ts uses discord.js-selfbot-v13 DiscordAuthWebsocket: it prints the Discord Remote Auth URL (https://discord.com/ra/<code>, the same thing a login QR encodes). Open it on a phone with the Discord app, approve the "New login" prompt, and the user token is written to .env as DISCORD_SELFBOT_TOKEN. Works from a single mobile device (no second screen, no password, no browser devtools). `bun run token`.	2026-06-09 20:42:24 +09:00
javis-bot	b56c9c7721	Address remaining review items (queue, selfbot v6 API, ldconfig, resample) - voice.ts: reply playback is now a FIFO queue (AudioPlayerStatus.Idle drains it) so concurrent speakers no longer cut each other's replies off. - selfbot.ts: rewritten against the REAL @dank074/discord-video-stream v6 API (verified from its d.ts): prepareStream(input, opts, signal)->{command,output}, playStream(output, streamer, {type:"go-live"}, signal), Streamer.joinVoice. x11grab via customInputOptions; optional NVENC encode (RTX 5050) via exported `nvenc`. package.json pinned to ^6.0.0 (was a wrong ^4.2.1). - Dockerfile: dropped the hardcoded python3.12 LD_LIBRARY_PATH. faster-whisper >=1.1 self-locates the pip CUDA libs; ldconfig (full path, glob) registers them as a robust fallback. Verified: ld.so cache lists libcublas/libcudnn and GPU whisper works with LD_LIBRARY_PATH empty. - bridge: STT resample 48k->16k upgraded from nearest-neighbor to linear (np.interp). Verified: tsc clean, image builds, GPU whisper OK via ldconfig, compose valid.	2026-06-09 18:47:25 +09:00
javis-bot	964123682f	Review fixes: correct Piper TTS API + bot env gating Code review of the bridge/bot/docker work found: - TTS bug: bridge called PiperVoice.synthesize(text, wav) but that method returns AudioChunks and takes a SynthesisConfig as its 2nd arg, not a wav file -> TTS would fail. Switched to synthesize_wav(text, wav_file). Verified: produces a valid 22050Hz mono WAV. - run-bot.sh now waits if ANY of DISCORD_BOT_TOKEN/APP_ID/GUILD_ID is missing (config.ts throws on a missing one), preventing a supervisor crash-loop. Verified clean: discord.js Events.ClientReady == 'clientReady' (existing handler correct); image rebuilds.	2026-06-09 16:16:55 +09:00
javis-bot	0dbc0300d7	Enable GPU: LLM + Whisper on the RTX 5050, pick qwen3:8b GPU acceleration is now on by default and verified end-to-end on the Blackwell RTX 5050 (sm_120): - Ollama offloads 100% to GPU (log: library=CUDA compute=12.0, BLACKWELL_NATIVE_FP4=1). compose passes GPU via CDI (devices: nvidia.com/gpu=all) to both ollama and javis. - Whisper STT on GPU: faster-whisper>=1.1.0 + nvidia-cublas/cudnn cu12, LD_LIBRARY_PATH baked into the image. Verified float16 transcribe on sm_120; bridge auto-falls back to CPU when no GPU is present. - Model: default chat model -> qwen3:8b (best 8GB-VRAM tool-calling, ~5GB Q4). Embed stays nomic-embed-text. - README documents the host one-time setup (nvidia-container-toolkit + `nvidia-ctk cdi generate`) and GPU on/off. Verified: image builds; GPU visible in both containers via compose; ollama ps = 100% GPU; faster-whisper cuda OK + CPU fallback OK; bridge /health 200.	2026-06-09 15:49:21 +09:00
javis-bot	25c77ac794	Dockerize: one-command stack with auto Ollama model pull `docker compose up -d --build` now brings up the whole thing automatically, no host setup needed: - All-in-one javis image: TigerVNC+XFCE desktop, Chrome, Python brain bridge, Node/bun bot, managed by supervisord (verified: all 6 programs RUNNING). - ollama service + one-shot ollama-init that auto-pulls chat+embed models (verified end-to-end; `ollama list` shows pulled models). - Discord token deferred: without DISCORD_BOT_TOKEN the desktop, bridge, Ollama and models all run; only the bot waits (no crash loop). - Slim container deps (bridge/requirements-bridge.txt) drop the unused PyQt6/torch/chatterbox/sounddevice stack. Piper voice + Whisper models auto-download into named volumes. - Configurable host ports (VNC_PORT/NOVNC_PORT/BRIDGE_PORT) to avoid clashing with a host VNC already on 5901. Bridge binds 0.0.0.0 in-container. Verified: image builds; brain imports; bridge /health 200; noVNC 200; X display :1 @1920x1080; auto-pull completes; supervisorctl status all RUNNING.	2026-06-09 15:27:41 +09:00
javis-bot	c4abf63f38	Add Discord-native hybrid front-end for Jarvis (bot + bridge) Transform isair/jarvis into a Discord-controlled voice assistant running on the Ubuntu VNC desktop, keeping the mature ~39k-line Python brain intact. - bot/ (Node + bun, discord.js): /자비스 slash commands (ephemeral), voice channel join + voice receive/playback, pluggable VNC screen broadcast (selfbot live / noVNC / screenshot) - bridge/ (Python, Flask): wraps jarvis STT + run_reply_engine + Piper TTS behind a thin localhost HTTP API - .env.example, scripts/ (start_bridge/start_bot/dev), README rewrite, docs/language-comparison.md and docs/vnc-xfce-setup.md Language decision: hybrid (Python brain + Node/bun Discord layer) because Discord blocks bot video; native screen broadcast only works via a Node selfbot library.	2026-06-09 14:51:05 +09:00
tkrmagid	a5bf8d1826	Initial commit	2026-06-09 13:58:41 +09:00
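
Commit 1e30a49562 describes capping the encoder's -maxrate at the library's 10 Mbps ceiling instead of letting it reach 1.5x the target. A minimal sketch of that arithmetic follows. The helper names and the min() formulation are assumptions made for illustration and are not taken from selfbot.ts; only the 1.5x factor and the 10 Mbps ceiling come from the commit message.

```typescript
// Ceiling quoted in the commit message (max_bitrate: 10000 * 1000), in kbps.
const LIB_MAX_BITRATE_KBPS = 10_000;

// Hypothetical helper: burst rate is 1.5x the target, but never above the ceiling.
function cappedMaxrateKbps(targetKbps: number, burstFactor = 1.5): number {
  return Math.min(Math.round(targetKbps * burstFactor), LIB_MAX_BITRATE_KBPS);
}

// Hypothetical helper: the rate-control portion of an ffmpeg argument list.
function rateArgs(targetKbps: number): string[] {
  return [
    "-b:v", `${targetKbps}k`,
    "-maxrate", `${cappedMaxrateKbps(targetKbps)}k`,
  ];
}

// At the 8 Mbps default the uncapped value would be 12000k; the cap holds it at 10000k.
console.log(rateArgs(8000).join(" "));
```

With a lower target such as 4000 kbps the 1.5x burst (6000 kbps) is already under the ceiling, so the cap only takes effect for targets above roughly 6.6 Mbps.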

23 Commits
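
Commit 8709f40fd6 says bringIntoView now validates the final bounding box against the viewport on both axes and returns null when the element is still off-screen. The check might look roughly like the sketch below. The function name, the types, and the choice to test the box centre (the point a click would land on) are assumptions; the commit message does not say whether the centre or the whole box is tested.

```typescript
interface Box { x: number; y: number; width: number; height: number }
interface Viewport { innerWidth: number; innerHeight: number }

// Hypothetical helper: return the box only if its centre lies inside the
// viewport on both axes; otherwise return null so the caller can fail
// instead of clicking an off-screen coordinate.
function boxIfOnScreen(box: Box | null, vp: Viewport): Box | null {
  if (box === null) return null;
  const cx = box.x + box.width / 2;
  const cy = box.y + box.height / 2;
  const onScreen =
    cx >= 0 && cx <= vp.innerWidth &&
    cy >= 0 && cy <= vp.innerHeight;
  return onScreen ? box : null;
}

const viewport: Viewport = { innerWidth: 1920, innerHeight: 1080 };
// An element that is still below the fold after scrolling is rejected.
console.log(boxIfOnScreen({ x: 100, y: 1500, width: 80, height: 40 }, viewport));
```

Returning null rather than the stale box matches the behaviour the commit describes: the caller fails loudly rather than sending a click to a coordinate outside the visible page.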