First increment of the STREAM_BROWSER real-time-info modes (true = browser, false = Gemini):

- browse-search.mjs: drives the on-screen Chrome via CDP so the action shows on the broadcast. `search` returns the top Google results (title/url/snippet); `youtube` plays the first result. Verified live: real-time Seoul weather results, and IU 'Good Day' MV playback.
- .env.example: GEMINI_API_KEY / GEMINI_MODEL for the false-mode Gemini account.
- docs/stream_browser_modes.md: architecture + integration map (brain config, the two mode-gated tools, registry, design decisions) for the remaining wiring.

The Python brain wiring (config.py mode/gemini fields, browseAndSearch + geminiSearch tools, registry, specs, llm_contexts) lands next - it needs a running brain and a Gemini key to verify, rather than committing untested edits into the 39k-line engine.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2.3 KiB
Real-time info modes (STREAM_BROWSER)
The bot answers via the Python brain (bridge/server.py -> src/jarvis). Real-time
info is fetched by a tool the reply engine calls. STREAM_BROWSER selects HOW:
- true (default): drive the on-screen Chrome (CDP at `CDP_PORT`, default 9222) to Google-search / play YouTube / read the page. The action is visible on the Go-Live broadcast. The browser is already up on the VNC display `:1`.
- false: use the Google Gemini API (grounded with Google Search) for real-time info. No screen share needed (voice + API only).
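
A minimal sketch of how the brain might read the mode flag and the CDP port from the environment. The function names and the accepted truthy spellings are assumptions for illustration; only the variable names `STREAM_BROWSER` and `CDP_PORT` and their defaults (true, 9222) come from the text above.

```python
import os
from typing import Mapping, Optional

# Hypothetical set of spellings treated as "true"; the real config may differ.
_TRUTHY = {"1", "true", "yes", "on"}


def read_stream_browser(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True for browser mode, False for Gemini mode (default: True)."""
    env = os.environ if env is None else env
    raw = env.get("STREAM_BROWSER")
    if raw is None or raw.strip() == "":
        return True  # unset or empty falls back to the documented default
    return raw.strip().lower() in _TRUTHY


def read_cdp_port(env: Optional[Mapping[str, str]] = None) -> int:
    """Return the Chrome DevTools port, defaulting to 9222."""
    env = os.environ if env is None else env
    return int(env.get("CDP_PORT", "9222"))
```

Passing a plain dict instead of `os.environ` keeps the parsing testable without touching the process environment.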
Components
| Piece | Path | Status |
|---|---|---|
| Mode flag (bot) | `bot/src/config.ts` `screenBrowser`, enforced in `selfbot.ts` | done |
| Browser search core (Node/CDP) | `bot/scripts/stream-test/browse-search.mjs` | this change |
| Brain mode read | `src/jarvis/config.py` `stream_browser` from env | TODO |
| Gemini key/model | `GEMINI_API_KEY`, `GEMINI_MODEL` (`.env`) + `config.py` | scaffolded |
| `browseAndSearch` tool (true) | `src/jarvis/tools/builtin/browse_and_search.py` -> subprocess the Node core | TODO |
| `geminiSearch` tool (false) | `src/jarvis/tools/builtin/gemini_search.py` (REST, no new dep) | TODO |
| Registry (mode-gated) | `src/jarvis/tools/registry.py` `BUILTIN_TOOLS` | TODO |
| Specs + `docs/llm_contexts.md` | alongside each tool | TODO |
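
The mode-gated registry row could be sketched roughly as follows. This assumes the registry is a name-to-callable mapping; the real shape of `BUILTIN_TOOLS` is not shown in this document, and the two tool functions here are placeholders, not the actual implementations.

```python
from typing import Callable, Dict, Optional

ToolFn = Callable[[str], str]


def browse_and_search(query: str) -> str:
    """Placeholder for the browser-mode tool."""
    return f"[browser] {query}"


def gemini_search(query: str) -> str:
    """Placeholder for the Gemini-mode tool."""
    return f"[gemini] {query}"


def build_builtin_tools(
    stream_browser: bool, base: Optional[Dict[str, ToolFn]] = None
) -> Dict[str, ToolFn]:
    """Copy the base tools and register exactly one real-time-info tool."""
    tools = dict(base or {})
    if stream_browser:
        tools["browseAndSearch"] = browse_and_search
    else:
        tools["geminiSearch"] = gemini_search
    return tools
```

Registering only one of the two tools means the reply engine never sees the tool that does not match the current mode.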
Design decisions
- The browser tool (Python) subprocesses a Node script rather than adding a Python CDP/playwright dependency: the Node layer already owns Chrome/CDP (`broadcast-helper.mjs`, `selfbot.ts`), so the brain shells out to `node browse-search.mjs <query>` and wraps the JSON result in the engine's `UNTRUSTED WEB EXTRACT` envelope. Keeps the 39k-line Python brain dep-free.
- Gemini uses the REST endpoint (`generativelanguage.googleapis.com`) via stdlib `urllib` with the `google_search` grounding tool - no SDK dependency.
- Tools return the same `ToolExecutionResult(success, reply_text)` envelope shape as `webSearch`, so downstream synthesis is unchanged. The brain reads `STREAM_BROWSER` once at startup and registers the matching tool.
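
The subprocess approach for the browser tool might look something like the sketch below. Several parts are assumptions: the `ToolExecutionResult` dataclass is a stand-in with only the two fields named above, the envelope layout in `wrap_untrusted` is hypothetical, and the Node script is assumed to print a JSON list of objects with `title`, `url`, and `snippet` keys on stdout.

```python
import json
import subprocess
from dataclasses import dataclass

# Path taken from the components table; adjust to the real working directory.
NODE_SCRIPT = "bot/scripts/stream-test/browse-search.mjs"


@dataclass
class ToolExecutionResult:
    """Stand-in for the engine's result type (fields as named in this doc)."""
    success: bool
    reply_text: str


def wrap_untrusted(results):
    """Hypothetical envelope layout; the engine's real format may differ."""
    lines = ["UNTRUSTED WEB EXTRACT"]
    for item in results:
        if not isinstance(item, dict):
            continue  # skip anything that is not a result object
        lines.append(
            f"- {item.get('title', '')} ({item.get('url', '')}): "
            f"{item.get('snippet', '')}"
        )
    return "\n".join(lines)


def browse_and_search(query, runner=subprocess.run, timeout=60):
    """Run the Node core for one query and wrap its JSON output."""
    try:
        proc = runner(
            ["node", NODE_SCRIPT, query],
            capture_output=True, text=True, timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return ToolExecutionResult(False, f"browser search failed: {exc}")
    if proc.returncode != 0:
        return ToolExecutionResult(
            False, f"browser search exited with code {proc.returncode}"
        )
    try:
        results = json.loads(proc.stdout)
    except json.JSONDecodeError:
        return ToolExecutionResult(False, "browser search returned non-JSON output")
    if not isinstance(results, list):
        return ToolExecutionResult(False, "browser search returned unexpected JSON")
    return ToolExecutionResult(True, wrap_untrusted(results))
```

The `runner` parameter exists only so the function can be exercised without Node or Chrome; in normal use it defaults to `subprocess.run`. Passing the query as a separate argv element avoids shell quoting issues.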
To finish / verify
- Provide `GEMINI_API_KEY` to build + verify the false-mode path (a real call is needed to confirm grounding output).
- Wire `config.py` + the two Python tools + registry, update specs and `docs/llm_contexts.md` (new Gemini LLM context).
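
For the false-mode path, a stdlib-only request could be sketched as below. The URL path (`v1beta/models/<model>:generateContent`), the `x-goog-api-key` header, and the response layout reflect the public Gemini REST API as I understand it and should be checked against the official documentation; the function names are hypothetical, and the grounding output itself still needs a real call with a real key to confirm.

```python
import json
import urllib.request

# Assumed API root and version; verify against the current Gemini API docs.
API_ROOT = "https://generativelanguage.googleapis.com/v1beta"


def build_gemini_request(query, api_key, model):
    """Build (but do not send) a generateContent request with search grounding."""
    body = {
        "contents": [{"parts": [{"text": query}]}],
        "tools": [{"google_search": {}}],  # grounding tool named in this doc
    }
    return urllib.request.Request(
        f"{API_ROOT}/models/{model}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


def extract_text(response):
    """Join the text parts of the first candidate, or return '' if none."""
    candidates = response.get("candidates") or []
    if not candidates:
        return ""
    parts = candidates[0].get("content", {}).get("parts", [])
    return "".join(part.get("text", "") for part in parts)


def gemini_search(query, api_key, model, timeout=30):
    """Send the request and return the answer text (performs a network call)."""
    request = build_gemini_request(query, api_key, model)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return extract_text(json.load(resp))
```

Splitting request construction from the network call lets the payload shape be checked offline, while `gemini_search` is the part that needs `GEMINI_API_KEY` and `GEMINI_MODEL` to verify.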