javis_bot/docs/stream_browser_modes.md
javis-bot c420d5da53 feat(stream): true-mode browser-action core + Gemini scaffold + mode design
First increment of the STREAM_BROWSER real-time-info modes (true = browser,
false = Gemini):

- browse-search.mjs: drives the on-screen Chrome via CDP so the action shows on
  the broadcast. `search` returns the top Google results (title/url/snippet);
  `youtube` plays the first result. Verified live: real-time Seoul weather
  results, and IU 'Good Day' MV playback.
- .env.example: GEMINI_API_KEY / GEMINI_MODEL for the false-mode Gemini account.
- docs/stream_browser_modes.md: architecture + integration map (brain config,
  the two mode-gated tools, registry, design decisions) for the remaining wiring.

The Python brain wiring (config.py mode/gemini fields, browseAndSearch +
geminiSearch tools, registry, specs, llm_contexts) lands next - it needs a
running brain and a Gemini key to verify, rather than committing untested edits
into the 39k-line engine.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-10 16:36:35 +09:00


Real-time info modes (STREAM_BROWSER)

The bot answers via the Python brain (bridge/server.py -> src/jarvis). Real-time info is fetched by a tool that the reply engine calls. STREAM_BROWSER selects how that tool gets its data:

  • true (default): drive the on-screen Chrome (CDP at CDP_PORT, default 9222) to Google-search / play YouTube / read the page. The action is visible on the Go-Live broadcast. The browser is already up on the VNC display :1.
  • false: use the Google Gemini API (grounded with Google Search) for real-time info. No screen share needed (voice + API only).
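
A minimal sketch of how the brain might read the flag, assuming it arrives as a plain environment string. The set of strings treated as "off" and both function names are illustrative assumptions, not taken from config.py; only the default (true) and the two tool names come from this document.

```python
import os
from typing import Mapping

# Assumed spellings of "off"; this document does not define accepted values.
_FALSE_VALUES = {"0", "false", "no", "off"}


def parse_stream_browser(env: Mapping[str, str]) -> bool:
    """Return True for browser mode, False for Gemini mode.

    An unset or empty STREAM_BROWSER falls back to True, matching the
    documented default.
    """
    raw = env.get("STREAM_BROWSER", "").strip().lower()
    if not raw:
        return True
    return raw not in _FALSE_VALUES


def select_realtime_tool(stream_browser: bool) -> str:
    """Map the mode flag to the name of the single tool to register."""
    return "browseAndSearch" if stream_browser else "geminiSearch"


if __name__ == "__main__":
    # Read the real environment once, as the brain would at startup.
    mode = parse_stream_browser(os.environ)
    print(select_realtime_tool(mode))
```

Taking the environment as a mapping argument rather than reading os.environ inside the function keeps the parser testable without touching process state.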

Components

| Piece | Path | Status |
| --- | --- | --- |
| Mode flag (bot) | bot/src/config.ts screenBrowser, enforced in selfbot.ts | done |
| Browser search core (Node/CDP) | bot/scripts/stream-test/browse-search.mjs | this change |
| Brain mode read | src/jarvis/config.py stream_browser from env | TODO |
| Gemini key/model | GEMINI_API_KEY, GEMINI_MODEL (.env) + config.py | scaffolded |
| browseAndSearch tool (true) | src/jarvis/tools/builtin/browse_and_search.py -> subprocess the Node core | TODO |
| geminiSearch tool (false) | src/jarvis/tools/builtin/gemini_search.py (REST, no new dep) | TODO |
| Registry (mode-gated) | src/jarvis/tools/registry.py BUILTIN_TOOLS | TODO |
| Specs + docs/llm_contexts.md | alongside each tool | TODO |

Design decisions

  • The Python browser tool shells out to a Node script instead of adding a Python CDP/Playwright dependency. The Node layer already owns Chrome/CDP (broadcast-helper.mjs, selfbot.ts), so the brain runs node browse-search.mjs <query> as a subprocess and wraps the JSON result in the engine's UNTRUSTED WEB EXTRACT envelope. This keeps the 39k-line Python brain free of new dependencies.
  • Gemini uses the REST endpoint (generativelanguage.googleapis.com) via stdlib urllib with the google_search grounding tool - no SDK dependency.
  • Tools return the same ToolExecutionResult(success, reply_text) envelope shape as webSearch, so downstream synthesis is unchanged. The brain reads STREAM_BROWSER once at startup and registers the matching tool.
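
The subprocess approach described above could look roughly like the sketch below. Several details are assumptions made for illustration: the ToolExecutionResult dataclass is a stand-in for the engine's real class, the script is assumed to print a JSON array of {title, url, snippet} objects on stdout, the exact CLI arguments may differ from `node browse-search.mjs <query>`, and the bracketed wrapper text is a placeholder for the engine's actual UNTRUSTED WEB EXTRACT format.

```python
import json
import subprocess
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class ToolExecutionResult:
    # Stand-in for the engine's envelope; only the two documented fields.
    success: bool
    reply_text: str


# Path of the Node core, assumed to be relative to the working directory.
NODE_SCRIPT = "bot/scripts/stream-test/browse-search.mjs"


def format_extract(results: Sequence[dict]) -> str:
    """Render results inside a placeholder untrusted-content wrapper."""
    lines = ["[UNTRUSTED WEB EXTRACT]"]
    for i, item in enumerate(results, start=1):
        lines.append(f"{i}. {item.get('title', '')} <{item.get('url', '')}>")
        snippet = item.get("snippet", "")
        if snippet:
            lines.append(f"   {snippet}")
    lines.append("[END UNTRUSTED WEB EXTRACT]")
    return "\n".join(lines)


def browse_and_search(
    query: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
    timeout: float = 30.0,
) -> ToolExecutionResult:
    """Run the Node core and wrap its JSON output for the reply engine."""
    try:
        # The query is passed as a single argv element, never through a
        # shell, so user text cannot inject extra commands.
        proc = run(
            ["node", NODE_SCRIPT, query],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return ToolExecutionResult(False, f"browser search failed: {exc}")
    if proc.returncode != 0:
        return ToolExecutionResult(
            False, f"browser search failed: {proc.stderr.strip()}"
        )
    try:
        results = json.loads(proc.stdout)
    except json.JSONDecodeError:
        return ToolExecutionResult(False, "browser search returned invalid JSON")
    if not isinstance(results, list):
        return ToolExecutionResult(False, "browser search returned unexpected JSON")
    items = [r for r in results if isinstance(r, dict)]
    return ToolExecutionResult(True, format_extract(items))
```

The `run` parameter exists only so the wrapper can be exercised with a fake runner when Node and Chrome are not available; production code would leave it at the default.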

To finish / verify

  • Provide GEMINI_API_KEY to build + verify the false-mode path (a real call is needed to confirm grounding output).
  • Wire config.py + the two Python tools + registry, update specs and docs/llm_contexts.md (new Gemini LLM context).
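
For the false-mode path, a stdlib-only request could be sketched as follows. This assumes the public v1beta generateContent endpoint, the x-goog-api-key header, and the google_search tool entry as described in the Gemini API documentation; the function names are hypothetical, the model must be supplied by the caller (for example from GEMINI_MODEL), and the response handling reads only the first candidate's text parts. Grounding output still has to be confirmed with a real key, as noted above.

```python
import json
import urllib.request

# Assumed API version prefix for the REST endpoint.
API_BASE = "https://generativelanguage.googleapis.com/v1beta"


def build_gemini_request(query: str, api_key: str, model: str) -> urllib.request.Request:
    """Build a grounded generateContent request without sending it."""
    body = {
        "contents": [{"parts": [{"text": query}]}],
        # Enables Google Search grounding for this call.
        "tools": [{"google_search": {}}],
    }
    return urllib.request.Request(
        f"{API_BASE}/models/{model}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-goog-api-key": api_key,
        },
        method="POST",
    )


def extract_text(response: dict) -> str:
    """Join the text parts of the first candidate, or return ''."""
    candidates = response.get("candidates") or []
    if not candidates:
        return ""
    parts = candidates[0].get("content", {}).get("parts", [])
    return "".join(part.get("text", "") for part in parts)


def gemini_search(query: str, api_key: str, model: str, timeout: float = 30.0) -> str:
    """Send the request and return the answer text (performs network I/O)."""
    request = build_gemini_request(query, api_key, model)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return extract_text(json.load(resp))
```

Splitting request construction from the network call means the payload shape can be checked offline, while the grounded response itself is verified separately once a key is available.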