javis_bot/docs/stream_browser_modes.md
javis-bot c420d5da53 feat(stream): true-mode browser-action core + Gemini scaffold + mode design
First increment of the STREAM_BROWSER real-time-info modes (true = browser,
false = Gemini):

- browse-search.mjs: drives the on-screen Chrome via CDP so the action shows on
  the broadcast. `search` returns the top Google results (title/url/snippet);
  `youtube` plays the first result. Verified live: real-time Seoul weather
  results, and IU 'Good Day' MV playback.
- .env.example: GEMINI_API_KEY / GEMINI_MODEL for the false-mode Gemini account.
- docs/stream_browser_modes.md: architecture + integration map (brain config,
  the two mode-gated tools, registry, design decisions) for the remaining wiring.

The Python brain wiring (config.py mode/gemini fields, browseAndSearch +
geminiSearch tools, registry, specs, llm_contexts) lands next - it needs a
running brain and a Gemini key to verify, rather than committing untested edits
into the 39k-line engine.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-10 16:36:35 +09:00


# Real-time info modes (`STREAM_BROWSER`)
The bot answers via the Python brain (`bridge/server.py` -> `src/jarvis`). Real-time
info is fetched by a tool that the reply engine calls. `STREAM_BROWSER` selects which tool:
- **true** (default): drive the on-screen Chrome (CDP at `CDP_PORT`, default 9222)
to Google-search / play YouTube / read the page. The action is visible on the
Go-Live broadcast. The browser is already up on the VNC display `:1`.
- **false**: use the Google Gemini API (grounded with Google Search) for
real-time info. No screen share needed (voice + API only).
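
The mode read on the brain side is still TODO, so the following is only a minimal sketch of how `STREAM_BROWSER` and `CDP_PORT` could be parsed from the environment. The function names and the set of accepted "false" spellings are assumptions, not the project's actual `config.py`.

```python
import os

# Hypothetical: spellings treated as "false". Anything else, including an
# unset or empty variable, selects the default true (browser) mode.
_FALSE_VALUES = {"0", "false", "no", "off"}


def read_stream_browser(env=None) -> bool:
    """Return True for browser mode, False for Gemini mode."""
    source = os.environ if env is None else env
    raw = source.get("STREAM_BROWSER")
    if raw is None or raw.strip() == "":
        return True  # documented default
    return raw.strip().lower() not in _FALSE_VALUES


def read_cdp_port(env=None) -> int:
    """Return the CDP port, falling back to the documented default 9222."""
    source = os.environ if env is None else env
    return int(source.get("CDP_PORT", "9222"))
```

Passing a plain dict as `env` keeps the parsing testable without touching the real process environment.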
## Components
| Piece | Path | Status |
|---|---|---|
| Mode flag (bot) | `bot/src/config.ts` `screenBrowser`, enforced in `selfbot.ts` | done |
| Browser search core (Node/CDP) | `bot/scripts/stream-test/browse-search.mjs` | this change |
| Brain mode read | `src/jarvis/config.py` `stream_browser` from env | TODO |
| Gemini key/model | `GEMINI_API_KEY`, `GEMINI_MODEL` (.env) + `config.py` | scaffolded |
| `browseAndSearch` tool (true) | `src/jarvis/tools/builtin/browse_and_search.py` -> subprocess the Node core | TODO |
| `geminiSearch` tool (false) | `src/jarvis/tools/builtin/gemini_search.py` (REST, no new dep) | TODO |
| Registry (mode-gated) | `src/jarvis/tools/registry.py` `BUILTIN_TOOLS` | TODO |
| Specs + `docs/llm_contexts.md` | alongside each tool | TODO |
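
The mode-gated registry row could look roughly like the sketch below: exactly one of the two tools is exposed, depending on the flag. The stub functions and `build_realtime_tools` are hypothetical stand-ins; the real `BUILTIN_TOOLS` structure in `registry.py` is not shown in this document and may differ.

```python
from typing import Callable, Dict

ToolFn = Callable[[str], str]


def _browse_and_search_stub(query: str) -> str:
    # Placeholder for the true-mode tool (browser via the Node core).
    return f"[browser] {query}"


def _gemini_search_stub(query: str) -> str:
    # Placeholder for the false-mode tool (Gemini REST call).
    return f"[gemini] {query}"


def build_realtime_tools(stream_browser: bool) -> Dict[str, ToolFn]:
    """Return only the real-time tool that matches the configured mode."""
    if stream_browser:
        return {"browseAndSearch": _browse_and_search_stub}
    return {"geminiSearch": _gemini_search_stub}
```

Registering a single tool per mode means the reply engine never sees the tool that cannot work in the current setup.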
## Design decisions
- The browser tool (Python) **subprocesses a Node script** rather than adding a
Python CDP/Playwright dependency. The Node layer already owns Chrome/CDP
(`broadcast-helper.mjs`, `selfbot.ts`), so the brain shells out to
`node browse-search.mjs <query>` and wraps the JSON result in the engine's
`UNTRUSTED WEB EXTRACT` envelope. This keeps the 39k-line Python brain free of
new dependencies.
- Gemini uses the REST endpoint (`generativelanguage.googleapis.com`) via stdlib
`urllib` with the `google_search` grounding tool - no SDK dependency.
- Tools return the same `ToolExecutionResult(success, reply_text)` envelope shape
as `webSearch`, so downstream synthesis is unchanged. The brain reads
`STREAM_BROWSER` once at startup and registers the matching tool.
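
The subprocess decision in the first bullet can be sketched as follows. This is not the engine's code: `ToolExecutionResult` here is a local stand-in for the real class, the script is assumed to print a JSON list of objects with `title`/`url`/`snippet` keys, and the exact layout of the `UNTRUSTED WEB EXTRACT` envelope is a guess.

```python
import json
import subprocess
from dataclasses import dataclass


@dataclass
class ToolExecutionResult:
    """Local stand-in for the engine's result envelope."""
    success: bool
    reply_text: str


SCRIPT = "bot/scripts/stream-test/browse-search.mjs"


def wrap_results(stdout: str) -> ToolExecutionResult:
    """Turn the script's JSON output into an untrusted-extract envelope."""
    try:
        results = json.loads(stdout)
    except json.JSONDecodeError as exc:
        return ToolExecutionResult(False, f"browse-search returned invalid JSON: {exc}")
    if not isinstance(results, list):
        return ToolExecutionResult(False, "browse-search returned an unexpected shape")
    lines = [
        f"{i}. {r.get('title', '')} | {r.get('url', '')} | {r.get('snippet', '')}"
        for i, r in enumerate(results, 1)
        if isinstance(r, dict)
    ]
    # Assumed envelope layout: a marker line followed by the results.
    return ToolExecutionResult(True, "UNTRUSTED WEB EXTRACT\n" + "\n".join(lines))


def browse_and_search(query: str, timeout: float = 30.0) -> ToolExecutionResult:
    """Shell out to the Node core and wrap whatever it prints."""
    try:
        proc = subprocess.run(
            ["node", SCRIPT, query],
            capture_output=True, text=True, timeout=timeout, check=False,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return ToolExecutionResult(False, f"browse-search failed to run: {exc}")
    if proc.returncode != 0:
        return ToolExecutionResult(False, proc.stderr.strip() or "browse-search exited non-zero")
    return wrap_results(proc.stdout)
```

Passing the query as a list element rather than through a shell string avoids quoting problems with user-supplied text.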
## To finish / verify
- Provide `GEMINI_API_KEY` to build + verify the false-mode path (a real call is
needed to confirm grounding output).
- Wire `config.py` + the two Python tools + registry, update specs and
`docs/llm_contexts.md` (new Gemini LLM context).
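
For the false-mode path, a stdlib-only request might look like the sketch below. The `v1beta` `generateContent` path, the `x-goog-api-key` header, and the `google_search` tool entry follow the public Gemini REST documentation as understood at the time of writing. They have not been verified against a live key here, which is exactly what the first item above asks for. The function names are hypothetical, and the model name is expected to come from `GEMINI_MODEL`.

```python
import json
import urllib.request

API_ROOT = "https://generativelanguage.googleapis.com/v1beta"


def build_request(query: str, api_key: str, model: str) -> urllib.request.Request:
    """Build a grounded generateContent request without sending it."""
    body = {
        "contents": [{"parts": [{"text": query}]}],
        "tools": [{"google_search": {}}],  # enables Google Search grounding
    }
    return urllib.request.Request(
        f"{API_ROOT}/models/{model}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


def extract_text(payload: dict) -> str:
    """Join the text parts of the first candidate, or return an empty string."""
    candidates = payload.get("candidates") or []
    if not candidates:
        return ""
    parts = candidates[0].get("content", {}).get("parts", [])
    return "".join(p.get("text", "") for p in parts)


def gemini_search(query: str, api_key: str, model: str, timeout: float = 30.0) -> str:
    """Send the request and return the answer text (needs network and a key)."""
    with urllib.request.urlopen(build_request(query, api_key, model), timeout=timeout) as resp:
        return extract_text(json.load(resp))
```

Keeping request building and response parsing separate from the network call lets both be unit-tested before a real key is available.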