First increment of the STREAM_BROWSER real-time-info modes (true = browser, false = Gemini):

- browse-search.mjs: drives the on-screen Chrome via CDP so the action shows on the broadcast. `search` returns the top Google results (title/url/snippet); `youtube` plays the first result. Verified live: real-time Seoul weather results, and IU 'Good Day' MV playback.
- .env.example: GEMINI_API_KEY / GEMINI_MODEL for the false-mode Gemini account.
- docs/stream_browser_modes.md: architecture + integration map (brain config, the two mode-gated tools, registry, design decisions) for the remaining wiring.

The Python brain wiring (config.py mode/gemini fields, browseAndSearch + geminiSearch tools, registry, specs, llm_contexts) lands next - it needs a running brain and a Gemini key to verify, rather than committing untested edits into the 39k-line engine.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
43 lines
2.3 KiB
Markdown
# Real-time info modes (`STREAM_BROWSER`)

The bot answers via the Python brain (`bridge/server.py` -> `src/jarvis`). Real-time
info is fetched by a tool the reply engine calls. `STREAM_BROWSER` selects HOW:

- **true** (default): drive the on-screen Chrome (CDP at `CDP_PORT`, default 9222)
  to Google-search / play YouTube / read the page. The action is visible on the
  Go-Live broadcast. The browser is already up on the VNC display `:1`.
- **false**: use the Google Gemini API (grounded with Google Search) for
  real-time info. No screen share needed (voice + API only).

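The mode switch described above could be read on the Python side roughly as follows. This is a minimal sketch, not the brain's actual `config.py`: the function names (`read_stream_browser`, `read_cdp_port`, `select_tool`) and the set of strings treated as "false" are assumptions made for illustration. Only the `STREAM_BROWSER` and `CDP_PORT` variable names, the `true` default, the 9222 default port, and the two tool names come from this document.

```python
import os
from typing import Mapping

# Assumption: these spellings count as "false"; the document does not list them.
_FALSE_VALUES = {"false", "0", "no", "off"}


def read_stream_browser(env: Mapping[str, str] = os.environ) -> bool:
    """Return True for browser mode, False for Gemini mode."""
    raw = env.get("STREAM_BROWSER")
    if raw is None or raw.strip() == "":
        return True  # the document states that true is the default
    return raw.strip().lower() not in _FALSE_VALUES


def read_cdp_port(env: Mapping[str, str] = os.environ) -> int:
    """Return the CDP port, falling back to the documented default of 9222."""
    return int(env.get("CDP_PORT", "9222"))


def select_tool(env: Mapping[str, str] = os.environ) -> str:
    """Pick the name of the single real-time tool to register for this run."""
    return "browseAndSearch" if read_stream_browser(env) else "geminiSearch"


if __name__ == "__main__":
    print(select_tool(), read_cdp_port())
```

Passing the environment in as a mapping keeps the parsing testable without touching the real process environment.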
## Components

| Piece | Path | Status |
|---|---|---|
| Mode flag (bot) | `bot/src/config.ts` `screenBrowser`, enforced in `selfbot.ts` | done |
| Browser search core (Node/CDP) | `bot/scripts/stream-test/browse-search.mjs` | this change |
| Brain mode read | `src/jarvis/config.py` `stream_browser` from env | TODO |
| Gemini key/model | `GEMINI_API_KEY`, `GEMINI_MODEL` (.env) + `config.py` | scaffolded |
| `browseAndSearch` tool (true) | `src/jarvis/tools/builtin/browse_and_search.py` -> subprocess the Node core | TODO |
| `geminiSearch` tool (false) | `src/jarvis/tools/builtin/gemini_search.py` (REST, no new dep) | TODO |
| Registry (mode-gated) | `src/jarvis/tools/registry.py` `BUILTIN_TOOLS` | TODO |
| Specs + `docs/llm_contexts.md` | alongside each tool | TODO |

## Design decisions

- The browser tool (Python) **subprocesses a Node script** rather than adding a
  Python CDP/playwright dependency: the Node layer already owns Chrome/CDP
  (`broadcast-helper.mjs`, `selfbot.ts`), so the brain shells out to
  `node browse-search.mjs <query>` and wraps the JSON result in the engine's
  `UNTRUSTED WEB EXTRACT` envelope. Keeps the 39k-line Python brain dep-free.
- Gemini uses the REST endpoint (`generativelanguage.googleapis.com`) via stdlib
  `urllib` with the `google_search` grounding tool - no SDK dependency.
- Tools return the same `ToolExecutionResult(success, reply_text)` envelope shape
  as `webSearch`, so downstream synthesis is unchanged. The brain reads
  `STREAM_BROWSER` once at startup and registers the matching tool.

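The subprocess approach for the browser tool could look roughly like this. It is a hedged sketch under several assumptions: the `ToolExecutionResult` dataclass is a stand-in for the engine's real class, the script is assumed to print a JSON list of objects with `title`, `url` and `snippet` keys, and the `UNTRUSTED WEB EXTRACT` header line is a placeholder, since the engine's real envelope format is not shown here. The function names are hypothetical.

```python
import json
import subprocess
from dataclasses import dataclass

# Path taken from the components table, relative to the repo root.
SCRIPT = "bot/scripts/stream-test/browse-search.mjs"


@dataclass
class ToolExecutionResult:
    """Stand-in for the engine's result type (assumed fields only)."""
    success: bool
    reply_text: str


def format_results(raw_json: str) -> ToolExecutionResult:
    """Parse the script's stdout and wrap it as untrusted web content."""
    try:
        results = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return ToolExecutionResult(False, f"browse-search returned invalid JSON: {exc}")
    if not isinstance(results, list) or not all(isinstance(r, dict) for r in results):
        return ToolExecutionResult(False, "browse-search returned an unexpected JSON shape")
    lines = [
        f"{i}. {r.get('title', '')} | {r.get('url', '')} | {r.get('snippet', '')}"
        for i, r in enumerate(results, start=1)
    ]
    body = "\n".join(lines) or "(no results)"
    # Placeholder header: the real envelope markers live in the engine.
    return ToolExecutionResult(True, f"UNTRUSTED WEB EXTRACT\n{body}")


def browse_and_search(query: str, timeout: float = 30.0) -> ToolExecutionResult:
    """Shell out to the Node core and convert its output to a tool result."""
    try:
        proc = subprocess.run(
            ["node", SCRIPT, query],  # list form, so the query is never shell-parsed
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return ToolExecutionResult(False, f"browse-search failed to run: {exc}")
    if proc.returncode != 0:
        return ToolExecutionResult(
            False, f"browse-search exited with {proc.returncode}: {proc.stderr.strip()}"
        )
    return format_results(proc.stdout)
```

Separating `format_results` from the subprocess call lets the parsing and envelope logic be checked without a running Chrome or Node install.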
## To finish / verify

- Provide `GEMINI_API_KEY` to build + verify the false-mode path (a real call is
  needed to confirm grounding output).
- Wire `config.py` + the two Python tools + registry, update specs and
  `docs/llm_contexts.md` (new Gemini LLM context).

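As a starting point for the false-mode verification, the Gemini call could be sketched with stdlib `urllib` as below. The `v1beta/models/{model}:generateContent` path, the `x-goog-api-key` header and the `candidates[0].content.parts[].text` response shape reflect the public Gemini REST API as I understand it, and should be confirmed against the current reference and a real call. The function names are hypothetical, and no default model is assumed: the caller passes the `GEMINI_MODEL` value in.

```python
import json
import urllib.request

# Assumption: the v1beta REST root; confirm against the current API reference.
API_ROOT = "https://generativelanguage.googleapis.com/v1beta"


def build_request(api_key: str, model: str, query: str) -> urllib.request.Request:
    """Build a generateContent request with the google_search grounding tool."""
    payload = {
        "contents": [{"parts": [{"text": query}]}],
        "tools": [{"google_search": {}}],
    }
    return urllib.request.Request(
        f"{API_ROOT}/models/{model}:generateContent",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


def extract_text(response: dict) -> str:
    """Join the text parts of the first candidate; empty string if none."""
    candidates = response.get("candidates") or []
    if not candidates:
        return ""
    parts = candidates[0].get("content", {}).get("parts", [])
    return "".join(part.get("text", "") for part in parts)


def gemini_search(api_key: str, model: str, query: str, timeout: float = 30.0) -> str:
    """Perform the real network call; needs a valid key to succeed."""
    request = build_request(api_key, model, query)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return extract_text(json.load(resp))
```

Only `gemini_search` touches the network, so the request and response handling can be exercised before a key is available.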