diff --git a/bot/scripts/stream-test/README.md b/bot/scripts/stream-test/README.md
index 68561b6..92933fb 100644
--- a/bot/scripts/stream-test/README.md
+++ b/bot/scripts/stream-test/README.md
@@ -8,13 +8,16 @@ real browsing session captured from the X display.
   until stopped. All params from `.env` (`DISCORD_SELFBOT_TOKEN`, `DISCORD_GUILD_ID`,
   `DISCORD_VOICE_CHANNEL_ID`, `VNC_RESOLUTION`, `VNC_FRAMERATE`, `VNC_BITRATE_KBPS`,
   `STREAM_HW`, `VNC_DISPLAY`).
-- `human.mjs` - human-like interaction helpers. Real mouse/keyboard via
-  `xdotool` (so the cursor is visible in the stream); Playwright only locates
-  elements. Every action is real input: address-bar navigation (Ctrl+L +
-  typing), search typing, clicking the video / settings menu / autoplay toggle /
-  play button, fullscreen via the `f` key, scrolling, and entering links. The
-  CDP/DOM API is used only to read state for verification, and as a rare click
-  fallback when an element has no on-screen box.
+- `human.mjs` - human-like interaction helpers. Input is injected into the X
+  server with `xdotool` (synthetic X input, not a physical HID device, but the
+  browser and the captured screen see genuine pointer/keyboard events with a
+  visibly moving cursor); Playwright only locates elements. Every action is such
+  input: address-bar navigation (Ctrl+L + typing), search typing, clicking the
+  video / settings menu / autoplay toggle / play button, fullscreen via the `f`
+  key, and scrolling. Elements are brought into view with a real wheel scroll
+  (no DOM scrollIntoView); if an element has no on-screen box the click fails
+  rather than falling back to a synthetic click. The CDP/DOM API is used only to
+  read state for verification, never to act.
 - `scenario.mjs` - the browse scenario (YouTube -> IU live -> 1080p ->
   fullscreen -> Naver -> 나무위키), driven with the human helpers.
   Connects to a Chrome already running with `--remote-debugging-port` (`CDP_PORT`, default
diff --git a/bot/scripts/stream-test/human.mjs b/bot/scripts/stream-test/human.mjs
index 5b17c69..90b534a 100644
--- a/bot/scripts/stream-test/human.mjs
+++ b/bot/scripts/stream-test/human.mjs
@@ -1,12 +1,15 @@
-// Human-like interaction helpers: drive the REAL X mouse/keyboard via xdotool
-// so the cursor visibly moves and is captured by the screen stream, using
-// Playwright only to LOCATE elements and read state. This is the default
-// interaction mode for the browse scenarios.
+// Human-like interaction helpers. Drive input with xdotool, using Playwright
+// only to LOCATE elements and read state.
 //
-// Note: only the user-visible browsing actions are real input (cursor move,
-// click, scroll, char-by-char typing). Behind-the-scenes control (window
-// fullscreen, play, quality, autoplay toggle, page navigation, and click
-// fallbacks) intentionally uses the CDP/DOM API for reliability.
+// What xdotool actually is: it injects input events into the X server (it is
+// NOT a physical HID device). The browser and the captured screen receive them
+// as genuine pointer/keyboard input, with a visibly moving cursor. Every ACTION
+// here is such input: cursor move, click, char-by-char typing, key presses, and
+// wheel scroll - including (in scenario.mjs) navigation, quality, fullscreen and
+// the autoplay toggle. The CDP/DOM API is used only to READ state for
+// verification, never to perform an action. Elements are brought into view with
+// a real wheel scroll (not a DOM scrollIntoView); if an element has no on-screen
+// box, the click fails rather than falling back to a synthetic click.
 import { execFile } from 'node:child_process';
 
 const DISPLAY = process.env.VNC_DISPLAY || ':1';
@@ -55,12 +58,27 @@ export async function humanClickXY(sx, sy) {
   await sleep(rand(130, 300));
 }
 
-// Locate a Playwright element, move the real cursor into it (random offset), click.
+// Bring an element into view using a REAL wheel scroll (not a DOM
+// scrollIntoView). Returns its viewport box, or null if it can't be revealed.
+async function bringIntoView(page, locator) {
+  const ih = await page.evaluate(() => window.innerHeight);
+  for (let i = 0; i < 14; i++) {
+    const box = await locator.boundingBox().catch(() => null);
+    if (box && box.y >= 70 && box.y + box.height <= ih - 70) return box;
+    const button = box ? (box.y < 70 ? '4' : '5') : '5'; // 4=up, 5=down
+    await xdo(['click', button]); await xdo(['click', button]); await xdo(['click', button]);
+    await sleep(rand(120, 240));
+  }
+  return await locator.boundingBox().catch(() => null);
+}
+
+// Locate a Playwright element, real-wheel it into view, move the real cursor
+// into it (random offset), and click. No synthetic-click fallback: if the
+// element has no on-screen box, this throws.
 export async function humanClick(page, locator) {
-  await locator.scrollIntoViewIfNeeded().catch(() => {});
   await sleep(rand(150, 380));
-  const box = await locator.boundingBox();
-  if (!box) { await locator.click({ timeout: 5000 }).catch(() => {}); return; }
+  const box = await bringIntoView(page, locator);
+  if (!box) throw new Error('humanClick: element has no on-screen box; refusing synthetic click');
   const { ox, oy } = await contentOrigin(page);
   const sx = Math.round(ox + box.x + box.width * rand(0.35, 0.65));
   const sy = Math.round(oy + box.y + box.height * rand(0.35, 0.65));
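
The scroll-direction decision inside `bringIntoView` can be read in isolation. The following is a minimal sketch, not code from this change: `wheelButtonFor` and `MARGIN` are hypothetical names introduced only for illustration, and the sketch assumes the 70 px margin and the X wheel buttons `4` (up) and `5` (down) that the diff uses.

```javascript
// Hypothetical helper (not part of the diff): a pure version of the decision
// that bringIntoView makes on each loop iteration.
const MARGIN = 70; // px kept clear at the top and bottom of the viewport, as in the diff

// box: { y, height } in viewport coordinates, or null when no box is available.
// Returns null when the element is fully inside the margins (no scroll needed),
// otherwise the X button to click: '4' = wheel up, '5' = wheel down.
function wheelButtonFor(box, innerHeight, margin = MARGIN) {
  if (!box) return '5'; // no box yet: scroll down and try again
  const fullyVisible =
    box.y >= margin && box.y + box.height <= innerHeight - margin;
  if (fullyVisible) return null;
  return box.y < margin ? '4' : '5';
}

// Example calls with made-up geometry for a 900 px tall viewport.
console.log(wheelButtonFor({ y: 10, height: 40 }, 900));  // element above the top margin
console.log(wheelButtonFor({ y: 880, height: 40 }, 900)); // element below the bottom margin
console.log(wheelButtonFor({ y: 200, height: 40 }, 900)); // element already in view
```

Two consequences follow from this reading of the diff. An element taller than `innerHeight - 140` can never satisfy the visibility check, so the loop uses all 14 attempts for it. After the loop, `bringIntoView` returns whatever box exists at that point, so `humanClick` throws only when `boundingBox()` yields no box at all, not when the element is merely still outside the margins.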