refactor(stream-test): real-wheel into view, no synthetic-click fallback

Address review feedback on accuracy: humanClick used DOM scrollIntoViewIfNeeded
and fell back to Playwright locator.click() when an element had no box; neither
is real input. It now brings elements into view with a real wheel scroll and
throws if there is no on-screen box (no synthetic click). Header comment and
README corrected: xdotool injects synthetic X input (not a physical HID device),
and every action goes through that input path, while the CDP/DOM API is used
only to read state.
javis-bot
2026-06-10 14:15:26 +09:00
parent 2cdd159fc1
commit bbc2fa3f7a
2 changed files with 40 additions and 19 deletions


@@ -8,13 +8,16 @@ real browsing session captured from the X display.
 until stopped. All params from `.env` (`DISCORD_SELFBOT_TOKEN`,
 `DISCORD_GUILD_ID`, `DISCORD_VOICE_CHANNEL_ID`, `VNC_RESOLUTION`,
 `VNC_FRAMERATE`, `VNC_BITRATE_KBPS`, `STREAM_HW`, `VNC_DISPLAY`).
-- `human.mjs` - human-like interaction helpers. Real mouse/keyboard via
-`xdotool` (so the cursor is visible in the stream); Playwright only locates
-elements. Every action is real input: address-bar navigation (Ctrl+L +
-typing), search typing, clicking the video / settings menu / autoplay toggle /
-play button, fullscreen via the `f` key, scrolling, and entering links. The
-CDP/DOM API is used only to read state for verification, and as a rare click
-fallback when an element has no on-screen box.
+- `human.mjs` - human-like interaction helpers. Input is injected into the X
+server with `xdotool` (synthetic X input, not a physical HID device, but the
+browser and the captured screen see genuine pointer/keyboard events with a
+visibly moving cursor); Playwright only locates elements. Every action is such
+input: address-bar navigation (Ctrl+L + typing), search typing, clicking the
+video / settings menu / autoplay toggle / play button, fullscreen via the `f`
+key, and scrolling. Elements are brought into view with a real wheel scroll
+(no DOM scrollIntoView); if an element has no on-screen box the click fails
+rather than falling back to a synthetic click. The CDP/DOM API is used only to
+read state for verification, never to act.
 - `scenario.mjs` - the browse scenario (YouTube -> IU live -> 1080p ->
 fullscreen -> Naver -> 나무위키), driven with the human helpers. Connects to a
 Chrome already running with `--remote-debugging-port` (`CDP_PORT`, default
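
The README change above says elements are revealed with a real wheel scroll. In X11 the wheel is reported as button 4 (up) and button 5 (down), which is what the `xdotool click` calls in this commit rely on. As a minimal sketch, assuming one wanted to build the argument list in one place, a helper could look like the following. `wheelArgs` and its parameters are hypothetical names that do not appear in the commit; the commit itself issues three separate `click` calls rather than using `--repeat`.

```javascript
// Hypothetical helper (not in the commit): build an xdotool argument list for
// a burst of wheel notches. Button 4 scrolls up and button 5 scrolls down,
// matching the "4=up, 5=down" comment in the diff.
function wheelArgs(direction, notches = 3, delayMs = 40) {
  if (direction !== 'up' && direction !== 'down') {
    throw new TypeError(`direction must be 'up' or 'down', got ${direction}`);
  }
  if (!Number.isInteger(notches) || notches < 1) {
    throw new RangeError('notches must be a positive integer');
  }
  const button = direction === 'up' ? '4' : '5';
  // `xdotool click --repeat N --delay MS BUTTON` sends N clicks of BUTTON,
  // pausing MS milliseconds between them.
  return ['click', '--repeat', String(notches), '--delay', String(delayMs), button];
}
```

The returned array has the same shape as the `['click', button]` arrays that the diff passes to its `xdo` helper, so under that assumption it could be handed to the same function. The default of three notches mirrors the three clicks per iteration in the commit; the 40 ms delay is an arbitrary illustrative value.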


@@ -1,12 +1,15 @@
-// Human-like interaction helpers: drive the REAL X mouse/keyboard via xdotool
-// so the cursor visibly moves and is captured by the screen stream, using
-// Playwright only to LOCATE elements and read state. This is the default
-// interaction mode for the browse scenarios.
+// Human-like interaction helpers. Drive input with xdotool, using Playwright
+// only to LOCATE elements and read state.
 //
-// Note: only the user-visible browsing actions are real input (cursor move,
-// click, scroll, char-by-char typing). Behind-the-scenes control (window
-// fullscreen, play, quality, autoplay toggle, page navigation, and click
-// fallbacks) intentionally uses the CDP/DOM API for reliability.
+// What xdotool actually is: it injects input events into the X server (it is
+// NOT a physical HID device). The browser and the captured screen receive them
+// as genuine pointer/keyboard input, with a visibly moving cursor. Every ACTION
+// here is such input: cursor move, click, char-by-char typing, key presses, and
+// wheel scroll - including (in scenario.mjs) navigation, quality, fullscreen and
+// the autoplay toggle. The CDP/DOM API is used only to READ state for
+// verification, never to perform an action. Elements are brought into view with
+// a real wheel scroll (not a DOM scrollIntoView); if an element has no on-screen
+// box, the click fails rather than falling back to a synthetic click.
 import { execFile } from 'node:child_process';
 const DISPLAY = process.env.VNC_DISPLAY || ':1';
@@ -55,12 +58,27 @@ export async function humanClickXY(sx, sy) {
   await sleep(rand(130, 300));
 }
-// Locate a Playwright element, move the real cursor into it (random offset), click.
+// Bring an element into view using a REAL wheel scroll (not a DOM
+// scrollIntoView). Returns its viewport box, or null if it can't be revealed.
+async function bringIntoView(page, locator) {
+  const ih = await page.evaluate(() => window.innerHeight);
+  for (let i = 0; i < 14; i++) {
+    const box = await locator.boundingBox().catch(() => null);
+    if (box && box.y >= 70 && box.y + box.height <= ih - 70) return box;
+    const button = box ? (box.y < 70 ? '4' : '5') : '5'; // 4=up, 5=down
+    await xdo(['click', button]); await xdo(['click', button]); await xdo(['click', button]);
+    await sleep(rand(120, 240));
+  }
+  return await locator.boundingBox().catch(() => null);
+}
+// Locate a Playwright element, real-wheel it into view, move the real cursor
+// into it (random offset), and click. No synthetic-click fallback: if the
+// element has no on-screen box, this throws.
 export async function humanClick(page, locator) {
-  await locator.scrollIntoViewIfNeeded().catch(() => {});
   await sleep(rand(150, 380));
-  const box = await locator.boundingBox();
-  if (!box) { await locator.click({ timeout: 5000 }).catch(() => {}); return; }
+  const box = await bringIntoView(page, locator);
+  if (!box) throw new Error('humanClick: element has no on-screen box; refusing synthetic click');
   const { ox, oy } = await contentOrigin(page);
   const sx = Math.round(ox + box.x + box.width * rand(0.35, 0.65));
   const sy = Math.round(oy + box.y + box.height * rand(0.35, 0.65));