refactor(stream-test): real-wheel into view, no synthetic-click fallback
Address review feedback on accuracy: humanClick used the DOM scrollIntoViewIfNeeded and fell back to Playwright's locator.click() when an element had no box; neither is real input. It now brings elements into view with a real wheel scroll and throws if there is no on-screen box (no synthetic click). The header comment and README are corrected: xdotool injects synthetic X input (it is not a physical HID device), and every action goes through that injected input, while the CDP/DOM API is used only to read state.
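The in-view check and wheel direction that bringIntoView applies in the diff below can be sketched as a pure function. This is a minimal sketch, not the commit's code: `wheelButtonFor` and its `margin` parameter are hypothetical names introduced here; the 70 px margin and the X button numbers (4 = wheel up, 5 = wheel down) are taken from the diff.

```javascript
// Hypothetical helper (not part of the commit): decide which X wheel button
// to press, if any, to bring an element into the visible band.
//   box:         { y, height } in viewport coordinates, or null if no box yet
//   innerHeight: viewport height in CSS pixels
//   margin:      band kept clear at the top and bottom (70 in the diff)
// Returns null when the element is fully inside the band, '4' (wheel up)
// when its top is above the band, and '5' (wheel down) otherwise.
function wheelButtonFor(box, innerHeight, margin = 70) {
  if (box && box.y >= margin && box.y + box.height <= innerHeight - margin) {
    return null; // already in view, no scroll needed
  }
  if (box && box.y < margin) return '4';
  return '5'; // below the band, or no box yet: scroll down
}

console.log(wheelButtonFor({ y: 100, height: 50 }, 900));  // null
console.log(wheelButtonFor({ y: 10, height: 50 }, 900));   // 4
console.log(wheelButtonFor({ y: 1200, height: 50 }, 900)); // 5
```

Keeping the decision separate from the xdotool call makes it checkable without an X display; the caller would still issue the actual wheel clicks and re-read the box after each scroll.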
@@ -8,13 +8,16 @@ real browsing session captured from the X display.
 until stopped. All params from `.env` (`DISCORD_SELFBOT_TOKEN`,
 `DISCORD_GUILD_ID`, `DISCORD_VOICE_CHANNEL_ID`, `VNC_RESOLUTION`,
 `VNC_FRAMERATE`, `VNC_BITRATE_KBPS`, `STREAM_HW`, `VNC_DISPLAY`).
-- `human.mjs` - human-like interaction helpers. Real mouse/keyboard via
-`xdotool` (so the cursor is visible in the stream); Playwright only locates
-elements. Every action is real input: address-bar navigation (Ctrl+L +
-typing), search typing, clicking the video / settings menu / autoplay toggle /
-play button, fullscreen via the `f` key, scrolling, and entering links. The
-CDP/DOM API is used only to read state for verification, and as a rare click
-fallback when an element has no on-screen box.
+- `human.mjs` - human-like interaction helpers. Input is injected into the X
+server with `xdotool` (synthetic X input, not a physical HID device, but the
+browser and the captured screen see genuine pointer/keyboard events with a
+visibly moving cursor); Playwright only locates elements. Every action is such
+input: address-bar navigation (Ctrl+L + typing), search typing, clicking the
+video / settings menu / autoplay toggle / play button, fullscreen via the `f`
+key, and scrolling. Elements are brought into view with a real wheel scroll
+(no DOM scrollIntoView); if an element has no on-screen box the click fails
+rather than falling back to a synthetic click. The CDP/DOM API is used only to
+read state for verification, never to act.
 - `scenario.mjs` - the browse scenario (YouTube -> IU live -> 1080p ->
 fullscreen -> Naver -> 나무위키), driven with the human helpers. Connects to a
 Chrome already running with `--remote-debugging-port` (`CDP_PORT`, default
@@ -1,12 +1,15 @@
-// Human-like interaction helpers: drive the REAL X mouse/keyboard via xdotool
-// so the cursor visibly moves and is captured by the screen stream, using
-// Playwright only to LOCATE elements and read state. This is the default
-// interaction mode for the browse scenarios.
+// Human-like interaction helpers. Drive input with xdotool, using Playwright
+// only to LOCATE elements and read state.
 //
-// Note: only the user-visible browsing actions are real input (cursor move,
-// click, scroll, char-by-char typing). Behind-the-scenes control (window
-// fullscreen, play, quality, autoplay toggle, page navigation, and click
-// fallbacks) intentionally uses the CDP/DOM API for reliability.
+// What xdotool actually is: it injects input events into the X server (it is
+// NOT a physical HID device). The browser and the captured screen receive them
+// as genuine pointer/keyboard input, with a visibly moving cursor. Every ACTION
+// here is such input: cursor move, click, char-by-char typing, key presses, and
+// wheel scroll - including (in scenario.mjs) navigation, quality, fullscreen and
+// the autoplay toggle. The CDP/DOM API is used only to READ state for
+// verification, never to perform an action. Elements are brought into view with
+// a real wheel scroll (not a DOM scrollIntoView); if an element has no on-screen
+// box, the click fails rather than falling back to a synthetic click.
 import { execFile } from 'node:child_process';
 
 const DISPLAY = process.env.VNC_DISPLAY || ':1';
@@ -55,12 +58,27 @@ export async function humanClickXY(sx, sy) {
   await sleep(rand(130, 300));
 }
 
-// Locate a Playwright element, move the real cursor into it (random offset), click.
+// Bring an element into view using a REAL wheel scroll (not a DOM
+// scrollIntoView). Returns its viewport box, or null if it can't be revealed.
+async function bringIntoView(page, locator) {
+  const ih = await page.evaluate(() => window.innerHeight);
+  for (let i = 0; i < 14; i++) {
+    const box = await locator.boundingBox().catch(() => null);
+    if (box && box.y >= 70 && box.y + box.height <= ih - 70) return box;
+    const button = box ? (box.y < 70 ? '4' : '5') : '5'; // 4=up, 5=down
+    await xdo(['click', button]); await xdo(['click', button]); await xdo(['click', button]);
+    await sleep(rand(120, 240));
+  }
+  return await locator.boundingBox().catch(() => null);
+}
+
+// Locate a Playwright element, real-wheel it into view, move the real cursor
+// into it (random offset), and click. No synthetic-click fallback: if the
+// element has no on-screen box, this throws.
 export async function humanClick(page, locator) {
-  await locator.scrollIntoViewIfNeeded().catch(() => {});
   await sleep(rand(150, 380));
-  const box = await locator.boundingBox();
-  if (!box) { await locator.click({ timeout: 5000 }).catch(() => {}); return; }
+  const box = await bringIntoView(page, locator);
+  if (!box) throw new Error('humanClick: element has no on-screen box; refusing synthetic click');
   const { ox, oy } = await contentOrigin(page);
   const sx = Math.round(ox + box.x + box.width * rand(0.35, 0.65));
   const sy = Math.round(oy + box.y + box.height * rand(0.35, 0.65));
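The last lines of the humanClick hunk turn a viewport box into an X screen coordinate by adding the content origin and picking a point in the middle 30% of the box. A minimal sketch of that arithmetic follows. `clickPoint` and `randBetween` are hypothetical names introduced here, `randBetween` is only an assumed stand-in for the file's `rand()` (whose definition is not shown in the diff), and the origin values are made up for illustration.

```javascript
// Hypothetical helper (not part of the commit): map a viewport box to an
// X screen coordinate.
//   box:    { x, y, width, height } in viewport coordinates
//   origin: { ox, oy } screen position of the page content's top-left corner
//   fx, fy: fractions in [0, 1] selecting the point inside the box
function clickPoint(box, origin, fx, fy) {
  return {
    sx: Math.round(origin.ox + box.x + box.width * fx),
    sy: Math.round(origin.oy + box.y + box.height * fy),
  };
}

// Assumed stand-in for the diff's rand(): uniform float in [min, max).
function randBetween(min, max) {
  return min + Math.random() * (max - min);
}

// Fixed fractions give a reproducible result: the box centre, shifted by
// an illustrative 85 px of browser chrome above the page content.
const p = clickPoint({ x: 100, y: 200, width: 80, height: 40 }, { ox: 0, oy: 85 }, 0.5, 0.5);
console.log(p); // { sx: 140, sy: 305 }

// A randomised offset as in the diff, kept within the middle 30% of the box.
const jittered = clickPoint(
  { x: 100, y: 200, width: 80, height: 40 },
  { ox: 0, oy: 85 },
  randBetween(0.35, 0.65),
  randBetween(0.35, 0.65),
);
console.log(jittered.sx >= 128 && jittered.sx <= 152); // true
```

Passing the fractions in as arguments keeps the mapping deterministic and testable, while the random offset stays at the call site.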