Address remaining review items (queue, selfbot v6 API, ldconfig, resample)

- voice.ts: reply playback is now a FIFO queue (AudioPlayerStatus.Idle drains it) so concurrent speakers no longer cut each other's replies off.
- selfbot.ts: rewritten against the REAL @dank074/discord-video-stream v6 API (verified from its d.ts): prepareStream(input, opts, signal)->{command,output}, playStream(output, streamer, {type:"go-live"}, signal), Streamer.joinVoice. x11grab via customInputOptions; optional NVENC encode (RTX 5050) via exported `nvenc`. package.json pinned to ^6.0.0 (was a wrong ^4.2.1).
- Dockerfile: dropped the hardcoded python3.12 LD_LIBRARY_PATH. faster-whisper >=1.1 self-locates the pip CUDA libs; ldconfig (full path, glob) registers them as a robust fallback. Verified: ld.so cache lists libcublas/libcudnn and GPU whisper works with LD_LIBRARY_PATH empty.
- bridge: STT resample 48k->16k upgraded from nearest-neighbor to linear (np.interp).

Verified: tsc clean, image builds, GPU whisper OK via ldconfig, compose valid.
2026-06-09 18:47:25 +09:00
parent 964123682f
commit b56c9c7721
7 changed files with 417 additions and 50 deletions
--- a/bridge/server.py
+++ b/bridge/server.py
@@ -153,13 +153,13 @@ def transcribe(wav_bytes: bytes) -> dict:

    pcm, sr = _read_wav_pcm(wav_bytes)
    audio = np.frombuffer(pcm, dtype=np.int16).astype(np.float32) / 32768.0
-    # faster-whisper expects 16kHz mono float32; resample if needed.
+    # faster-whisper expects 16kHz mono float32; linearly resample if needed.
    if sr != 16000 and audio.size:
-        import math
-        ratio = 16000 / sr
-        idx = (np.arange(int(audio.size * ratio)) / ratio).astype(np.int64)
-        idx = np.clip(idx, 0, audio.size - 1)
-        audio = audio[idx]
+        n_out = int(round(audio.size * 16000 / sr))
+        if n_out > 0:
+            x_old = np.linspace(0.0, 1.0, num=audio.size, endpoint=False)
+            x_new = np.linspace(0.0, 1.0, num=n_out, endpoint=False)
+            audio = np.interp(x_new, x_old, audio).astype(np.float32)
    segments, info = _whisper.transcribe(audio, beam_size=1)
    text = "".join(seg.text for seg in segments).strip()
    return {"text": text, "language": getattr(info, "language", None)}
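
For reference, the new resample path can be exercised on its own. The sketch below lifts the same np.linspace / np.interp steps into a standalone helper; the name `resample_linear`, the `target_sr` parameter, and the 440 Hz test tone are illustrative assumptions and do not appear in bridge/server.py.

```python
import numpy as np


def resample_linear(audio: np.ndarray, sr: int, target_sr: int = 16000) -> np.ndarray:
    """Linearly resample mono float audio from sr to target_sr (hypothetical helper)."""
    if sr == target_sr or audio.size == 0:
        return audio.astype(np.float32)
    n_out = int(round(audio.size * target_sr / sr))
    if n_out <= 0:
        # Mirror the diff: leave the audio untouched if the output would be empty.
        return audio.astype(np.float32)
    # Place input and output samples on the same normalized time axis [0, 1).
    x_old = np.linspace(0.0, 1.0, num=audio.size, endpoint=False)
    x_new = np.linspace(0.0, 1.0, num=n_out, endpoint=False)
    # np.interp interpolates linearly between neighbouring input samples.
    return np.interp(x_new, x_old, audio).astype(np.float32)


# One second of a 440 Hz tone at 48 kHz, resampled to 16 kHz.
sr = 48000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t).astype(np.float32)
out = resample_linear(tone, sr)
```

Because 48000 / 16000 is an integer ratio, each output sample here lands on every third input sample. Linear interpolation applies no anti-aliasing low-pass filter, so this is a lightweight approach rather than a full-quality resampler.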