Address remaining review items (queue, selfbot v6 API, ldconfig, resample)
Some checks failed
Release / semantic-release (push) Successful in 22s
tests / Unit tests (Linux, Python 3.11) (push) Successful in 9m56s
Release / build-linux (push) Failing after 7m15s
Release / build-windows (push) Has been cancelled
Release / build-macos (arm64, macos-latest) (push) Has been cancelled
Release / build-macos (x64, macos-15-intel) (push) Has been cancelled
Release / release-main (push) Has been cancelled
Release / release-develop (push) Has been cancelled
- voice.ts: reply playback now goes through a FIFO queue, drained whenever the
player reaches AudioPlayerStatus.Idle, so replies to concurrent speakers no
longer cut each other off.
- selfbot.ts: rewritten against the actual @dank074/discord-video-stream v6 API
(verified against its .d.ts typings): prepareStream(input, opts, signal)->{command,output},
playStream(output, streamer, {type:"go-live"}, signal), Streamer.joinVoice.
Screen capture uses x11grab via customInputOptions; optional NVENC encoding
(RTX 5050) uses the exported `nvenc`. package.json now requires ^6.0.0
(previously the incorrect ^4.2.1).
- Dockerfile: dropped the hardcoded python3.12 LD_LIBRARY_PATH. faster-whisper
>=1.1 locates the pip-installed CUDA libraries on its own; ldconfig (called by
full path, with the library directories matched by a glob) registers them as a
fallback. Verified: the ld.so cache lists libcublas/libcudnn, and GPU whisper
works with LD_LIBRARY_PATH empty.
- bridge: STT resampling from 48 kHz to 16 kHz upgraded from nearest-neighbor
to linear interpolation (np.interp).
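A minimal sketch of the queueing idea from the voice.ts bullet, written in Python to match the bridge code rather than as the project's actual TypeScript. `ReplyQueue`, `play`, and `on_idle` are hypothetical names: `play` stands in for starting an audio resource on the player, and `on_idle` for whatever handler reacts to AudioPlayerStatus.Idle.

```python
from collections import deque
from typing import Callable, Deque


class ReplyQueue:
    """FIFO of pending replies; the next one starts only once the player is idle.

    Hypothetical analogue of the voice.ts change: `play` starts a clip, and
    `on_idle` plays the role of the AudioPlayerStatus.Idle handler.
    """

    def __init__(self, play: Callable[[str], None]) -> None:
        self._play = play
        self._pending: Deque[str] = deque()
        self._busy = False

    def enqueue(self, clip: str) -> None:
        # Queue the reply; start it right away only if nothing is playing.
        self._pending.append(clip)
        if not self._busy:
            self._start_next()

    def on_idle(self) -> None:
        # The current clip finished: drain the next one instead of interrupting.
        self._busy = False
        self._start_next()

    def _start_next(self) -> None:
        if self._pending:
            self._busy = True
            self._play(self._pending.popleft())
```

A reply that arrives while another is playing waits its turn instead of replacing the current resource, which is what keeps one speaker's reply from cutting off another's.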
Verified: tsc clean, image builds, GPU whisper OK via ldconfig, compose valid.
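The ldconfig fallback from the Dockerfile bullet can be sketched in Python as three steps: glob for the pip-installed NVIDIA library directories, render them one per line in ld.so.conf.d format, and afterwards check `ldconfig -p` output for libcublas/libcudnn. This assumes the NVIDIA wheels' `site-packages/nvidia/<component>/lib` layout; the function names are hypothetical and this is not the project's Dockerfile.

```python
import glob
import os
from typing import Dict, Iterable, List


def find_nvidia_lib_dirs(site_packages: str) -> List[str]:
    """Glob for pip-installed NVIDIA library dirs (assumed layout: nvidia/<component>/lib)."""
    pattern = os.path.join(site_packages, "nvidia", "*", "lib")
    return sorted(d for d in glob.glob(pattern) if os.path.isdir(d))


def ld_conf_text(lib_dirs: Iterable[str]) -> str:
    """Render directories in ld.so.conf.d format: one absolute path per line."""
    return "".join(os.path.abspath(d) + "\n" for d in lib_dirs)


def libs_in_ld_cache(ldconfig_p_output: str, prefixes: Iterable[str]) -> Dict[str, List[str]]:
    """Map each library prefix to the matching sonames listed by `ldconfig -p`."""
    found: Dict[str, List[str]] = {p: [] for p in prefixes}
    for line in ldconfig_p_output.splitlines():
        # Entries look like "\tlibcudnn.so.9 (libc6,x86-64) => /path/to/libcudnn.so.9";
        # the header line ("N libs found in cache ...") never matches a lib prefix.
        name = line.strip().split(" ", 1)[0]
        for prefix, names in found.items():
            if name.startswith(prefix):
                names.append(name)
    return found
```

In an image build, the output of `ld_conf_text` would be written to a file under /etc/ld.so.conf.d/ (the file name is arbitrary) and followed by an ldconfig run; `libs_in_ld_cache` can then confirm that the cache lists the same libraries the commit checked.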
@@ -153,13 +153,13 @@ def transcribe(wav_bytes: bytes) -> dict:
     pcm, sr = _read_wav_pcm(wav_bytes)
     audio = np.frombuffer(pcm, dtype=np.int16).astype(np.float32) / 32768.0
-    # faster-whisper expects 16kHz mono float32; resample if needed.
+    # faster-whisper expects 16kHz mono float32; linearly resample if needed.
     if sr != 16000 and audio.size:
-        import math
-        ratio = 16000 / sr
-        idx = (np.arange(int(audio.size * ratio)) / ratio).astype(np.int64)
-        idx = np.clip(idx, 0, audio.size - 1)
-        audio = audio[idx]
+        n_out = int(round(audio.size * 16000 / sr))
+        if n_out > 0:
+            x_old = np.linspace(0.0, 1.0, num=audio.size, endpoint=False)
+            x_new = np.linspace(0.0, 1.0, num=n_out, endpoint=False)
+            audio = np.interp(x_new, x_old, audio).astype(np.float32)
     segments, info = _whisper.transcribe(audio, beam_size=1)
     text = "".join(seg.text for seg in segments).strip()
     return {"text": text, "language": getattr(info, "language", None)}
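The new resampling step in transcribe(), pulled out as a standalone function for experimentation. `resample_linear` is a hypothetical name; the body follows the np.interp version in the hunk, including leaving the audio untouched when the rounded output length is zero.

```python
import numpy as np


def resample_linear(audio: np.ndarray, sr: int, target_sr: int = 16000) -> np.ndarray:
    """Linearly resample mono float audio from `sr` to `target_sr`."""
    audio = np.asarray(audio, dtype=np.float32)
    if sr == target_sr or audio.size == 0:
        return audio
    n_out = int(round(audio.size * target_sr / sr))
    if n_out <= 0:
        # Same as the patch: too short to resample, so leave the input as-is.
        return audio
    # Put input and output samples on a shared normalized time axis [0, 1).
    x_old = np.linspace(0.0, 1.0, num=audio.size, endpoint=False)
    x_new = np.linspace(0.0, 1.0, num=n_out, endpoint=False)
    # np.interp draws a straight line between neighboring input samples.
    return np.interp(x_new, x_old, audio).astype(np.float32)


# Usage: one second of 48 kHz noise becomes 16000 samples.
x = np.random.default_rng(0).uniform(-1.0, 1.0, 48000).astype(np.float32)
y = resample_linear(x, 48000)
print(y.shape)  # (16000,)
```

Output sample j lands at roughly input position j·sr/16000. At exactly 48 kHz that is every third input sample, so the result is numerically almost identical to the old index-based version; linear interpolation only changes the output for non-integer ratios such as 44.1 kHz. Neither version low-pass filters before downsampling, so content above 8 kHz aliases; a polyphase resampler such as `scipy.signal.resample_poly(audio, 1, 3)` would filter it, at the cost of a SciPy dependency.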
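transcribe() calls `_read_wav_pcm`, whose body is outside this hunk. A minimal sketch of what such a helper could look like with the stdlib `wave` module, assuming 16-bit PCM input; the name `read_wav_pcm` and the stereo-to-mono averaging are assumptions for illustration, not the project's code.

```python
import io
import wave
from typing import Tuple

import numpy as np


def read_wav_pcm(wav_bytes: bytes) -> Tuple[bytes, int]:
    """Hypothetical stand-in for _read_wav_pcm: return (mono int16 PCM bytes, sample rate)."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        if wf.getsampwidth() != 2:
            # The caller decodes with dtype=np.int16, so only 16-bit PCM is accepted.
            raise ValueError(f"expected 16-bit PCM, got {8 * wf.getsampwidth()}-bit")
        channels = wf.getnchannels()
        sr = wf.getframerate()
        frames = wf.readframes(wf.getnframes())
    if channels > 1:
        # WAV frames are interleaved; average the channels down to mono.
        samples = np.frombuffer(frames, dtype="<i2").reshape(-1, channels)
        frames = samples.mean(axis=1).astype("<i2").tobytes()
    return frames, sr
```

The returned bytes can be fed straight into the `np.frombuffer(pcm, dtype=np.int16)` line of transcribe().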