GPU acceleration is now on by default and verified end-to-end on the
Blackwell RTX 5050 (sm_120):
- Ollama offloads 100% to GPU (log: library=CUDA compute=12.0,
BLACKWELL_NATIVE_FP4=1). compose passes GPU via CDI
(devices: nvidia.com/gpu=all) to both ollama and javis.
- Whisper STT on GPU: faster-whisper>=1.1.0 + nvidia-cublas/cudnn cu12,
LD_LIBRARY_PATH baked into the image. Verified float16 transcribe on
sm_120; bridge auto-falls back to CPU when no GPU is present.
- Model: default chat model -> qwen3:8b (best 8GB-VRAM tool-calling,
~5GB Q4). Embed stays nomic-embed-text.
- README documents the host one-time setup (nvidia-container-toolkit +
`nvidia-ctk cdi generate`) and GPU on/off.
Verified: image builds; GPU visible in both containers via compose;
ollama ps = 100% GPU; faster-whisper cuda OK + CPU fallback OK;
bridge /health 200.