javis_bot/docker-compose.yml at f93b241575ee7f59371c61e019b13f2e7e545019

tkrmagid/javis_bot


javis-bot 0dbc0300d7

Release / semantic-release (push) Successful in 19s


tests / Unit tests (Linux, Python 3.11) (push) Successful in 9m54s


Release / build-linux (push) Failing after 7m14s


Release / build-windows (push) Has been cancelled


Release / build-macos (arm64, macos-latest) (push) Has been cancelled


Release / build-macos (x64, macos-15-intel) (push) Has been cancelled


Release / release-main (push) Has been cancelled


Release / release-develop (push) Has been cancelled


Enable GPU: LLM + Whisper on the RTX 5050, pick qwen3:8b

GPU acceleration is now on by default and verified end-to-end on the
Blackwell RTX 5050 (sm_120):

- Ollama offloads 100% to GPU (log: library=CUDA compute=12.0,
  BLACKWELL_NATIVE_FP4=1). compose passes GPU via CDI
  (devices: nvidia.com/gpu=all) to both ollama and javis.
- Whisper STT on GPU: faster-whisper>=1.1.0 plus nvidia-cublas-cu12 and
  nvidia-cudnn-cu12, with LD_LIBRARY_PATH baked into the image. Verified
  float16 transcription on sm_120; the bridge falls back to CPU
  automatically when no GPU is present.
- Model: default chat model -> qwen3:8b (best tool-calling fit for 8 GB
  of VRAM, ~5 GB at Q4). Embedding model stays nomic-embed-text.
- README documents the host one-time setup (nvidia-container-toolkit +
  `nvidia-ctk cdi generate`) and GPU on/off.
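
The CPU fallback mentioned in the Whisper bullet could look roughly like the sketch below. This is not the bridge's actual code: `load_with_fallback`, `whisper_factory`, the `"small"` model size, and the `int8` CPU compute type are all assumptions made for illustration. Only the `WhisperModel(..., device=..., compute_type=...)` constructor comes from faster-whisper itself.

```python
from typing import Callable, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")

# (device, compute_type) pairs tried in order: GPU float16 first, then CPU.
# The int8 CPU setting is an assumption, not taken from the commit.
DEFAULT_ATTEMPTS: Tuple[Tuple[str, str], ...] = (
    ("cuda", "float16"),
    ("cpu", "int8"),
)


def load_with_fallback(
    factory: Callable[[str, str], T],
    attempts: Sequence[Tuple[str, str]] = DEFAULT_ATTEMPTS,
) -> Tuple[T, str]:
    """Return (model, device) from the first attempt that does not raise."""
    last_error: Optional[Exception] = None
    for device, compute_type in attempts:
        try:
            return factory(device, compute_type), device
        except Exception as exc:  # e.g. no CUDA device visible in the container
            last_error = exc
    raise RuntimeError("no usable device for Whisper") from last_error


def whisper_factory(model_size: str = "small") -> Callable[[str, str], object]:
    """Hypothetical helper: wrap faster_whisper.WhisperModel in a factory."""

    def make(device: str, compute_type: str) -> object:
        # Imported lazily so this module loads even without faster-whisper.
        from faster_whisper import WhisperModel

        return WhisperModel(model_size, device=device, compute_type=compute_type)

    return make
```

A caller would use `model, device = load_with_fallback(whisper_factory())` and log `device`. Some CUDA library problems may only surface on the first transcription rather than at load time, so a real bridge might also need to retry on CPU at that point.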

Verified: image builds; GPU visible in both containers via compose;
ollama ps = 100% GPU; faster-whisper cuda OK + CPU fallback OK;
bridge /health 200.
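
The "ollama ps = 100% GPU" check can also be scripted. The following is a minimal sketch, assuming Ollama's `/api/ps` endpoint is reachable and reports `size` and `size_vram` per loaded model. The function names and the `http://localhost:11434` base URL are assumptions, not part of this commit.

```python
import json
import urllib.request
from typing import Any, Dict


def gpu_fraction(ps_payload: Dict[str, Any]) -> Dict[str, float]:
    """Map each loaded model name to the fraction of its bytes held in VRAM."""
    fractions: Dict[str, float] = {}
    for model in ps_payload.get("models", []):
        size = model.get("size", 0)
        vram = model.get("size_vram", 0)
        fractions[model.get("name", "?")] = vram / size if size else 0.0
    return fractions


def fetch_ps(base_url: str = "http://localhost:11434") -> Dict[str, Any]:
    """Fetch the list of running models from an Ollama server (assumed URL)."""
    with urllib.request.urlopen(f"{base_url}/api/ps", timeout=5) as resp:
        return json.load(resp)


def fully_offloaded(ps_payload: Dict[str, Any]) -> bool:
    """True when at least one model is loaded and every one is fully in VRAM."""
    fractions = gpu_fraction(ps_payload)
    return bool(fractions) and all(f >= 1.0 for f in fractions.values())
```

Calling `fully_offloaded(fetch_ps())` after a first chat request would mirror the manual check. A fraction below 1.0 suggests part of the model spilled to CPU memory.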

2026-06-09 15:49:21 +09:00

3.3 KiB
