# Performance tests
Per-context timings for the reply pipeline. Excluded from the default pytest run
(see `pytest.ini`'s `addopts = -m "not performance"`).
## Running
```bash
pytest tests/performance/ -v -m performance -s
```
The `-s` flag lets the report table print to stdout. Tests auto-skip when Ollama
is unreachable, so the harness is safe to leave in the repo.
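
The auto-skip can be pictured as a reachability probe that runs before any test does. A minimal sketch, assuming the probe hits Ollama's `/api/tags` endpoint (which lists locally pulled models); `ollama_reachable` is a hypothetical helper and may not match what the harness actually does:

```python
import http.client
import urllib.error
import urllib.request


def ollama_reachable(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if an Ollama server answers at base_url (hypothetical helper)."""
    try:
        # Any 200 response from /api/tags means the server is up.
        url = f"{base_url.rstrip('/')}/api/tags"
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, DNS failure, timeout, bad response, or malformed URL.
        return False
```

A session-scoped fixture could call `pytest.skip(...)` whenever this returns `False`, so an unreachable server is reported as a skip rather than a failure.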
## Env vars
| Var | Default | Description |
|-----|---------|-------------|
| `JARVIS_PERF_OLLAMA_URL` | `http://localhost:11434` | Ollama endpoint |
| `JARVIS_PERF_MODEL` | `gemma4:e2b` | Model pulled in Ollama for the run |
| `JARVIS_PERF_RUNS` | `3` | Runs per query (bump for tighter p95) |
| `JARVIS_PERF_REPORT_DIR` | `tests/performance/reports/` | JSON report output |

`JARVIS_PERF_RUNS=3` is a fast-iteration default. For stable p95 numbers when
benchmarking a change, use `JARVIS_PERF_RUNS=10` or higher.
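
To make the defaults concrete, the variables above could be read as in the sketch below. `PerfConfig` and `load_perf_config` are hypothetical names used for illustration only; the variable names and default values are the ones in the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class PerfConfig:
    ollama_url: str
    model: str
    runs: int
    report_dir: str


def load_perf_config(env: Optional[Mapping[str, str]] = None) -> PerfConfig:
    """Read the JARVIS_PERF_* variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return PerfConfig(
        ollama_url=env.get("JARVIS_PERF_OLLAMA_URL", "http://localhost:11434"),
        model=env.get("JARVIS_PERF_MODEL", "gemma4:e2b"),
        runs=int(env.get("JARVIS_PERF_RUNS", "3")),
        report_dir=env.get("JARVIS_PERF_REPORT_DIR", "tests/performance/reports/"),
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to exercise, for example `load_perf_config({"JARVIS_PERF_RUNS": "10"}).runs`.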
## What it measures
- **`test_micro_benchmark_tiny_prompt`** — one warmup + N tiny round-trips.
Hardware baseline: the floor for every context's per-call cost.
- **`test_pipeline_timings_by_context`** — three representative queries × N runs
of `run_reply_engine`, with per-context timings bucketed via stack-frame
inspection in [`timing_recorder.py`](timing_recorder.py).
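
The warmup-then-measure shape of the micro-benchmark can be sketched as follows. This is an illustration under assumptions, not the test's own code: `micro_benchmark` and its `call` parameter are hypothetical names, and `call` stands in for one tiny prompt round-trip.

```python
import time
from typing import Callable, List


def micro_benchmark(call: Callable[[], object], runs: int) -> List[float]:
    """Time `runs` invocations of `call` after one untimed warmup."""
    call()  # warmup: absorbs model load and cold caches, not recorded
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        timings.append(time.perf_counter() - start)
    return timings
```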

Shape invariants (not absolute numbers):

- Evaluator p50 ≤ main chat turn p50 × 1.5.
- Tool router p50 ≤ main chat turn p50 × 1.5.
- Enrichment extractor shares the router model chain.
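
The two p50 invariants compare medians instead of absolute latencies, so they hold across machines of different speeds. A minimal sketch of such a check, assuming timings are collected per context in seconds; the context keys (`main_chat`, `evaluator`, `tool_router`) and the function name are hypothetical:

```python
import statistics
from typing import Dict, List


def check_shape_invariants(timings: Dict[str, List[float]], ratio: float = 1.5) -> List[str]:
    """Return the violated invariants; an empty list means the shape holds."""
    baseline = statistics.median(timings["main_chat"])
    failures = []
    for context in ("evaluator", "tool_router"):
        observed = statistics.median(timings[context])
        # Each auxiliary context must stay within `ratio` times the main chat turn.
        if observed > baseline * ratio:
            failures.append(
                f"{context} p50 {observed:.3f}s exceeds {ratio} x main_chat p50 {baseline:.3f}s"
            )
    return failures
```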

Unmapped callers print as `other:<qualname>`. Treat that as a signal to update the
`_CALLER_TO_CONTEXT` map in `timing_recorder.py` alongside `docs/llm_contexts.md`.

Reports are written to `reports/` and are git-ignored.
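
The caller bucketing and the `other:<qualname>` fallback described above can be illustrated with a small stack-frame lookup. This is a sketch under assumptions, not the contents of `timing_recorder.py`: the map entries, `classify_caller`, and the two example functions are hypothetical.

```python
import sys

# Hypothetical entries; the real map lives in timing_recorder.py.
_CALLER_TO_CONTEXT = {
    "evaluate_reply": "evaluator",
    "route_tools": "tool_router",
}


def classify_caller(depth: int = 1) -> str:
    """Name the context of the function `depth` frames above this one."""
    code = sys._getframe(depth).f_code
    context = _CALLER_TO_CONTEXT.get(code.co_name)
    if context is not None:
        return context
    # co_qualname exists on Python 3.11+; fall back to the bare name before that.
    qualname = getattr(code, "co_qualname", code.co_name)
    return f"other:{qualname}"


def evaluate_reply() -> str:
    return classify_caller()


def summarize_memory() -> str:  # deliberately absent from the map
    return classify_caller()
```

Here `evaluate_reply()` resolves to `evaluator`, while `summarize_memory()` falls through to an `other:` label, which is the cue to add a map entry.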