
Ollama

Run open-weight models locally — no API key, no data leaves your machine. Ollama is the easiest way to host models on your own laptop or workstation.

What you need

  • Ollama installed and running on the machine Kenaz lives on (or any machine on your network)
  • At least one model pulled: ollama pull llama3.2 or similar
  • ~10–80 GB of free disk per model, plus enough RAM/VRAM to run it

Hardware reality check:

  • 8 GB RAM — small models only (llama3.2:1b, qwen2.5:1.5b)
  • 16 GB RAM — llama3.2, qwen2.5:7b, mistral
  • 32 GB+ RAM / GPU — llama3.3:70b, qwen2.5:32b, deepseek-r1:32b

A model that doesn't fit will load painfully slowly off swap or fail outright.
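
If you're not sure whether a model fits, compare its size on disk against your RAM/VRAM and check where it actually ends up loading. A quick sketch, assuming a recent Ollama CLI (ollama ps was added in later releases):

  ollama list                  # size on disk for each pulled model
  ollama run llama3.2 "hi"     # load the model once
  ollama ps                    # loaded size, plus whether it's running on GPU, CPU, or split across both

If ollama ps shows the model mostly on CPU despite having a GPU, it's spilling out of VRAM and will be slow.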

Steps

  1. Install Ollama. ollama.com/download — runs on macOS, Windows, Linux. The installer adds a system service that listens on http://localhost:11434 by default.
  2. Pull a model.
    ollama pull llama3.2
    List what you've got:
    ollama list
  3. Add to Kenaz. Providers → Add provider → Ollama. The endpoint defaults to http://localhost:11434 — change it if Ollama is running on a different host. No API key needed (set the Bearer field if you've put Ollama behind a reverse proxy that requires one). Click Test, Save.
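
If Test fails, or you just want to check the endpoint outside Kenaz first, Ollama answers plain HTTP on the same port. Substitute the hostname if Ollama runs on another machine:

  curl http://localhost:11434                # replies "Ollama is running" if the service is up
  curl http://localhost:11434/api/version    # small JSON blob with the server version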

Kenaz reads the list of locally-available models on save. Pull a new model later via ollama pull, then click Refresh models in the Kenaz provider editor to pick it up.
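
To see the model list the server itself reports (the same set Kenaz should pick up on save or Refresh models), query the tags endpoint directly:

  curl http://localhost:11434/api/tags       # JSON listing every pulled model with its size and tag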

Models and what they're for

The full library is at ollama.com/library. Notable picks:

  • llama3.2 — Meta's daily-driver. Good general assistant, fast.
  • qwen2.5 — Alibaba's family. Stronger at code than Llama.
  • deepseek-r1 — reasoning model, slow but strong on multi-step problems.
  • mistral / mixtral — efficient European models.
  • gemma2 — Google's open-weight family.
  • phi-3 — Microsoft's small, efficient models.

Tags (the part after :) pick the size variant: llama3.2:1b, llama3.2:3b, etc.
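
For example, to pull explicit size variants (the tags available for each model are listed on its ollama.com/library page):

  ollama pull llama3.2:1b       # small variant, fits the 8 GB tier above
  ollama pull qwen2.5:7b        # mid-size, comfortable at 16 GB
  ollama pull deepseek-r1:32b   # needs 32 GB+ RAM or a large GPU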

Pricing

Free. Pay your electric bill.

Privacy posture

  • Nothing leaves your machine. Period. Verifiable: pull the network cable and Ollama still works.
  • Ollama itself collects no telemetry by default. You can verify with lsof -i -P while Ollama is running.
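
A more targeted version of that check, assuming lsof is installed (output format varies slightly by OS):

  # Show only listening TCP sockets owned by Ollama.
  # You should see 127.0.0.1:11434 (or whatever OLLAMA_HOST is set to) and nothing else.
  lsof -nP -iTCP -sTCP:LISTEN | grep -i ollama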

Tool use

Ollama supports OpenAI-compatible function calling on models whose underlying GGUF advertises tool support (most recent Llama, Qwen, and Mistral models). Capability hints in Kenaz reflect what each Ollama model declares; tools won't show up for models that can't use them.

Tool quality on local models is materially worse than on frontier hosted models. If your work depends on robust multi-step tool use, this isn't the right backend.
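
If you want to confirm that a given model actually emits tool calls before relying on it in Kenaz, you can hit Ollama's OpenAI-compatible endpoint directly. A minimal sketch; the get_weather function is a made-up example, not something Ollama or Kenaz provides:

  curl http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "llama3.2",
      "messages": [{"role": "user", "content": "What is the weather in Oslo right now?"}],
      "tools": [{
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for a city",
          "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"]
          }
        }
      }]
    }'

A model with working tool support answers with a tool_calls entry naming get_weather; a model without it just replies in plain text.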

Troubleshooting

  • connection refused on Test — Ollama isn't running. Start it with ollama serve (or restart the Ollama desktop app).
  • Test passes, no models listed. You haven't pulled any. ollama pull <model>.
  • Generation is unbearably slow. Model is too big for your RAM. Pick a smaller variant, or move Ollama to a machine with a GPU and point Kenaz at http://that-host:11434.
  • Network access from Kenaz. Ollama listens on localhost by default. To reach it from another machine you need to set OLLAMA_HOST=0.0.0.0:11434 and restart — but be aware that this exposes the API to anyone on your network; see the sketch below.
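
A sketch of how that typically looks, assuming a Linux install managed by systemd (the macOS desktop app reads the variable via launchctl setenv instead):

  # One-off, in a shell (stop the existing service first so the port is free):
  OLLAMA_HOST=0.0.0.0:11434 ollama serve

  # Persistent, for the systemd service:
  sudo systemctl edit ollama.service
  #   add under [Service]:
  #   Environment="OLLAMA_HOST=0.0.0.0:11434"
  sudo systemctl restart ollama

Only do this on a network you trust, or put an authenticating reverse proxy in front of it — which is where the Bearer field in the Kenaz provider editor comes in.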