
Ollama

Run open-weight models locally — no API key, no data leaves your machine. Ollama is the easiest way to host models on your own laptop or workstation.

What you need

  • Ollama installed and running on the machine Kenaz lives on (or any machine on your network)
  • At least one model pulled: ollama pull llama3.2 or similar
  • ~10–80 GB of free disk per model, plus enough RAM/VRAM to run it

Hardware reality check:

  • 8 GB RAM — small models only (llama3.2:1b, qwen2.5:1.5b)
  • 16 GB RAM — llama3.2, qwen2.5:7b, mistral
  • 32 GB+ RAM / GPU — llama3.3:70b, qwen2.5:32b, deepseek-r1:32b

A model that doesn't fit will load painfully slowly off swap or fail outright.
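
If you're not sure whether a model fits, compare its size on disk against your RAM/VRAM and check where it actually ends up loading. A quick sketch, assuming a recent Ollama CLI (ollama ps was added in later releases):

  ollama list                  # size on disk for each pulled model
  ollama run llama3.2 "hi"     # load the model once
  ollama ps                    # loaded size, plus whether it's running on GPU, CPU, or split across both

If ollama ps shows the model mostly on CPU despite having a GPU, it's spilling out of VRAM and will be slow.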

Steps

  1. Install Ollama. ollama.com/download — runs on macOS, Windows, Linux. The installer adds a system service that listens on http://localhost:11434 by default.
  2. Pull a model.
    ollama pull llama3.2
    List what you've got:
    ollama list
  3. Add to Kenaz. Providers → Add provider → Ollama. The endpoint defaults to http://localhost:11434 — change it if Ollama is running on a different host. No API key needed (set the Bearer field if you've put Ollama behind a reverse proxy that requires one). Click Test, Save.
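
If Test fails, or you just want to check the endpoint outside Kenaz first, Ollama answers plain HTTP on the same port. Substitute the hostname if Ollama runs on another machine:

  curl http://localhost:11434                # replies "Ollama is running" if the service is up
  curl http://localhost:11434/api/version    # small JSON blob with the server version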

Kenaz reads the list of locally-available models on save. Pull a new model later via ollama pull, then click Refresh models in the Kenaz provider editor to pick it up.
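
To see the model list the server itself reports (the same set Kenaz should pick up on save or Refresh models), query the tags endpoint directly:

  curl http://localhost:11434/api/tags       # JSON listing every pulled model with its size and tag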

Models and what they're for

The full library is at ollama.com/library. Notable picks:

  • llama3.2 — Meta's daily-driver. Good general assistant, fast.
  • qwen2.5 — Alibaba's family. Stronger at code than Llama.
  • deepseek-r1 — reasoning model, slow but strong on multi-step problems.
  • mistral / mixtral — efficient European models.
  • gemma2 — Google's open-weight family.
  • phi-3 — Microsoft's small, efficient models.

Tags (the part after :) pick the size variant: llama3.2:1b, llama3.2:3b, etc.
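
For example, to pull explicit size variants (the tags available for each model are listed on its ollama.com/library page):

  ollama pull llama3.2:1b       # small variant, fits the 8 GB tier above
  ollama pull qwen2.5:7b        # mid-size, comfortable at 16 GB
  ollama pull deepseek-r1:32b   # needs 32 GB+ RAM or a large GPU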

Pricing

Free. Pay your electric bill.

Privacy posture

  • Nothing leaves your machine. Period. Verifiable: pull the network cable and Ollama still works.
  • Ollama itself collects no telemetry by default. You can verify with lsof -i -P while Ollama is running.
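
A more targeted version of that check, assuming lsof is installed (output format varies slightly by OS):

  # Show only listening TCP sockets owned by Ollama.
  # You should see 127.0.0.1:11434 (or whatever OLLAMA_HOST is set to) and nothing else.
  lsof -nP -iTCP -sTCP:LISTEN | grep -i ollama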

Tool use

Ollama supports OpenAI-compatible function calling on models whose underlying GGUF advertises tool support (most recent Llama, Qwen, and Mistral models). Capability hints in Kenaz reflect what each Ollama model declares; tools won't show up for models that can't use them.

Tool quality on local models is materially worse than on frontier hosted models. If your work depends on robust multi-step tool use, this isn't the right backend.
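
If you want to confirm that a given model actually emits tool calls before relying on it in Kenaz, you can hit Ollama's OpenAI-compatible endpoint directly. A minimal sketch; the get_weather function is a made-up example, not something Ollama or Kenaz provides:

  curl http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "llama3.2",
      "messages": [{"role": "user", "content": "What is the weather in Oslo right now?"}],
      "tools": [{
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for a city",
          "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"]
          }
        }
      }]
    }'

A model with working tool support answers with a tool_calls entry naming get_weather; a model without it just replies in plain text.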

Troubleshooting

  • connection refused on Test — Ollama isn't running. Start it with ollama serve (or restart the Ollama desktop app).
  • Test passes, no models listed. You haven't pulled any. ollama pull <model>.
  • Generation is unbearably slow. Model is too big for your RAM. Pick a smaller variant, or move Ollama to a machine with a GPU and point Kenaz at http://that-host:11434.
  • Network access from Kenaz. Ollama listens on localhost by default. To reach it from another machine you need to set OLLAMA_HOST=0.0.0.0:11434 and restart — but be aware that this exposes the API to anyone on your network; see the sketch below.
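
A sketch of how that typically looks, assuming a Linux install managed by systemd (the macOS desktop app reads the variable via launchctl setenv instead):

  # One-off, in a shell (stop the existing service first so the port is free):
  OLLAMA_HOST=0.0.0.0:11434 ollama serve

  # Persistent, for the systemd service:
  sudo systemctl edit ollama.service
  #   add under [Service]:
  #   Environment="OLLAMA_HOST=0.0.0.0:11434"
  sudo systemctl restart ollama

Only do this on a network you trust, or put an authenticating reverse proxy in front of it — which is where the Bearer field in the Kenaz provider editor comes in.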