ollama

Ollama supports various models including Llama, Mistral, and more. To run a model, use the "ollama run" command followed by the model name.
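A quick sketch of the basic CLI flow (model names here are examples; check the Ollama library for current tags):

```shell
# Download a model once, then chat with it (llama3.2:3b is an example tag)
ollama pull llama3.2:3b
ollama run llama3.2:3b "Explain quantization in one sentence."

# See what's cached locally and how big each model is
ollama list
```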

  • VRAM and quantization are the two main bottlenecks for local LLM quality

1. Check if service is running​

systemctl status ollama
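If the unit exists but is inactive, starting it and hitting the API is the usual next step (this assumes the stock systemd unit from the official install script):

```shell
# Start now and on every boot
sudo systemctl enable --now ollama

# The real health check: the HTTP API answering on its default port 11434
curl -s http://localhost:11434/api/version
```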

test command​

podman run --rm --device nvidia.com/gpu=all ghcr.io/open-webui/open-webui:ollama

HP​

4GB VRAM RTX 3050 on Origami (Fedora Atomic + Nvidia drivers). This is based on the official Open WebUI GitHub examples, community Podman setups, and Nvidia GPU best practices as of early 2026.

The full, verified command (for the bundled Ollama + WebUI in one container):

Bash

podman run -d \
--name openwebui \
--gpus=all \
-p 3000:8080 \
-v ollama:/root/.ollama \
-v open-webui:/app/backend/data \
--restart always \
ghcr.io/open-webui/open-webui:ollama

Flag-by-flag breakdown (why each one, and tips to avoid fails)​

  • -d → Detached mode: Runs in background so your terminal stays free. (Essential, or it blocks.)
  • --name openwebui → Gives the container a friendly name. Makes podman stop/start/restart openwebui easy. (Change if you want, but keep it simple.)
  • --gpus=all → Passes your entire RTX 3050 to the container (CUDA support). Since Origami Nvidia has drivers + toolkit layered in, this should work rootless. Test first: podman run --rm --gpus=all nvidia/cuda:12.6.0-base-ubuntu22.04 nvidia-smi. If it shows your 3050, good; if you get a "no devices" error, run podman system migrate or reboot.
  • -p 3000:8080 → Maps container's port 8080 (WebUI listens here) to your host's 3000. Access: http://localhost:3000. Use 3001:8080 if 3000 is taken.
  • -v ollama:/root/.ollama → Named volume for Ollama models/data. Pulls once, persists forever (even after podman rm). Without this: models vanish on stop/recreate.
  • -v open-webui:/app/backend/data → Stores WebUI settings, users, chats. Same persistence magic.
  • --restart always → Auto-restarts on crash, reboot, or host updates. Keeps it running like a service.
  • Image: ghcr.io/open-webui/open-webui:ollama → Bundled version (Debian base + Ollama inside). No separate Ollama needed.
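Because the models and settings live in named volumes, updating is just a pull-and-recreate; a sketch:

```shell
# Grab the newest image
podman pull ghcr.io/open-webui/open-webui:ollama

# Recreate the container; the ollama and open-webui volumes are untouched
podman stop openwebui
podman rm openwebui
# ...then rerun the same podman run command as above

# Confirm the named volumes survived
podman volume ls
```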

Run it exactly like that. First pull might take 5-10 mins (2-3 GB).

After:

  • podman ps → See it running.
  • Browser → localhost:3000 → Sign up (local, no cloud).
  • In UI: Settings > Models > Pull something small like llama3.2:3b to test (a 3B quant fits comfortably in 4GB VRAM; 7-8B quants run but may partially offload to CPU).
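Models can also be pulled from the terminal instead of the UI, since the bundled image ships the ollama CLI inside the container:

```shell
# Pull and verify from the host (llama3.2:3b is an example tag)
podman exec -it openwebui ollama pull llama3.2:3b
podman exec openwebui ollama list
```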

If any error pops (e.g., GPU not detected), fix one thing at a time. This setup is clean, persistent, and low-privilege (rootless Podman).

Mac​

Podman run command​

podman run -d --name openwebui \
-p 8080:8080 \
--restart always \
-e OLLAMA_BASE_URL=http://host.containers.internal:11434 \
-v openwebui:/app/backend/data \
ghcr.io/open-webui/open-webui:main
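A connectivity check for this setup (assumes Ollama runs natively on the Mac, and that the container image ships curl, which it already uses for its own healthcheck):

```shell
# Host side: is native Ollama listening?
curl -s http://localhost:11434/api/version

# Container side: can the WebUI reach the host through the VM?
podman exec openwebui curl -s http://host.containers.internal:11434/api/version
```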

WebUI for Ollama​

Tokens/Second​

docker run -d -p 3000:8080 --gpus all --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:cuda
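Two ways to actually read tokens/second: ollama run --verbose prints prompt/eval rates after each reply, and the generate API returns eval_count plus eval_duration (nanoseconds), so throughput is eval_count / eval_duration × 1e9. A sketch with illustrative numbers:

```shell
# Interactive: rates are printed after each answer
# ollama run llama3.2:3b --verbose

# From API stats: tokens/s = eval_count / eval_duration(ns) * 1e9
resp='{"eval_count":128,"eval_duration":4000000000}'   # illustrative values
echo "$resp" | awk -F'[:,}]' '{printf "%.1f tokens/s\n", $2 / $4 * 1e9}'
# → 32.0 tokens/s
```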