Ollama runs large language models locally: it pulls, manages and serves models (Llama, Mistral, Gemma, Qwen, …) behind a simple HTTP API on port 11434, with an OpenAI-compatible /v1 surface.
This chart deploys the ollama/ollama server on Kubernetes with a persistent model store and optional NVIDIA GPU acceleration. Behaviour is driven by the bjw-s-labs common library, so almost every knob lives in values.yaml.
Key features:
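- Native Ollama API (/api/*) plus the OpenAI-compatible /v1/* endpoints
- NVIDIA GPU support: runtimeClassName + NVIDIA_* env wired out of the box, CPU-only friendly by default
- Persistent model store at /root/.ollama
- Prometheus metrics via the exporter sidecar on /metrics and an optional ServiceMonitor

```bash
helm repo add obeone https://charts.obeone.cloud
helm install ollama obeone/ollama
```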
The metrics proxy (exporter sidecar) is the headline feature of the chart, so it gets its own section.
The exporter sidecar is a transparent proxy that sits in front of Ollama: it listens on :8000, forwards every API request to the Ollama server in the same pod (http://localhost:11434), and exposes Prometheus metrics on /metrics for the traffic it sees. It instruments /api/chat and /api/generate and passes every other endpoint (including /v1/*) through untouched, so routing the API through it breaks nothing.
The chart defaults to a multi-arch image published on GHCR (ghcr.io/obeone/ollama-exporter), built from frcooper/ollama-exporter (Unlicense) and signed with cosign (keyless).
Because the proxy is in the API path, the chart wires the http Service port (11434) to the sidecar's :8000 rather than straight to Ollama. All client traffic therefore flows through the proxy and gets accounted for.
You do not need to juggle several coupled values to turn the sidecar on or off. The chart keys everything off the exporter container's own enabled flag and rewrites the Service wiring for you (in templates/common.yaml, before the bjw-s loader renders):
```yaml
controllers:
  main:
    containers:
      exporter:
        enabled: false # <-- the only line you touch
```
| exporter.enabled | http Service port → targetPort | metrics port (:8000) | Sidecar container |
|---|---|---|---|
| true (default) | 11434 → 8000 (through proxy) | present | present |
| false | 11434 → 11434 (direct to Ollama) | dropped | dropped |
So when you disable it, the API keeps working: traffic goes straight to Ollama, and no dangling :8000 Service port is left pointing at a container that no longer exists. No other value needs changing.
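For illustration, the Service ports rendered in each mode might look roughly like the fragment below. This is a hand-written sketch of the Kubernetes Service spec.ports field derived from the table above, not verbatim chart output; the port names http and metrics come from the chart's values, while field order and extra fields (such as protocol) may differ in the real manifest.

```yaml
# Sketch only: approximate spec.ports of the rendered Service.

# exporter.enabled: true (default): API goes through the proxy, /metrics is exposed
ports:
  - name: http
    port: 11434
    targetPort: 8000    # the exporter sidecar
  - name: metrics
    port: 8000
    targetPort: 8000
---
# exporter.enabled: false: API goes straight to Ollama, no metrics port
ports:
  - name: http
    port: 11434
    targetPort: 11434   # the ollama container
```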
ℹ️ The proxy is enabled by default. If you have no metrics/proxy image to run, set exporter.enabled: false and Ollama is served directly.
- GPU acceleration needs an NVIDIA RuntimeClass; select it with defaultPodOptions.runtimeClassName: nvidia. CPU-only clusters work with the defaults.
- The exporter sidecar pulls ghcr.io/obeone/ollama-exporter (public, cosign-signed). To opt out entirely, set controllers.main.containers.exporter.enabled: false.

```bash
helm install ollama obeone/ollama
```
The chart will:
- Create a Deployment with one replica (a single RWO PVC, hence the Recreate strategy).
- Mount a persistent volume at /root/.ollama for models and blobs.
- Expose a ClusterIP service on port 11434 (through the proxy by default).

GPU + direct-serving example:
```bash
helm install ollama obeone/ollama \
  --set defaultPodOptions.runtimeClassName=nvidia \
  --set controllers.main.containers.exporter.enabled=false \
  --set persistence.data.size=200Gi
```
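The same install can live in a values file instead of --set flags, which is easier to review and reuse. A minimal sketch, assuming a hypothetical file name of values-gpu.yaml; it uses exactly the three keys set above:

```yaml
# values-gpu.yaml (hypothetical name): equivalent of the --set flags above
defaultPodOptions:
  runtimeClassName: nvidia   # requires an NVIDIA RuntimeClass in the cluster

controllers:
  main:
    containers:
      exporter:
        enabled: false       # serve Ollama directly, no proxy and no /metrics

persistence:
  data:
    size: 200Gi
```

Install it with helm install ollama obeone/ollama -f values-gpu.yaml.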
```bash
helm uninstall ollama
```

The PVC is retained by default; delete it explicitly for a clean slate:

```bash
kubectl delete pvc -l app.kubernetes.io/name=ollama
```
Example values.yaml:

```yaml
defaultPodOptions:
  runtimeClassName: nvidia # GPU; leave "" for CPU-only

controllers:
  main:
    containers:
      ollama:
        image:
          repository: ollama/ollama
          tag: "0.6.0" # pin for reproducibility
        env:
          OLLAMA_KEEP_ALIVE: 30m
          OLLAMA_CONTEXT_LENGTH: "64000"
        resources:
          requests:
            memory: 10Gi # bump to fit your models
      exporter:
        enabled: true # transparent proxy + /metrics (see above)
        image:
          repository: ghcr.io/obeone/ollama-exporter

persistence:
  data:
    size: 200Gi

serviceMonitor:
  metrics:
    enabled: true # needs a Prometheus Operator stack
```
| Key | Type | Default | Description |
|---|---|---|---|
| defaultPodOptions.runtimeClassName | string | "" | NVIDIA RuntimeClass for GPU; empty = CPU-only |
| defaultPodOptions.automountServiceAccountToken | bool | false | Ollama does not call the Kubernetes API |
| defaultPodOptions.nodeSelector | object | {} | Pin the workload to specific node(s), typically a GPU host |
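To pin Ollama to a GPU host, combine runtimeClassName with nodeSelector. A sketch, assuming your GPU nodes carry the nvidia.com/gpu.present=true label applied by NVIDIA GPU Feature Discovery (or the GPU Operator); substitute whatever label your nodes actually have:

```yaml
defaultPodOptions:
  runtimeClassName: nvidia           # NVIDIA RuntimeClass must exist in the cluster
  nodeSelector:
    nvidia.com/gpu.present: "true"   # assumed node label; label values must be strings
```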
| Key | Type | Default | Description |
|---|---|---|---|
| controllers.main.containers.ollama.image.repository | string | ollama/ollama | Ollama server image |
| controllers.main.containers.ollama.image.tag | string | latest | Image tag; pin to a release for reproducibility |
| controllers.main.containers.ollama.env.OLLAMA_KEEP_ALIVE | string | 30m | How long a model stays loaded after the last call |
| controllers.main.containers.ollama.env.OLLAMA_NUM_PARALLEL | string | "2" | Parallel requests per model |
| controllers.main.containers.ollama.env.OLLAMA_MAX_LOADED_MODELS | string | "4" | Max models kept loaded simultaneously |
| controllers.main.containers.ollama.env.OLLAMA_CONTEXT_LENGTH | string | "64000" | Default context window size |
| controllers.main.containers.ollama.env.OLLAMA_FLASH_ATTENTION | string | "true" | Enable Flash Attention |
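Because Helm merges maps from your values into the chart defaults, you can override individual OLLAMA_* variables without restating the rest. A sketch for a smaller GPU; the numbers are illustrative, not recommendations from the chart, and trade concurrency and context length for memory. Values are quoted because container environment variables are strings:

```yaml
controllers:
  main:
    containers:
      ollama:
        image:
          tag: "0.6.0"                    # pin a release instead of latest
        env:
          OLLAMA_NUM_PARALLEL: "1"        # one request at a time per model
          OLLAMA_MAX_LOADED_MODELS: "1"   # keep a single model resident
          OLLAMA_CONTEXT_LENGTH: "8192"   # smaller default context window
          # OLLAMA_KEEP_ALIVE and OLLAMA_FLASH_ATTENTION keep their chart defaults
```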
| Key | Type | Default | Description |
|---|---|---|---|
| controllers.main.containers.exporter.enabled | bool | true | Single switch for the proxy + metrics path (see section above) |
| controllers.main.containers.exporter.image.repository | string | ghcr.io/obeone/ollama-exporter | Proxy/exporter image (cosign-signed multi-arch build) |
| controllers.main.containers.exporter.image.tag | string | latest | Image tag |
| controllers.main.containers.exporter.env.OLLAMA_HOST | string | http://localhost:11434 | Upstream Ollama URL the proxy forwards to |
| Key | Type | Default | Description |
|---|---|---|---|
| service.main.type | string | ClusterIP | Service type |
| service.main.ipFamilyPolicy | string | PreferDualStack | Dual-stack, degrades to single-stack |
| service.main.ports.http.port | int | 11434 | API port; targetPort is managed by exporter.enabled |
| service.main.ports.metrics.port | int | 8000 | Exporter /metrics port (dropped when the exporter is off) |
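Changing the Service type or IP family only touches these keys; the http targetPort is still rewritten by exporter.enabled. A sketch, assuming your cluster can provision LoadBalancer Services and you want to force single-stack networking explicitly:

```yaml
service:
  main:
    type: LoadBalancer           # instead of the default ClusterIP
    ipFamilyPolicy: SingleStack  # instead of PreferDualStack
    ports:
      http:
        port: 11434              # targetPort stays managed by exporter.enabled
```

Ollama has no built-in authentication, so anything that exposes port 11434 outside the cluster should sit behind a firewall, network policy or authenticating proxy.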
| Key | Type | Default | Description |
|---|---|---|---|
| persistence.data.enabled | bool | true | Model/blob store at /root/.ollama |
| persistence.data.size | string | 100Gi | PVC size; size it for the models you intend to pull |
| persistence.data.accessMode | string | ReadWriteOnce | PVC access mode |
| persistence.tmp.enabled | bool | true | emptyDir scratch for OLLAMA_TMPDIR at /tmp/ollama |
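Model blobs dominate disk usage, and quantized large models commonly take tens of gigabytes each, so size the PVC for everything you plan to pull. A sketch, where storageClass is assumed to be the bjw-s common library's persistence key (check the library's values schema) and fast-ssd is a hypothetical class name:

```yaml
persistence:
  data:
    size: 500Gi                 # room for several large models
    accessMode: ReadWriteOnce   # matches the single replica + Recreate strategy
    storageClass: fast-ssd      # hypothetical StorageClass; assumed bjw-s key
```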
| Key | Type | Default | Description |
|---|---|---|---|
| serviceMonitor.metrics.enabled | bool | false | Prometheus Operator ServiceMonitor scraping the exporter |
| Key | Type | Default | Description |
|---|---|---|---|
| ingress.main.enabled | bool | false | Enable ingress (routes to the proxied API port, not /metrics) |
| ingress.main.hosts | list | … | Ingress hosts |
| ingress.main.tls | list | [] | Ingress TLS |
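A minimal ingress sketch. The hostname, ingress class and TLS secret are hypothetical, and the hosts[].paths[].service block (identifier plus port) assumes the bjw-s common 5.x ingress schema; verify it against the library before use. Targeting the http port keeps requests on the proxied API path rather than /metrics:

```yaml
ingress:
  main:
    enabled: true
    className: nginx                 # hypothetical ingress class
    hosts:
      - host: ollama.example.com     # hypothetical hostname
        paths:
          - path: /
            pathType: Prefix
            service:
              identifier: main       # assumed bjw-s 5.x service reference
              port: http             # proxied API port, not metrics
    tls:
      - hosts:
          - ollama.example.com
        secretName: ollama-tls       # hypothetical TLS secret
```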
When the exporter is enabled (default), Prometheus metrics are served on :8000/metrics. With a Prometheus Operator stack present, scrape them by enabling the bundled ServiceMonitor:
```bash
helm upgrade ollama obeone/ollama --reuse-values \
  --set serviceMonitor.metrics.enabled=true
```
Disabling the exporter (exporter.enabled=false) removes both the sidecar and the metrics port, so leave the ServiceMonitor off in that case.
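If you manage values in a file rather than with --set, keep the two switches side by side so the ServiceMonitor is never enabled without a metrics port to scrape. A sketch using only keys documented above:

```yaml
controllers:
  main:
    containers:
      exporter:
        enabled: true   # provides :8000/metrics; without it there is nothing to scrape
serviceMonitor:
  metrics:
    enabled: true       # requires the Prometheus Operator CRDs
```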
GPU exposure is effective only when defaultPodOptions.runtimeClassName points at an NVIDIA RuntimeClass. The ollama container already carries NVIDIA_VISIBLE_DEVICES=all and NVIDIA_DRIVER_CAPABILITIES=compute,utility, plus OLLAMA_SCHED_SPREAD=1 to spread work across all GPUs. For Sablier scale-to-zero on a GPU group, set pod labels under controllers.main.pod.labels.
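A sketch of GPU mode plus Sablier pod labels. The label names sablier.enable and sablier.group are assumed from Sablier's Kubernetes provider conventions, and gpu is a hypothetical group name; check the Sablier documentation for the labels your version expects:

```yaml
defaultPodOptions:
  runtimeClassName: nvidia     # GPU exposure needs the NVIDIA RuntimeClass
controllers:
  main:
    pod:
      labels:
        sablier.enable: "true" # assumed Sablier label name
        sablier.group: gpu     # hypothetical group shared by GPU workloads
```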
```bash
# Upgrade, keeping the values of the current release
helm upgrade ollama obeone/ollama --reuse-values

# Logs
kubectl logs -l app.kubernetes.io/name=ollama -c ollama
kubectl logs -l app.kubernetes.io/name=ollama -c exporter # if enabled

# Pull a model into the persistent store
kubectl exec -it deploy/ollama -c ollama -- ollama pull llama3.2

# List local models from inside the cluster
kubectl run -it --rm ollama-curl --image=curlimages/curl --restart=Never -- \
  -s http://ollama:11434/api/tags
```
The ollama Service and deploy/ollama names assume a release name of ollama.
| Repository | Name | Version |
|---|---|---|
| https://bjw-s-labs.github.io/helm-charts | common | 5.0.1 |
| Name | Email | Url |
|---|---|---|
| obeone | obeone@obeone.org | |