ollama

Ollama runs large language models locally. This Helm chart packages the Ollama server on the bjw-s common library, so behaviour is driven almost entirely from values.yaml. A transparent Prometheus exporter/proxy sidecar is toggled with a single switch (exporter.enabled): when on, it fronts the API and exports /metrics; when off, the API is served directly. GPU acceleration is supported via RuntimeClass.

TL;DR

helm repo add obeone https://charts.obeone.cloud
helm repo update
helm install ollama obeone/ollama

About

Ollama runs large language models locally (Llama, Mistral, Gemma, and friends) behind a simple HTTP API. This chart adds two optional extras on top of the stock server: GPU access through a RuntimeClass (off by default), and a transparent Prometheus exporter/proxy sidecar (on by default; controlled by controllers.main.containers.exporter.enabled, shortened to exporter.enabled here) that fronts the API and exposes /metrics without any change on the client side.
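
As a minimal sketch of the single switch described above, the following values override turns the sidecar off. The key path comes from the Values table; putting it in a separate override file is an assumption about how you manage values.

```yaml
# Values override (file name is up to you)
controllers:
  main:
    containers:
      exporter:
        # Drop the sidecar. Per the Values table, the chart then routes the
        # http Service port straight to Ollama (11434) and disables the
        # metrics port; no other value needs changing.
        enabled: false
```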

Application: ollama.com
Container image: ollama/ollama, exporter sidecar ghcr.io/obeone/ollama-exporter
Chart source: charts/ollama

Prerequisites

Helm 3
A Kubernetes cluster matching the chart’s kubeVersion constraint (see Chart.yaml)

Configuration

This chart is built on the bjw-s-labs common library. Most configuration keys (controllers, service, ingress, persistence, …) follow its schema; see the common library documentation for everything it supports beyond what is spelled out in values.yaml.

Defaults are meant to work out of the box on any cluster. The full list of options lives in values.yaml, is validated by values.schema.json at install time, and is documented in the Values section below. Override the defaults with your own values file:

helm install ollama obeone/ollama -f my-values.yaml
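
A my-values.yaml for the command above might look like the following sketch. The keys are taken from the Values table; the tag and memory figures are the examples given there, not recommendations, and the pullPolicy value is an assumption that suits a pinned tag.

```yaml
# my-values.yaml: illustrative overrides only
controllers:
  main:
    containers:
      ollama:
        image:
          # Pin to a release instead of "latest" for reproducibility.
          # "0.6.0" is the example from the Values table; use the release you run.
          tag: "0.6.0"
          # Assumption: with a pinned tag there is no need to pull on every start.
          pullPolicy: IfNotPresent
        resources:
          requests:
            # Bump to match the models you intend to serve.
            memory: 10Gi
```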

GPU acceleration

Point the pod at the RuntimeClass that exposes your GPUs (the name depends on how the GPU operator or device plugin is installed on your cluster):

defaultPodOptions:
  runtimeClassName: nvidia
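
On a mixed cluster you will probably also want to pin the pod to a GPU host with defaultPodOptions.nodeSelector. A sketch, assuming your GPU nodes carry the nvidia.com/gpu.present label (commonly set by NVIDIA GPU feature discovery; check your own node labels):

```yaml
defaultPodOptions:
  runtimeClassName: nvidia
  nodeSelector:
    # Assumed label: replace with whatever label marks GPU nodes in your cluster.
    nvidia.com/gpu.present: "true"
```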

Model storage

Pulled models land on the persistent volume configured under persistence. Models are big: size it for what you actually plan to serve.
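
For illustration, a sizing override could look like this. The 200Gi figure is an arbitrary example; existingClaim is the key mentioned in the Values table, and its placement under persistence.data is assumed here to follow the common library's persistence schema.

```yaml
persistence:
  data:
    # Default is 100Gi; size for the models you actually plan to serve.
    size: 200Gi
    # Alternatively, reuse a pre-created PVC instead of provisioning a new one
    # (assumed to sit at this level, per the common library schema):
    # existingClaim: ollama
```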

Prometheus integration

With the exporter sidecar enabled, setting serviceMonitor.metrics.enabled to true creates a Prometheus Operator ServiceMonitor that scrapes its /metrics endpoint.
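
A minimal override enabling the scrape, assuming a Prometheus Operator stack is already installed in the cluster. The key names follow the serviceMonitor default shown in the Values table.

```yaml
controllers:
  main:
    containers:
      exporter:
        # Already the default; shown because the ServiceMonitor needs the sidecar.
        enabled: true
serviceMonitor:
  metrics:
    # Disabled by default; the default endpoint scrapes /metrics on the
    # "http" Service port every 30s.
    enabled: true
```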

Upgrading

helm repo update
helm upgrade ollama obeone/ollama

Each release lists its changes in the Artifact Hub changelog; give it a look before jumping across several chart versions.

Uninstalling

helm uninstall ollama

PersistentVolumeClaims created by the chart are kept around: delete them manually if you also want the data gone.

Requirements

Kubernetes: >=1.31.0-0

Repository	Name	Version
https://bjw-s-labs.github.io/helm-charts	common	5.0.1

Values

Key	Type	Default	Description
controllers.main.containers.exporter	object	{"dependsOn":"ollama","enabled":true,"env":{"OLLAMA_HOST":"http://localhost:11434"},"image":{"pullPolicy":"Always","repository":"ghcr.io/obeone/ollama-exporter","tag":"latest"},"ports":[{"containerPort":8000,"name":"proxy"}],"probes":{"liveness":{"custom":true,"enabled":true,"spec":{"failureThreshold":6,"httpGet":{"path":"/metrics","port":8000},"initialDelaySeconds":15,"periodSeconds":30,"timeoutSeconds":5}},"readiness":{"custom":true,"enabled":true,"spec":{"failureThreshold":3,"httpGet":{"path":"/api/version","port":8000},"initialDelaySeconds":10,"periodSeconds":10,"successThreshold":1,"timeoutSeconds":5}},"startup":{"custom":true,"enabled":true,"spec":{"failureThreshold":12,"httpGet":{"path":"/metrics","port":8000},"initialDelaySeconds":5,"periodSeconds":5}}},"resources":{"limits":{"cpu":"500m","memory":"512Mi"},"requests":{"cpu":"100m","memory":"128Mi"}}}	Prometheus exporter / transparent proxy sidecar. Listens on :8000, exposes /metrics, and forwards API requests to the Ollama server in the same pod, so the http Service port targets :8000 (see service below). Source: https://github.com/frcooper/ollama-exporter (Unlicense).
controllers.main.containers.exporter.enabled	bool	`true`	Single switch for the whole metrics/proxy path. Set to false to drop the sidecar: the chart then routes the http Service port straight to Ollama (11434) and disables the metrics port. No other value needs changing (handled in templates/common.yaml).
controllers.main.containers.exporter.env.OLLAMA_HOST	string	`"http://localhost:11434"`	Upstream Ollama URL the exporter proxies to (same pod, localhost).
controllers.main.containers.exporter.image.repository	string	`"ghcr.io/obeone/ollama-exporter"`	Exporter image. Multi-arch build published to GHCR and signed with cosign (keyless). Repoint to ghcr.io/frcooper/ollama-exporter if the upstream build workflow is merged there.
controllers.main.containers.exporter.image.tag	string	`"latest"`	Image tag. Pin to a released version once tags are published.
controllers.main.containers.ollama.env.NVIDIA_DRIVER_CAPABILITIES	string	`"compute,utility"`
controllers.main.containers.ollama.env.NVIDIA_VISIBLE_DEVICES	string	`"all"`
controllers.main.containers.ollama.env.OLLAMA_HOST	string	`"0.0.0.0"`	Bind on all interfaces so the Service and the exporter sidecar can reach Ollama. Never set this to 127.0.0.1 in-cluster.
controllers.main.containers.ollama.env.OLLAMA_NOHISTORY	string	`"true"`	Disable prompt history persistence.
controllers.main.containers.ollama.env.OLLAMA_ORIGINS	string	`"*"`	Allowed CORS origins.
controllers.main.containers.ollama.env.OLLAMA_TMPDIR	string	`"/tmp/ollama"`	Scratch directory (mounted as an emptyDir below).
controllers.main.containers.ollama.image.pullPolicy	string	`"Always"`
controllers.main.containers.ollama.image.repository	string	`"ollama/ollama"`	Ollama server image.
controllers.main.containers.ollama.image.tag	string	`"latest"`	Image tag. Pin to a release (e.g. “0.6.0”) for reproducibility.
controllers.main.containers.ollama.probes.liveness.custom	bool	`true`
controllers.main.containers.ollama.probes.liveness.enabled	bool	`true`
controllers.main.containers.ollama.probes.liveness.spec.failureThreshold	int	`6`
controllers.main.containers.ollama.probes.liveness.spec.httpGet.path	string	`"/"`
controllers.main.containers.ollama.probes.liveness.spec.httpGet.port	int	`11434`
controllers.main.containers.ollama.probes.liveness.spec.initialDelaySeconds	int	`30`
controllers.main.containers.ollama.probes.liveness.spec.periodSeconds	int	`15`
controllers.main.containers.ollama.probes.liveness.spec.timeoutSeconds	int	`5`
controllers.main.containers.ollama.probes.readiness.custom	bool	`true`
controllers.main.containers.ollama.probes.readiness.enabled	bool	`true`
controllers.main.containers.ollama.probes.readiness.spec.failureThreshold	int	`6`
controllers.main.containers.ollama.probes.readiness.spec.httpGet.path	string	`"/"`
controllers.main.containers.ollama.probes.readiness.spec.httpGet.port	int	`11434`
controllers.main.containers.ollama.probes.readiness.spec.initialDelaySeconds	int	`10`
controllers.main.containers.ollama.probes.readiness.spec.periodSeconds	int	`10`
controllers.main.containers.ollama.probes.readiness.spec.timeoutSeconds	int	`5`
controllers.main.containers.ollama.probes.startup.enabled	bool	`false`
controllers.main.containers.ollama.resources.requests.cpu	string	`"100m"`
controllers.main.containers.ollama.resources.requests.memory	string	`"2Gi"`	Bump to match the models you intend to serve (e.g. 10Gi).
controllers.main.pod.labels	object	`{}`	Extra pod labels, for example to drive Sablier scale-to-zero on a GPU group.
controllers.main.strategy	string	`"Recreate"`
controllers.main.type	string	`"deployment"`
defaultPodOptions	object	`{"automountServiceAccountToken":false,"nodeSelector":{},"runtimeClassName":""}`	Options applied to every pod created by this chart.
defaultPodOptions.nodeSelector	object	`{}`	Pin the workload to specific node(s), typically a GPU host.
defaultPodOptions.runtimeClassName	string	`""`	RuntimeClass exposing the GPU to the pod (e.g. “nvidia”). Leave empty for CPU-only clusters. When set, the NVIDIA_* env vars on the ollama container enable GPU acceleration.
ingress.main.annotations	object	`{}`	Ingress annotations, for example to attach a Traefik whitelist middleware.
ingress.main.className	string	`""`
ingress.main.enabled	bool	`false`
ingress.main.hosts[0].host	string	`"chart-example.local"`
ingress.main.hosts[0].paths[0].path	string	`"/"`
ingress.main.hosts[0].paths[0].pathType	string	`"Prefix"`
ingress.main.hosts[0].paths[0].service.identifier	string	`"main"`
ingress.main.hosts[0].paths[0].service.port	string	`"http"`
ingress.main.tls	list	`[]`
persistence.data.accessMode	string	`"ReadWriteOnce"`
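persistence.data.advancedMounts	object	`{"main":{"ollama":[{"path":"/root/.ollama"}]}}`	Mounts the data volume at /root/.ollama in the ollama container. To reuse a pre-created PVC instead of provisioning a new one, set existingClaim on persistence.data (e.g. existingClaim: ollama).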
persistence.data.enabled	bool	`true`
persistence.data.size	string	`"100Gi"`
persistence.data.type	string	`"persistentVolumeClaim"`
persistence.tmp.advancedMounts.main.ollama[0].path	string	`"/tmp/ollama"`
persistence.tmp.enabled	bool	`true`
persistence.tmp.type	string	`"emptyDir"`
route	object	`{"main":{"enabled":false,"hostnames":["chart-example.local"],"kind":"HTTPRoute","parentRefs":[{"name":"gateway","namespace":"gateway-system"}],"rules":[{"backendRefs":[{"identifier":"main","port":"http"}],"matches":[{"path":{"type":"PathPrefix","value":"/"}}]}]}}`	Gateway API HTTPRoute, mirroring the Ingress above. Disabled by default: pick either Ingress or HTTPRoute, not both. Requires the Gateway API CRDs and an existing Gateway in the cluster.
route.main.enabled	bool	`false`	Enable the HTTPRoute. Mutually exclusive with `ingress.main.enabled`.
route.main.hostnames	list	`["chart-example.local"]`	Hostnames served by this route.
route.main.kind	string	`"HTTPRoute"`	Route kind. HTTPRoute, GRPCRoute, TCPRoute, TLSRoute or UDPRoute.
route.main.parentRefs	list	`[{"name":"gateway","namespace":"gateway-system"}]`	Gateways this route attaches to.
route.main.rules	list	`[{"backendRefs":[{"identifier":"main","port":"http"}],"matches":[{"path":{"type":"PathPrefix","value":"/"}}]}]`	Routing rules. `identifier` refers to a Service defined above.
service.main.controller	string	`"main"`
service.main.ports.http.port	int	`11434`
service.main.ports.http.primary	bool	`true`
service.main.ports.http.protocol	string	`"HTTP"`
service.main.ports.http.targetPort	string	`"proxy"`
service.main.type	string	`"ClusterIP"`
serviceMonitor	object	`{"metrics":{"enabled":false,"endpoints":[{"interval":"30s","path":"/metrics","port":"http","scrapeTimeout":"10s"}],"serviceName":""}}`	Prometheus Operator ServiceMonitor scraping the exporter’s /metrics. Disabled by default; enable it (or set it in your override) when a Prometheus Operator stack is present.
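
To make the ingress keys above concrete, here is a hedged sketch of exposing the API through an Ingress. The hostname and class name are hypothetical placeholders; the path and service references mirror the defaults in the table.

```yaml
ingress:
  main:
    enabled: true
    # Assumption: an ingress controller registered under this class exists.
    className: traefik
    hosts:
      - host: ollama.example.com   # hypothetical hostname
        paths:
          - path: /
            pathType: Prefix
            service:
              identifier: main
              port: http
route:
  main:
    # Keep the HTTPRoute off: Ingress and HTTPRoute are mutually exclusive.
    enabled: false
```

OLLAMA_ORIGINS defaults to "*", so consider narrowing it before exposing the API beyond the cluster.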

Verifying the chart signature

Charts in this repository are signed with GPG and every release ships a provenance file. The public key is available at charts.obeone.cloud/public_key.gpg, fingerprint B9FE852F28888D27F8C9A11CD33E04CD22E335CE.

# Import the signing key, then export it to a legacy-format keyring (Helm reads the older pubring.gpg format, not the GnuPG 2 keybox)
curl -fsSL https://charts.obeone.cloud/public_key.gpg | gpg --import
gpg --export > ~/.gnupg/pubring.gpg

# Pull the chart and check it against its provenance file
helm pull --verify obeone/ollama

Support

This is a personal chart repository, maintained on a best-effort basis. Bug reports and contributions are welcome on GitHub.

Autogenerated from chart metadata using helm-docs v1.14.2