Preface

In this guide, I will target consumer-grade NVIDIA graphics cards, such as the GTX and RTX series, for container workloads on Kubernetes. If you have read my previous posts, you know I am migrating the services I used to host on plain old docker-compose to Kubernetes. One such workload is Jellyfin, whose transcoding was hobbling along on the puny Intel Iris Plus 655 integrated graphics. When I bought new hardware for the new cluster, I picked up an NVIDIA T400 to go along with it. Although not squarely consumer-grade, it behaves like one: there is no ESXi support for GPU virtualization or anything special like that.

Installation process

I started by installing a Photon OS node to host the graphics card, passed the card through from ESXi, and joined the VM to the k3s cluster. Now comes the tricky part. The overall steps are: install the drivers on the VM, set up the container runtime, use that runtime for workloads, and finally tell Kubernetes that it has a GPU resource via the device plugin. Let's get started, shall we?

1. Install the NVIDIA drivers

Most of the steps needed are covered in the issue vmware/photon#1291 that I opened while troubleshooting this on Photon OS. Other operating systems use a similar method, and unlike Photon OS, there are plenty of guides out there for the common distros. Here is the TL;DR:

# Get Kernel sources
❯ tdnf install linux-esx-devel
❯ reboot
# Get build tools and other needed packages
❯ tdnf install build-essential tar wget
# Download the right driver for your card from https://www.nvidia.com/download/index.aspx
❯ wget https://us.download.nvidia.com/XFree86/Linux-x86_64/510.54/NVIDIA-Linux-x86_64-510.54.run
# Unmount /tmp to not use tmpfs which runs out of space during driver install
❯ umount /tmp
# Run the installer. Select OK to guess X library path (we don't have X)
# OK to no 32-bit install and No to nvidia-xconfig.
# If asked for DKMS, answer No until the issue is resolved
# https://github.com/vmware/photon/issues/1287
❯ sh NVIDIA-Linux-x86_64-510.54.run
# Reboot to pick up any necessary changes and revert the /tmp to tmpfs
❯ reboot
# Check if the graphics card was detected, sample output below
❯ nvidia-smi

Thu Feb 17 21:55:41 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.54       Driver Version: 510.54       CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA T400         Off  | 00000000:0B:00.0 Off |                  N/A |
| 34%   44C    P0    N/A /  31W |      0MiB /  2048MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

2. Install NVIDIA container toolkit

K3s uses containerd as its container runtime, so we need to follow both the installation steps from NVIDIA and the K3s containerd configuration notes from Rancher. We will use the CentOS version of the repository since Photon OS uses tdnf, a tiny YUM-compatible package manager.

❯ curl -s -L https://nvidia.github.io/nvidia-docker/centos8/nvidia-docker.repo | tee /etc/yum.repos.d/nvidia-docker.repo
# Note that the package is nvidia-container-toolkit, which replaces the nvidia-container-runtime.
# Another point of confusion right here.
❯ tdnf install nvidia-container-toolkit
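
As a quick sanity check (assuming the toolkit pulled in the libnvidia-container tools as a dependency), nvidia-container-cli should be able to see the card before containerd gets involved at all:

# Optional sanity check: prints driver version, CUDA version and detected devices
❯ nvidia-container-cli info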

Installing the above package “should” also configure containerd for us. Ensure that the configuration file contains the section shown below:

# /var/lib/rancher/k3s/agent/etc/containerd/config.toml 
...

[plugins.cri.containerd.runtimes."nvidia"]
    runtime_type = "io.containerd.runc.v2"
[plugins.cri.containerd.runtimes."nvidia".options]
    BinaryName = "/usr/bin/nvidia-container-runtime"
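
If the nvidia entries are missing, note that k3s regenerates this config.toml on startup, so restarting the agent after installing the toolkit should be enough for it to detect the runtime binary and add them. A rough check, assuming the node runs the k3s-agent service (on a server node the service is just k3s):

# Restart the agent so k3s regenerates its containerd config
❯ systemctl restart k3s-agent
# Confirm the runtime binary is present and the generated config references it
❯ ls -l /usr/bin/nvidia-container-runtime
❯ grep -A 3 'runtimes."nvidia"' /var/lib/rancher/k3s/agent/etc/containerd/config.toml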

3. Create the RuntimeClass

Now that we have a runtime that supports NVIDIA, we need to tell Kubernetes (k3s) about it. The issue k3s-io/k3s#4070 has more background, and I got most of the information I needed from there. Save the YAML below and apply it with kubectl:

# nvidia.runtimeclass.yaml

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
    name: nvidia
handler: nvidia

kubectl apply -f nvidia.runtimeclass.yaml
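
A quick way to confirm the RuntimeClass is registered:

❯ kubectl get runtimeclass nvidia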

4. Label nodes containing GPUs

Let's label the nodes that have a GPU so that workloads requiring one can be targeted at them. My node with the card is named kratos-worker-2.

kubectl label nodes kratos-worker-2 accelerator=nvidia
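
To confirm the label, and later to see which nodes GPU workloads can land on:

❯ kubectl get nodes -l accelerator=nvidia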

5. Install NVIDIA device plugin for Kubernetes

Kubernetes allocates resources to pods; cpu and memory come built in. To expose a GPU as a resource, we need a device plugin. The plugin discovers the GPUs on each node and advertises them to the kube-apiserver so they can be handed out to workloads that request them. NVIDIA provides an official open-source device plugin, and we can install the latest version (as of writing this) using the command below.

kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.10.0/nvidia-device-plugin.yml

6. Patch the device plugin

We need to patch the device plugin DaemonSet to use the nvidia RuntimeClass and to run only on the nodes we labelled with accelerator=nvidia. Otherwise, its pods get stuck in ContainerCreating.

kubectl -n kube-system patch daemonset nvidia-device-plugin-daemonset --patch '{"spec": {"template": {"spec": {"runtimeClassName": "nvidia", "nodeSelector": {"accelerator": "nvidia"}}}}}'
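
Once the patch rolls out, the device plugin pod on the labelled node should reach Running, and the node should advertise nvidia.com/gpu as an allocatable resource. A quick way to check both:

❯ kubectl -n kube-system get pods -o wide | grep nvidia-device-plugin
❯ kubectl describe node kratos-worker-2 | grep nvidia.com/gpu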

7. Testing

Here is a sample test pod. It should print the graphics card details and exit.

# nvidia-smi.pod.yaml
apiVersion: v1
kind: Pod
metadata:
    name: nvidia-smi
spec:
    runtimeClassName: nvidia
    restartPolicy: Never
    containers:
        - image: nvidia/cuda:11.0-base
          name: cuda
          command:
              - "nvidia-smi"
          resources:
              limits:
                  nvidia.com/gpu: 1

❯ kubectl apply -f nvidia-smi.pod.yaml
pod/nvidia-smi created
❯ kubectl get pods
NAME         READY   STATUS      RESTARTS   AGE
nvidia-smi   0/1     Completed   0          3s
❯ kubectl logs nvidia-smi
Thu Mar 17 09:19:30 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.54       Driver Version: 510.54       CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA T400         Off  | 00000000:0B:00.0 Off |                  N/A |
| 38%   37C    P8    N/A /  31W |      0MiB /  2048MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                            
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

8. Run GPU-based workloads

To run GPU workloads, remember to include the following in the pod spec. The snippet below is a generalized excerpt from the Jellyfin deployment that I use:

spec:
    runtimeClassName: nvidia          # Specify nvidia as the runtimeClass
    containers:
      - name: graphicWorkload
        image: nvidiaWorkloadImage    # NVIDIA compatible image
        env:                          # This is "supposed" to be injected by the device plugin
          - name: NVIDIA_DRIVER_CAPABILITIES
            value: all
        resources:
          limits:
            nvidia.com/gpu: 1         # Request a GPU
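
One way to verify that the workload is actually hitting the card is to run nvidia-smi on the node itself while, say, a Jellyfin transcode is in progress; the Processes table should then list the transcoding process and its GPU memory usage:

# On kratos-worker-2, while a hardware transcode is running
❯ nvidia-smi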

Conclusion

Since we skipped DKMS while installing the drivers in step 1, every upgrade of the Linux kernel requires a reinstall of the driver.
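
In practice that means repeating the relevant bits of step 1 after a kernel update, roughly as sketched below (same driver version and caveats as in step 1):

# After a kernel upgrade, reinstall the driver (sketch based on step 1)
❯ tdnf update linux-esx-devel   # make sure the headers match the new kernel
❯ reboot
❯ umount /tmp                   # again, avoid tmpfs running out of space
❯ sh NVIDIA-Linux-x86_64-510.54.run
❯ reboot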

The whole process of getting this set up took me the better part of a week. I'm pretty sure I'll be using this as a guide myself the next time I install a K3s cluster with NVIDIA graphics. I hope it helps you too!