Running VMs with GPU Passthrough
This section demonstrates how to deploy virtual machines (VMs) with GPU passthrough using Cozystack. First, we'll deploy the GPU Operator to configure the worker node for GPU passthrough. Then we'll deploy a KubeVirt VM that requests a GPU.
To provision GPU passthrough, the GPU Operator deploys the following components by default:
- VFIO Manager to bind the vfio-pci driver to all GPUs on the node.
- Sandbox Device Plugin to discover and advertise the passthrough GPUs to kubelet.
- Sandbox Validator to validate the other operands.
Prerequisites
- A Cozystack cluster with at least one GPU-enabled node.
- kubectl installed and cluster access credentials configured.
1. Install the GPU Operator
Follow these steps:
Label the worker node explicitly for GPU passthrough workloads:
```bash
kubectl label node <node-name> --overwrite nvidia.com/gpu.workload.config=vm-passthrough
```
Enable the GPU Operator bundle in your Cozystack configuration:
```bash
kubectl edit -n cozy-system configmap cozystack
```
Add gpu-operator to the list of bundle-enabled packages:
```yaml
bundle-enable: gpu-operator
```
This will deploy the components (operands).
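For reference, the edited ConfigMap might look roughly like the sketch below. The other data keys in your installation will differ and are omitted here; only the bundle-enable entry matters for this step.
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cozystack
  namespace: cozy-system
data:
  # Other configuration keys from your installation are omitted in this sketch.
  bundle-enable: gpu-operator
```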
Ensure that all pods are in the Running state and that all validations by the sandbox-validator component succeed:
```bash
kubectl get pods -n cozy-gpu-operator
```
Example output (your pod names may vary):
```console
NAME                                           READY   STATUS    RESTARTS   AGE
...
nvidia-sandbox-device-plugin-daemonset-4mxsc   1/1     Running   0          40s
nvidia-sandbox-validator-vxj7t                 1/1     Running   0          40s
nvidia-vfio-manager-thfwf                      1/1     Running   0          78s
```
To verify the GPU binding, access the node using kubectl debug node or kubectl node-shell -x and run:
```bash
lspci -nnk -d 10de:
```
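If you use kubectl debug, a session like the following can work. This is a sketch: the ubuntu image is an illustrative choice, and the command relies on chrooting into the host filesystem so the host's lspci binary is available.
```bash
# Start an ephemeral debug pod on the node (the image choice is illustrative).
kubectl debug node/<node-name> -it --image=ubuntu
# Inside the debug pod: switch to the host filesystem and list NVIDIA devices.
chroot /host
lspci -nnk -d 10de:
```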
The vfio-manager pod will bind all GPUs on the node to the vfio-pci driver. Example output:
```console
3b:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:2236] (rev a1)
        Subsystem: NVIDIA Corporation Device [10de:1482]
        Kernel driver in use: vfio-pci
86:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:2236] (rev a1)
        Subsystem: NVIDIA Corporation Device [10de:1482]
        Kernel driver in use: vfio-pci
```
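As an additional sanity check, you can confirm that the VFIO kernel modules are loaded on the host; the exact module list may vary by kernel version.
```bash
# Expect entries such as vfio_pci, vfio_iommu_type1, and vfio.
lsmod | grep vfio
```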
The sandbox-device-plugin will discover and advertise these resources to kubelet. In this example, the node shows two A10 GPUs as available resources:
```bash
kubectl describe node <node-name>
```
Example output:
```console
...
Capacity:
  ...
  nvidia.com/GA102GL_A10: 2
  ...
Allocatable:
  ...
  nvidia.com/GA102GL_A10: 2
  ...
```
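Alternatively, you can query just the allocatable resources with a jsonpath expression; the filter below is a sketch.
```bash
# Prints the node's allocatable resources; the GPU should appear as
# "nvidia.com/GA102GL_A10": "2".
kubectl get node <node-name> -o jsonpath='{.status.allocatable}'
```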
The resource name is constructed from the device and device_name columns of the PCI IDs database. For example, the database entry for the A10 reads 2236 GA102GL [A10], which results in the resource name nvidia.com/GA102GL_A10.

2. Update the KubeVirt Custom Resource
Next, we will update the KubeVirt Custom Resource, as documented in the KubeVirt user guide, so that the passthrough GPUs are permitted and can be requested by a KubeVirt VM.
Adjust the pciVendorSelector and resourceName values to match your specific GPU model.
Setting externalResourceProvider=true indicates that this resource is provided by an external device plugin,
in this case the sandbox-device-plugin which is deployed by the Operator.
```bash
kubectl edit kubevirt -n cozy-kubevirt
```
Example configuration:
```yaml
...
spec:
  configuration:
    permittedHostDevices:
      pciHostDevices:
      - externalResourceProvider: true
        pciVendorSelector: 10DE:2236
        resourceName: nvidia.com/GA102GL_A10
...
```
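To confirm that the change has been applied, you can read back the relevant part of the KubeVirt resource; the grep filter below is just one convenient way to do it.
```bash
kubectl get kubevirt -n cozy-kubevirt -o yaml | grep -A 5 permittedHostDevices
```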
3. Create a Virtual Machine
We are now ready to create a VM.
Create a sample virtual machine using the following manifest, which requests the nvidia.com/GA102GL_A10 resource.

vmi-gpu.yaml:
```yaml
---
apiVersion: apps.cozystack.io/v1alpha1
appVersion: '*'
kind: VirtualMachine
metadata:
  name: gpu
  namespace: tenant-example
spec:
  running: true
  instanceProfile: ubuntu
  instanceType: u1.medium
  systemDisk:
    image: ubuntu
    storage: 5Gi
    storageClass: replicated
  gpus:
    - name: nvidia.com/GA102GL_A10
  cloudInit: |
    #cloud-config
    password: ubuntu
    chpasswd: { expire: False }
```
Apply the manifest:
```bash
kubectl apply -f vmi-gpu.yaml
```
Example output:
```console
virtualmachines.apps.cozystack.io/gpu created
```
Verify the VM status:
```bash
kubectl get vmi
```
```console
NAME                  AGE   PHASE     IP             NODENAME        READY
virtual-machine-gpu   73m   Running   10.244.3.191   luc-csxhk-002   True
```
Log in to the VM and confirm that it has access to the GPU:
```bash
virtctl console virtual-machine-gpu
```
Example output:
```console
Successfully connected to vmi-gpu console. The escape sequence is ^]

vmi-gpu login: ubuntu
Password:
ubuntu@virtual-machine-gpu:~$ lspci -nnk -d 10de:
08:00.0 3D controller [0302]: NVIDIA Corporation GA102GL [A10] [10de:26b9] (rev a1)
        Subsystem: NVIDIA Corporation GA102GL [A10] [10de:1851]
        Kernel driver in use: nvidia
        Kernel modules: nvidiafb, nvidia_drm, nvidia
```
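The example output indicates that the NVIDIA driver is already loaded in the guest. If your guest image does not ship the driver, install it first; once the driver is present, a quick check is to run nvidia-smi inside the VM.
```bash
# Inside the VM: the passed-through GPU should be listed by the driver.
nvidia-smi
```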