Running VMs with GPU Passthrough
This section demonstrates how to deploy virtual machines (VMs) with GPU passthrough using Cozystack. First, we'll deploy the GPU Operator to configure the worker node for GPU passthrough. Then we'll deploy a KubeVirt VM that requests a GPU.
To provision GPU passthrough, the GPU Operator deploys the following components by default:
- VFIO Manager to bind the vfio-pci driver to all GPUs on the node.
- Sandbox Device Plugin to discover and advertise the passthrough GPUs to kubelet.
- Sandbox Validator to validate the other operands.
Prerequisites
- A Cozystack cluster with at least one GPU-enabled node.
- kubectl installed and cluster access credentials configured.
1. Install the GPU Operator
Follow these steps:
Label the worker node explicitly for GPU passthrough workloads:
```bash
kubectl label node <node-name> --overwrite nvidia.com/gpu.workload.config=vm-passthrough
```
Enable the GPU Operator bundle in your Cozystack configuration:
```bash
kubectl edit -n cozy-system configmap cozystack
```
Add gpu-operator to the list of bundle-enabled packages:
```yaml
bundle-enable: gpu-operator
```
This will deploy the components (operands).
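For reference, the edited ConfigMap might look roughly like the sketch below. The other data keys in your installation will differ and are omitted here; only the bundle-enable entry matters for this step.
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cozystack
  namespace: cozy-system
data:
  # Other configuration keys from your installation are omitted in this sketch.
  bundle-enable: gpu-operator
```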
Ensure that all pods are in the Running state and that all validations by the sandbox-validator component succeed:
```bash
kubectl get pods -n cozy-gpu-operator
```
Example output (your pod names may vary):
```console
NAME                                           READY   STATUS    RESTARTS   AGE
...
nvidia-sandbox-device-plugin-daemonset-4mxsc   1/1     Running   0          40s
nvidia-sandbox-validator-vxj7t                 1/1     Running   0          40s
nvidia-vfio-manager-thfwf                      1/1     Running   0          78s
```
To verify the GPU binding, access the node using kubectl debug node or kubectl node-shell -x and run:
```bash
lspci -nnk -d 10de:
```
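If you use kubectl debug, a session like the following can work. This is a sketch: the ubuntu image is an illustrative choice, and the command relies on chrooting into the host filesystem so the host's lspci binary is available.
```bash
# Start an ephemeral debug pod on the node (the image choice is illustrative).
kubectl debug node/<node-name> -it --image=ubuntu
# Inside the debug pod: switch to the host filesystem and list NVIDIA devices.
chroot /host
lspci -nnk -d 10de:
```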
The vfio-manager pod will bind all GPUs on the node to the vfio-pci driver. Example output:
```console
3b:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:2236] (rev a1)
        Subsystem: NVIDIA Corporation Device [10de:1482]
        Kernel driver in use: vfio-pci
86:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:2236] (rev a1)
        Subsystem: NVIDIA Corporation Device [10de:1482]
        Kernel driver in use: vfio-pci
```
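As an additional sanity check, you can confirm that the VFIO kernel modules are loaded on the host; the exact module list may vary by kernel version.
```bash
# Expect entries such as vfio_pci, vfio_iommu_type1, and vfio.
lsmod | grep vfio
```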
The sandbox-device-plugin will discover and advertise these resources to kubelet. In this example, the node shows two A10 GPUs as available resources:
```bash
kubectl describe node <node-name>
```
Example output:
```console
...
Capacity:
  ...
  nvidia.com/GA102GL_A10: 2
  ...
Allocatable:
  ...
  nvidia.com/GA102GL_A10: 2
  ...
```
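Alternatively, you can query just the allocatable resources with a jsonpath expression; the filter below is a sketch.
```bash
# Prints the node's allocatable resources; the GPU should appear as
# "nvidia.com/GA102GL_A10": "2".
kubectl get node <node-name> -o jsonpath='{.status.allocatable}'
```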
The resource name is constructed from the device and device_name columns of the PCI IDs database. For example, the database entry for the A10 reads 2236 GA102GL [A10], which results in the resource name nvidia.com/GA102GL_A10.

2. Update the KubeVirt Custom Resource
Next, we will update the KubeVirt Custom Resource, as documented in the KubeVirt user guide, so that the passthrough GPUs are permitted and can be requested by a KubeVirt VM.
Adjust the pciVendorSelector and resourceName values to match your specific GPU model.
Setting externalResourceProvider=true indicates that this resource is provided by an external device plugin,
in this case the sandbox-device-plugin which is deployed by the Operator.
```bash
kubectl edit kubevirt -n cozy-kubevirt
```
Example configuration:
```yaml
...
spec:
  configuration:
    permittedHostDevices:
      pciHostDevices:
      - externalResourceProvider: true
        pciVendorSelector: 10DE:2236
        resourceName: nvidia.com/GA102GL_A10
...
```
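To confirm that the change has been applied, you can read back the relevant part of the KubeVirt resource; the grep filter below is just one convenient way to do it.
```bash
kubectl get kubevirt -n cozy-kubevirt -o yaml | grep -A 5 permittedHostDevices
```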
3. Create a Virtual Machine
We are now ready to create a VM.
Create a sample virtual machine using the following manifest, which requests the nvidia.com/GA102GL_A10 resource.

vmi-gpu.yaml:
```yaml
---
apiVersion: apps.cozystack.io/v1alpha1
appVersion: '*'
kind: VirtualMachine
metadata:
  name: gpu
  namespace: tenant-example
spec:
  running: true
  instanceProfile: ubuntu
  instanceType: u1.medium
  systemDisk:
    image: ubuntu
    storage: 5Gi
    storageClass: replicated
  gpus:
    - name: nvidia.com/GA102GL_A10
  cloudInit: |
    #cloud-config
    password: ubuntu
    chpasswd: { expire: False }
```
Apply the manifest:
```bash
kubectl apply -f vmi-gpu.yaml
```
Example output:
```console
virtualmachines.apps.cozystack.io/gpu created
```
Verify the VM status:
```bash
kubectl get vmi
```
```console
NAME                  AGE   PHASE     IP             NODENAME        READY
virtual-machine-gpu   73m   Running   10.244.3.191   luc-csxhk-002   True
```
Log in to the VM and confirm that it has access to the GPU:
```bash
virtctl console virtual-machine-gpu
```
Example output:
```console
Successfully connected to vmi-gpu console. The escape sequence is ^]

vmi-gpu login: ubuntu
Password:
ubuntu@virtual-machine-gpu:~$ lspci -nnk -d 10de:
08:00.0 3D controller [0302]: NVIDIA Corporation GA102GL [A10] [10de:26b9] (rev a1)
        Subsystem: NVIDIA Corporation GA102GL [A10] [10de:1851]
        Kernel driver in use: nvidia
        Kernel modules: nvidiafb, nvidia_drm, nvidia
```
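The example output indicates that the NVIDIA driver is already loaded in the guest. If your guest image does not ship the driver, install it first; once the driver is present, a quick check is to run nvidia-smi inside the VM.
```bash
# Inside the VM: the passed-through GPU should be listed by the driver.
nvidia-smi
```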