A beginner's guide to GPU virtualization: passthrough, vGPU, and MIG

What every IT generalist needs to know before deploying GPU workloads, and why the platform matters more than the hardware.

PARTNER CONTENT GPU workloads are no longer the exclusive territory of research labs and hyperscalers. Engineering teams, data science groups, healthcare organizations, and financial services businesses are all deploying GPU-accelerated infrastructure for AI inference, simulation, visualization, and virtual desktops. For many IT teams, this is new ground. The hardware is familiar, because NVIDIA GPUs fit in standard server slots. But the software isn't.

GPU virtualization has three distinct models. Each makes different tradeoffs between performance, sharing efficiency, and isolation. Understanding which model fits which workload is the first step. Understanding how to operate them is where most deployments run into trouble.

PCIe passthrough: one GPU, one VM

PCIe passthrough is the simplest GPU virtualization model to understand and the hardest to scale. The hypervisor assigns an entire physical GPU to a single virtual machine. The VM communicates directly with the hardware with no abstraction layer and no sharing. From the VM's perspective, it owns a physical GPU.

Passthrough delivers maximum performance. It is the right choice when a single workload must have the full card: large model training runs, high-fidelity physics simulations, or rendering pipelines that saturate GPU memory. Applications that require bare-metal GPU behavior and cannot tolerate any virtualization overhead run cleanly in the passthrough model.

The tradeoffs are significant. One physical GPU per VM means utilization collapses the moment that workload finishes. Most platforms don't support live migration of a passthrough VM, so you must stop the VM before moving it to another host. At scale, passthrough turns expensive GPU hardware into a rigid single-tenant resource with no flexibility for sharing or density.

NVIDIA vGPU: one GPU, multiple VMs

NVIDIA vGPU is a full GPU virtualization software stack. It divides a single physical GPU into virtual GPU instances; each VM gets its own vGPU with dedicated memory and a full NVIDIA driver running inside the guest OS. From the application's perspective, the vGPU looks and behaves like a discrete GPU. Software that requires a certified NVIDIA driver runs without modification.

vGPU is the right model for workloads that need GPU access but don't need an entire card. Virtual desktop infrastructure (VDI) environments where knowledge workers need graphics-capable virtual desktops, development environments where multiple engineers share a server, and inference endpoints where multiple services share GPU memory all fit the vGPU model. It delivers density without sacrificing driver compatibility.

vGPU requires a software license from NVIDIA in addition to the hardware cost, and those licenses are recurring. Most platforms define vGPU profiles (the memory and compute allocation for each VM) at creation time. Changing them requires rebuilding the VM.
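As a rough illustration of why the profile choice drives density, the sketch below divides a card's memory by a per-VM profile size. It assumes every vGPU on the card uses the same profile size and ignores any memory the platform reserves for itself. The function name and the 48 GB figure are illustrative assumptions, not values from NVIDIA's documentation.

```python
def vgpus_per_card(card_memory_gb: int, profile_memory_gb: int) -> int:
    """Return how many equal-sized vGPU profiles fit in one card's memory."""
    if card_memory_gb <= 0 or profile_memory_gb <= 0:
        raise ValueError("memory sizes must be positive")
    # Integer division: a partial profile cannot be assigned to a VM.
    return card_memory_gb // profile_memory_gb


# Hypothetical 48 GB card carved into different per-VM allocations.
for profile_gb in (4, 8, 16):
    count = vgpus_per_card(48, profile_gb)
    print(f"{profile_gb} GB profile -> {count} VMs per card")
```

Because the profile is fixed at creation time on most platforms, this arithmetic is worth doing before the first VM is built rather than after.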

MIG: hardware-enforced GPU partitioning

Multi-Instance GPU (MIG) is a hardware capability introduced with NVIDIA's Ampere architecture and extended in Hopper and Blackwell. Where vGPU shares GPU resources in software, MIG partitions the physical GPU in silicon. Each MIG instance gets its own dedicated compute engine, memory controller, and memory bandwidth. Hardware enforces the isolation rather than the driver or the hypervisor.

MIG is the right model when workloads need predictable, guaranteed performance and true fault isolation. If one MIG instance encounters an error or runs a noisy workload, it cannot affect neighboring instances. This distinction matters in multi-tenant environments and in regulated industries where workload isolation is a compliance requirement, not just a preference. Private AI inference, where sensitive data must stay inside organizational boundaries and multiple models run simultaneously, is a primary MIG use case.

MIG instances come in fixed sizes defined by the GPU architecture. You cannot create arbitrary partition geometries. MIG is available on datacenter-class GPUs: NVIDIA's A100, H100, and the RTX PRO 6000 Blackwell Server Edition, among others.
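To make the fixed-size constraint concrete, here is a minimal sketch that checks a requested set of profiles against the slice budget of an A100 40GB (seven compute slices, eight memory slices). The profile table covers that one GPU only, and the function name is hypothetical. Passing this check is necessary but not sufficient: the driver also enforces placement rules that this sketch does not model.

```python
# Slice cost per profile on an A100 40GB, as (compute slices, memory slices).
# Other GPUs expose different profile names and sizes.
A100_40GB_PROFILES = {
    "1g.5gb": (1, 1),
    "2g.10gb": (2, 2),
    "3g.20gb": (3, 4),
    "4g.20gb": (4, 4),
    "7g.40gb": (7, 8),
}
MAX_COMPUTE_SLICES = 7
MAX_MEMORY_SLICES = 8


def fits_slice_budget(requested):
    """Return True if the requested profiles fit the card's slice totals."""
    compute_used = 0
    memory_used = 0
    for name in requested:
        if name not in A100_40GB_PROFILES:
            # Arbitrary geometries are rejected: only listed profiles exist.
            raise ValueError(f"unknown MIG profile: {name}")
        compute, memory = A100_40GB_PROFILES[name]
        compute_used += compute
        memory_used += memory
    return compute_used <= MAX_COMPUTE_SLICES and memory_used <= MAX_MEMORY_SLICES


print(fits_slice_budget(["3g.20gb", "3g.20gb"]))  # two half-memory instances
print(fits_slice_budget(["4g.20gb", "4g.20gb"]))  # needs 8 compute slices
```

The second request fails because two 4g.20gb instances would need eight compute slices on a card that exposes seven, which is the kind of limit that makes capacity planning for MIG different from planning for vGPU.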

The implementation problem: everything requires the command line

Understanding the three models is straightforward, but deploying and operating them isn't.

GPU virtualization on most hypervisors requires command-line expertise that extends well beyond the typical sysadmin skill set. PCIe passthrough requires IOMMU configuration and device binding at the host level. Mistakes at this stage can cause VM instability or host crashes. vGPU requires the NVIDIA vGPU Manager software installed and running on the hypervisor host, with driver versions precisely matched across the hypervisor, the vGPU software stack, and each guest OS. Version mismatches across those three layers are a leading cause of GPU support tickets. A guest OS update that pulls in a new NVIDIA driver can break vGPU functionality across an entire host.
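One common first step in the passthrough work described above is checking which devices share the GPU's IOMMU group, since devices in a group generally have to be assigned together. The sketch below reads the Linux sysfs layout under /sys/kernel/iommu_groups. The function names are hypothetical, and it assumes a Linux host with the IOMMU already enabled in firmware and on the kernel command line.

```python
from pathlib import Path


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the sorted PCI addresses it contains."""
    groups = {}
    base = Path(root)
    if not base.is_dir():
        # IOMMU disabled, or not a Linux host: nothing to report.
        return groups
    for group_dir in base.iterdir():
        devices_dir = group_dir / "devices"
        if devices_dir.is_dir():
            groups[group_dir.name] = sorted(d.name for d in devices_dir.iterdir())
    return groups


def group_of(pci_address, groups):
    """Return the group number containing pci_address, or None if absent."""
    for group, devices in groups.items():
        if pci_address in devices:
            return group
    return None


found = iommu_groups()
print(f"{len(found)} IOMMU groups visible on this host")
```

If a GPU shares its group with unrelated devices, passing it through cleanly usually means dealing with those devices too, which is one reason this stage goes wrong so often.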

MIG configuration is where CLI complexity peaks. Configuring MIG on an NVIDIA datacenter GPU requires direct use of nvidia-smi: enabling MIG mode, selecting a MIG profile, creating GPU instances, creating compute instances within each GPU instance, and then assigning those instances to VMs. Reconfiguring MIG when a workload's resource needs change requires destroying the existing instances and recreating them. Every step happens at the command line, on the host. Recovering from a mistake typically requires resetting the GPU or rebooting the host.
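For a sense of what that sequence looks like, the sketch below builds the nvidia-smi invocations for one GPU and prints them without running anything. The helper names are hypothetical and the profile names in the example are for an A100 40GB. The sketch uses the -C flag, which creates the default compute instance together with each GPU instance rather than as a separate step. Check the flags against `nvidia-smi mig -h` on the installed driver before relying on them; assigning the resulting instances to VMs is hypervisor-specific and is not shown.

```python
import shlex


def mig_setup_commands(gpu_index, profiles):
    """Return the nvidia-smi commands to enable MIG and create instances."""
    if not profiles:
        raise ValueError("at least one MIG profile is required")
    gpu = str(gpu_index)
    return [
        # Enable MIG mode on the selected GPU.
        ["nvidia-smi", "-i", gpu, "-mig", "1"],
        # List the GPU instance profiles this card supports.
        ["nvidia-smi", "mig", "-i", gpu, "-lgip"],
        # Create GPU instances; -C also creates a compute instance in each.
        ["nvidia-smi", "mig", "-i", gpu, "-cgi", ",".join(profiles), "-C"],
        # List the GPU instances that now exist.
        ["nvidia-smi", "mig", "-i", gpu, "-lgi"],
    ]


def mig_teardown_commands(gpu_index):
    """Return the commands to destroy compute instances, then GPU instances."""
    gpu = str(gpu_index)
    return [
        ["nvidia-smi", "mig", "-i", gpu, "-dci"],
        ["nvidia-smi", "mig", "-i", gpu, "-dgi"],
    ]


# Dry run: print the commands instead of executing them.
for command in mig_setup_commands(0, ["3g.20gb", "3g.20gb"]):
    print(shlex.join(command))
```

Reconfiguring means running the teardown commands first and then the setup commands again with the new profiles, and any workload using the old instances has to be stopped beforehand.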

The consequence is predictable: Organizations that could benefit from MIG's isolation and density advantages avoid it entirely because they don't have a GPU specialist on staff. The capability exists in the hardware, but the operational path to using it stops most teams cold.

Ongoing management compounds the initial deployment challenge. Workload requirements change, more VMs need GPU access, inference models grow in size, and new teams need isolated development environments. On most platforms, reconfiguring GPU allocations to accommodate all this means returning to the command line. Those platforms also offer no unified view of GPU utilization alongside compute and memory in the same dashboard, so GPU management becomes a separate discipline from infrastructure management.
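Where no such dashboard exists, teams often end up scraping GPU metrics themselves. The sketch below parses the CSV output of an nvidia-smi query into dictionaries that could be fed to an existing monitoring system. The function name and the sample text are illustrative assumptions. Fields that nvidia-smi cannot report, such as values shown as [N/A], are returned as None.

```python
import csv
import io

# The query whose output the parser expects (not executed here).
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def _to_int(value):
    """Convert a numeric field to int, or None when it is not a number."""
    value = value.strip()
    return int(value) if value.isdigit() else None


def parse_gpu_stats(text):
    """Parse nvidia-smi CSV output into one dictionary per GPU."""
    rows = []
    for fields in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        index, util, used, total = fields
        rows.append({
            "index": _to_int(index),
            "util_pct": _to_int(util),
            "mem_used_mib": _to_int(used),
            "mem_total_mib": _to_int(total),
        })
    return rows


# Made-up sample output for two GPUs.
sample = "0, 37, 10240, 40960\n1, 0, 12, 40960\n"
for gpu in parse_gpu_stats(sample):
    print(gpu)
```

In practice the text would come from running the QUERY command on each host, which is exactly the kind of per-host scripting that keeps GPU monitoring separate from the rest of the infrastructure.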

Use cases by GPU virtualization type

PCIe passthrough suits workloads that need the entire GPU and don't need to share it, such as LLM training runs, fluid dynamics simulation, computational chemistry, and high-resolution rendering. These jobs typically run to completion and release the GPU. The low utilization window between jobs is the cost of the passthrough model.

NVIDIA vGPU suits density-focused deployments. These include VDI environments where engineering, design, and scientific visualization teams need GPU-accelerated desktops from centralized infrastructure, AI development environments where multiple developers need GPU access simultaneously, and inference endpoints where multiple models share a high-memory GPU. NVIDIA RTX Virtual Workstation (vWS) supports professional visualization workloads, Virtual PC (vPC) supports knowledge worker desktops, and Virtual Applications (vApps) delivers individual GPU-accelerated applications without full virtual desktop overhead.

MIG suits multi-tenant scenarios where isolation is non-negotiable: private AI inference where each model runs in a guaranteed, isolated environment; regulated industries where healthcare imaging, financial modeling, or defense workloads require hardware-level separation; and research organizations where multiple teams share expensive GPU infrastructure without contention.

How VergeOS changes the operational equation

VergeOS supports all three GPU virtualization models (PCIe passthrough, NVIDIA vGPU, and MIG) as native capabilities within a single platform. NVIDIA introduced VergeOS as a validated vGPU platform, with joint support across RTX Virtual Workstation (vWS), Virtual PC (vPC), and Virtual Applications (vApps). Validated hardware includes the A100, A30, A40, and L40 series, plus the RTX PRO 6000 Blackwell Server Edition for MIG vGPU.

The operational difference is where VergeOS diverges from the typical deployment story. GPU configuration, including MIG, is point-and-click in the same interface used for compute, storage, and networking. No nvidia-smi. No command-line steps. Selecting a MIG profile, assigning GPU resources to a VM, and even reconfiguring when workload requirements change all happen through the same interface an IT generalist already operates.

Driver management follows the same principle. Upload a driver once. VergeOS builds the ISO and automatically deploys it to every GPU-enabled VM at assignment. This replaces the three-layer version management problem (hypervisor, vGPU software stack, and guest OS) with a single upload and automatic distribution.

GPU utilization appears alongside CPU and memory in the same monitoring dashboard. There is no separate GPU management plane, no additional tool to learn, and no specialist required to stand up and operate the platform.

Organizations with existing VergeOS deployments add GPU capabilities by installing supported NVIDIA hardware in their cluster nodes. VergeOS detects the hardware automatically. The same platform that manages your storage and networking manages your GPU infrastructure, without adding a learning curve that most IT teams can't afford.

Watch the "GPU Virtualization Without the Complexity" on-demand webinar for a live demonstration of all three GPU modes in the VergeOS interface. Download the "GPU Virtualization Without the Complexity" white paper for a full technical breakdown of GPU modes, driver management, and deployment scenarios.

Contributed by VergeIO.