In this paper, VMware gives AI/ML training workload performance test results for the VMware vSphere virtualization platform using multiple NVIDIA A100-80GB GPUs with NVIDIA NVLink; the results fall into the “Goldilocks Zone,” which refers to the area of good performance with virtualization benefits.
Their results show that several virtualized MLPerf Training1 v3.0 benchmarks’ training times perform within 1.06% to 1.08% of the same workloads run on a comparable bare metal system. Note that lower is better.
In addition, we show the MLPerf Inference v3.0 test results for the vSphere virtualization platform with NVIDIA H100 and A100 Tensor Core GPUs. Our tests show that when NVIDIA vGPUs are used in vSphere, the workload performance measured as queries served per second (qps) is 94% to 105% of the performance on the bare metal system. Note that higher is better.