Last week I visited a client to troubleshoot some performance issues. Besides the fact that they had assigned too many virtual CPUs to their virtual machines, I noticed one VM with a much higher %CSTP value than comparable virtual machines.
You can see in the image that the number of worlds per VM is 12; this means that the VMs are configured with 8 vCPUs, which is a lot, especially when they aren’t using them at all.
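The red virtual machine has a %CSTP value of 19.59%. This means that roughly one fifth of the time this VM is co-stopped: its vCPUs are ready to run but are held back by the scheduler so that they stay in step with each other.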
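To put that percentage in perspective, here is a minimal Python sketch that converts a %CSTP value into co-stop time per sample. The 5-second interval is an assumption (the esxtop default refresh rate), and the function name is made up for illustration. It treats the figure as a share of one sample interval, as the text above does.

```python
def costop_ms(cstp_percent, interval_s=5.0):
    """Co-stop time in milliseconds within one sample interval.

    interval_s is an assumption: 5 seconds, the esxtop default refresh rate.
    """
    return cstp_percent / 100.0 * interval_s * 1000.0


# 19.59 is the %CSTP value of the red VM.
print(f"{costop_ms(19.59):.1f} ms")  # 979.5 ms
```

So under that assumption the VM spends almost a full second out of every five waiting in co-stop.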
I noticed the red VM had a snapshot of 30 GB. After committing (deleting) the snapshot and lowering the number of vCPUs, the virtual machine was running fine again.
This behaviour is also described in the knowledge base article “High co-stop values seen during virtual machine snapshot activities”. Duco Jaspars has also written an article about it.
Virtual machine performance can be adversely affected during snapshot operations for a number of reasons, both due to their nature and due to environmental issues.
Snapshots introduce complexity to storage I/O. Due to the nature of snapshots, every read operation must traverse every snapshot disk and then the base disk in order to verify the appropriate disk block to return.
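The read path described above can be illustrated with a toy model in Python. This is only a sketch and not how ESXi implements snapshot disks: each disk is a plain dictionary from block number to data, the chain is ordered newest snapshot first, and the lookup stops at the first disk that holds the block (a simplifying assumption). All names are hypothetical.

```python
def read_block(chain, block):
    """Return (data, disks_checked) for a block number.

    chain is ordered newest snapshot first, base disk last.
    """
    for checked, disk in enumerate(chain, start=1):
        if block in disk:
            return disk[block], checked
    raise KeyError(block)


# Hypothetical layout: a base disk plus two snapshot (delta) disks.
base = {0: "base-0", 1: "base-1", 2: "base-2"}
snap1 = {1: "snap1-1"}  # block 1 was rewritten after the first snapshot
snap2 = {2: "snap2-2"}  # block 2 was rewritten after the second snapshot
chain = [snap2, snap1, base]

print(read_block(chain, 2))  # ('snap2-2', 1)
print(read_block(chain, 1))  # ('snap1-1', 2)
print(read_block(chain, 0))  # ('base-0', 3)
```

Even in this simplified model, a block that was never rewritten since the snapshots were taken costs one lookup per disk in the chain, so longer chains make reads more expensive.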
Because these extended read operations are required, snapshots are the most performance-intensive disk format for virtual disks (as opposed to thin-provisioned, thick-provisioned, or eager-zeroed thick-provisioned virtual disks).
As storage I/O for snapshots grows, co-stop (%CSTP) values for a VM with multiple vCPUs can increase as the vCPUs wait on I/O completion.
To reduce the high %CSTP values and increase virtual machine performance, consolidate any snapshots into the main virtual disk.
After consolidation, the %CSTP value is reduced or eliminated and VM performance is improved.