Serious performance impact on high IO VM with multiple snapshots

Recently I ran into a situation where a customer suffered severe performance issues on a virtualized SQL server. Inside the SQL server we noticed high CPU utilization, but the underlying ESX hosts showed only relatively low CPU utilization for this VM.

Debugging the VM performance issue with esxtop showed very high co-stop (%CSTP) values.

According to the vSphere Monitoring and Performance guide, %CSTP is:

Percentage of time a resource pool spends in a ready, co-deschedule state.
NOTE You might see this statistic displayed, but it is intended for VMware use only.

Funny how VMware states that this metric is intended for VMware use only 🙂
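If you want to look at co-stop over a period of time rather than on the interactive screen, esxtop can write samples to CSV in batch mode (for example `esxtop -b -d 5 -n 60 > esxtop.csv`). Below is a minimal Python sketch that averages the co-stop counter per CPU group from such a file. Note the assumptions: the column header shape (`...\Group Cpu(<id>:<name>)\% CoStop`) is what I expect batch output to look like and may differ per ESX version, and the function name `average_costop` is just something I made up for this example.

```python
import csv
import io
import re

# Assumption: batch-mode headers look like
#   \\<host>\Group Cpu(<id>:<name>)\% CoStop
# Adjust the pattern if your esxtop version names the counter differently.
COSTOP_COLUMN = re.compile(r"Group Cpu\((?P<group>[^)]*)\)\\% CoStop$")


def average_costop(csv_text):
    """Return {group: mean co-stop percentage} for esxtop batch CSV text."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)

    # Map column index -> CPU group label for every co-stop column found.
    columns = {}
    for index, name in enumerate(header):
        match = COSTOP_COLUMN.search(name)
        if match:
            columns[index] = match.group("group")

    totals = {group: 0.0 for group in columns.values()}
    samples = 0
    for row in reader:
        if not row:
            continue  # skip blank lines
        samples += 1
        for index, group in columns.items():
            totals[group] += float(row[index])

    if samples == 0:
        return {}
    return {group: total / samples for group, total in totals.items()}
```

To use it, read the captured file and print the groups with the highest average, for example `average_costop(open("esxtop.csv").read())`. A VM whose average stays well above zero over many samples is worth a closer look.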