Getting CPU Ready Time Stats With VMware PowerCLI

Introduction

For those of us who work in virtualization, especially with VMware technologies, CPU Ready Time is one of those metrics that is extremely important but also hard for administrators to understand. CPU Ready Time is the percentage of a given time interval during which a virtual CPU (vCPU) is ready to run but has to wait for access to a physical CPU.

In ESXi, the CPU scheduler is in charge of managing CPU resources: it determines which vCPU gets access to which pCPU (physical CPU) and when. If too many virtual machines are running on a single host, the scheduler may struggle to assign physical resources. That is when the CPU Ready Time percentage starts to climb above the recommended level, which most official sources put at 5% or less.

The Get-Stat PowerCLI Cmdlet

Included with the VMware PowerCLI installation, the Get-Stat cmdlet is an extremely powerful tool that pulls performance data from the vCenter database, including memory, CPU, disk and network metrics, across a large number of performance counters. I suggest you read the online help page for Get-Stat. We will be working with most of its parameters, so a good understanding of what each one does is ideal before we dive into the code examples. For now, let's focus on the following:

Cpu: gets common CPU performance metrics, 'cpu.usage.average' and 'cpu.usagemhz.average'. It does not include CPU Ready Time.
Stat: every stat has a unique identifier, and less common metrics can be obtained by specifying the desired identifier. From the Get-Stat help page: 'Counters are provided using a dotted notation of the form "counter group"."counter name"."rollup type". For example, "cpu.usage.min".'
Start: start of the desired output time range.
Finish: end of the desired output time range.
Entity: the inventory object for which we want to pull stats (VM, host, etc.).
IntervalSecs / IntervalMins: the interval in seconds or minutes. For example, -IntervalMins 30 returns the average of the specified performance metric(s) every 30 minutes. That is, each value in the output is the average of the 30 minutes leading up to its timestamp.
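
To make the list more concrete, here is a minimal sketch that collects those parameters in a hashtable and hands them to Get-Stat with splatting. The VM name and the dates are placeholders, and the Get-Stat call itself is commented out because it needs an open vCenter session (Connect-VIServer).

```powershell
# Placeholder values for illustration; adjust them to your environment.
$StatParams = @{
    Stat         = 'cpu.ready.summation'     # dotted stat identifier
    Entity       = 'MyVM'                    # assumed VM name; a Get-VM object works too
    Start        = [datetime]'02/02/2021'    # start of the time range
    Finish       = [datetime]'02/07/2021'    # end of the time range
    IntervalMins = 30                        # one value every 30 minutes
}

# With a vCenter session open, splat the hashtable onto the cmdlet:
# Get-Stat @StatParams

# Show what would be passed to Get-Stat
$StatParams.GetEnumerator() | Sort-Object Key | Format-Table -AutoSize
```

Splatting keeps long Get-Stat calls readable and makes it easy to change a single parameter between runs.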

Get-StatType

There are two ways to tell Get-Stat which metric we want: the Cpu parameter (or its counterparts Network, Memory and Disk) and the Stat parameter. The latter needs an identifier, as we have just established, which VMware has in all likelihood documented and published somewhere. However, there is an easier way to find the identifier we need: Get-StatType.

Get-StatType returns the stat identifiers available for a given entity. Let's try to find the one for CPU Ready Time. We will use the Name and Entity parameters.

PS C:\> Get-StatType -Name *cpu*ready* -Entity MyVM
cpu.ready.summation

In the example above we use wildcards to find a stat with the words "cpu" and "ready" in its name. The Entity parameter is mandatory; any VM that is powered on will work. As for the output, it looks like we are getting closer to what we need: "cpu.ready.summation" is not exactly the same as CPU Ready Time, but it is similar and worth looking into.
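
To see how the wildcard pattern narrows things down, the sketch below applies the same pattern with the -like operator to a short, hand-picked list of stat names. The list is illustrative only and is not the full output of Get-StatType; on a live system the names would come from the cmdlet itself.

```powershell
# Illustrative subset of stat identifiers (not actual Get-StatType output).
$SampleStatTypes = @(
    'cpu.usage.average'
    'cpu.usagemhz.average'
    'cpu.ready.summation'
    'mem.usage.average'
)

# Same wildcard pattern that was passed to Get-StatType -Name
$ReadyStats = $SampleStatTypes | Where-Object { $_ -like '*cpu*ready*' }

$ReadyStats   # cpu.ready.summation
```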

It is probably a good idea to start by looking at the values of the CPU Ready Summation metric. For brevity I am only including the first few lines of the output.

PS C:\> Get-Stat -Stat cpu.ready.summation -Entity MyVM -IntervalMins 30

MetricId            Timestamp              Value Unit        Instance
--------            ---------              ----- ----        --------
cpu.ready.summation 2/7/2021 3:30:00 PM    44736 millisecond         
cpu.ready.summation 2/7/2021 3:00:00 PM    43516 millisecond         
cpu.ready.summation 2/7/2021 2:30:00 PM    44350 millisecond         
cpu.ready.summation 2/7/2021 2:00:00 PM    40496 millisecond         
cpu.ready.summation 2/7/2021 1:30:00 PM    45474 millisecond         
cpu.ready.summation 2/7/2021 1:00:00 PM    48212 millisecond

The output above shows that we are working with values measured in milliseconds, something to keep in mind. No less important is the fact that each CPU can have its own metric, so a VM with two CPUs can have two metric values for each timestamp. VMware refers to each CPU (as well as to any single unit or component that generates metrics) as an instance, which is what the Instance column shows. Rows with an empty Instance column, like the ones above, represent the entity as a whole rather than a single vCPU.
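
Instances are easier to reason about once the rows are split by the Instance column. The sketch below uses hand-made objects that mimic the shape of Get-Stat output (the values are invented) and separates per-vCPU rows from rows with an empty Instance, which I am assuming here to be the VM-level aggregate.

```powershell
# Hand-made sample rows shaped like Get-Stat output; the values are invented.
$SampleStats = @(
    [pscustomobject]@{ Timestamp = [datetime]'2021-02-07 15:30'; Value = 900;  Instance = '0' }
    [pscustomobject]@{ Timestamp = [datetime]'2021-02-07 15:30'; Value = 700;  Instance = '1' }
    [pscustomobject]@{ Timestamp = [datetime]'2021-02-07 15:30'; Value = 1600; Instance = '' }
)

# Per-vCPU rows carry an instance number
$PerCpu = $SampleStats | Where-Object { $_.Instance -ne '' }

# Rows with an empty Instance (assumed to be the aggregate for the whole VM)
$Aggregate = $SampleStats | Where-Object { $_.Instance -eq '' }

'{0} per-vCPU rows, {1} aggregate row(s)' -f @($PerCpu).Count, @($Aggregate).Count
```

Filtering on Instance before doing any math avoids mixing per-vCPU values and totals in the same report.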

Formula

Let's do a quick Google search for "cpu ready summation". As of this writing, the first result is VMware's KBA 2002181, "Converting between CPU summation and CPU % ready values". This seems to be exactly what we need: we have the CPU ready summation and want the CPU ready percentage.

The KBA describes the formula that we need:

To calculate the CPU ready % from the CPU ready summation value, use this formula:
(CPU summation value / (<chart default update interval in seconds> * 1000)) * 100 = CPU ready %
For example:

The Realtime stats for a virtual machine in vCenter might have an average CPU ready summation value of 1000. Use the appropriate values with the formula to get the CPU ready %.
(1000 / (20s * 1000)) * 100 = 5% CPU ready
https://kb.vmware.com/s/article/2002181
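
As a quick check of the KB formula, here is a minimal sketch that wraps it in a small function and reproduces the KB example. The function name ConvertTo-CpuReadyPercent and its parameter names are my own and are not part of PowerCLI.

```powershell
# Hypothetical helper (not a PowerCLI cmdlet) implementing the KB formula.
function ConvertTo-CpuReadyPercent {
    param(
        [Parameter(Mandatory)] [double] $SummationMs,   # cpu.ready.summation value, in milliseconds
        [Parameter(Mandatory)] [double] $IntervalSecs   # update interval, in seconds
    )
    # (CPU summation value / (interval in seconds * 1000)) * 100
    ($SummationMs / ($IntervalSecs * 1000)) * 100
}

# KB example: 1000 ms of ready time over a 20-second realtime interval
ConvertTo-CpuReadyPercent -SummationMs 1000 -IntervalSecs 20   # 5
```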

Remember that cpu.ready.summation returns values in milliseconds; the formula converts those milliseconds into a percentage of the interval. The following script puts it to work.

# User Variables #############################################################
$StartDate = [datetime]'02/02/2021'
$EndDate = [datetime]'02/07/2021'
$IntervalMins = 30
$Entity = Get-VM -Name MyVM
$OutputPath = 'C:\CPU_Ready_Time_Report2.csv'
##############################################################################

#Convert interval from minutes to seconds, a requirement for the formula
$IntervalSecs = $IntervalMins * 60

#Get cpu.ready.summation stats for $Entity and save them in a variable
$CPUSumStats = Get-Stat -Stat cpu.ready.summation -Entity $Entity -Start $StartDate -Finish $EndDate -IntervalMins $IntervalMins

<#Transform each cpu.ready.summation value into a CPU Ready Time percentage
and create an object with the Entity, Instance, CPUReadyTime and TimeStamp#>
$Report = foreach ($Stat in $CPUSumStats) {

    #Formula
    $CPUReady = ($Stat.Value / ($IntervalSecs * 1000)) * 100

    #Create object (collected in $Report; the user variable $Entity is left untouched)
    [pscustomobject]@{
        'Entity' = $Stat.Entity
        'Instance' = $Stat.Instance
        'CPUReadyTime' = [math]::Round($CPUReady, 2)
        'TimeStamp' = $Stat.Timestamp
    }

}

#Export all objects to csv in a single call (overwrites any existing report)
$Report | Export-Csv -Path $OutputPath -NoTypeInformation

The script above starts with a handful of user-defined variables that drive the output; they are the only input required. The code is commented for your convenience.

Let's now compare VMware's formula with our formula.

(CPU summation value / (<chart default update interval in seconds> * 1000)) * 100

($Stat.Value / ($IntervalSecs * 1000)) * 100

$Stat.Value is the value of cpu.ready.summation in milliseconds
$IntervalSecs is the number of minutes defined in the variable $IntervalMins multiplied by 60, i.e. the interval in seconds

The two formulas match term for term. We can therefore expect our script to return the average CPU Ready Time for 30-minute intervals, for the MyVM entity, over the date range 02/02/2021 to 02/07/2021.

Intervals

Intervals are worth discussing. I used a 30-minute interval in the example; however, it is important to be aware of the other default intervals. The aforementioned VMware KBA summarizes them under Resolution.

Realtime: 20 seconds
Past Day: 5 minutes (300 seconds)
Past Week: 30 minutes (1800 seconds)
Past Month: 2 hours (7200 seconds)
Past Year: 1 day (86400 seconds)
https://kb.vmware.com/s/article/2002181
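
To see why the interval matters so much in the formula, here is a small sketch that applies the KB formula to the same summation value at each of the default intervals listed above. The 1000 ms value is the one from the KB example; the variable names are mine.

```powershell
# Default intervals in seconds, as listed in the KB article
$DefaultIntervals = [ordered]@{
    'Realtime'   = 20
    'Past Day'   = 300
    'Past Week'  = 1800
    'Past Month' = 7200
    'Past Year'  = 86400
}

$SummationMs = 1000   # example summation value, taken from the KB example

# Apply the KB formula once per interval
$Rows = foreach ($Name in $DefaultIntervals.Keys) {
    $Secs = $DefaultIntervals[$Name]
    [pscustomobject]@{
        Interval    = $Name
        Seconds     = $Secs
        CPUReadyPct = [math]::Round(($SummationMs / ($Secs * 1000)) * 100, 3)
    }
}

$Rows | Format-Table -AutoSize
```

The same 1000 ms of ready time is 5% of a 20-second sample but only about 0.056% of a 30-minute one, which is why the interval used in the formula must match the interval used in the Get-Stat call.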

The intervals listed above are the default collection intervals. Realtime samples are kept only briefly (about one hour by default, according to VMware's documentation), while past day, week, month and year are self-explanatory. In practice, the interval has to match the age of the data: the Realtime parameter only reaches the most recent samples, an IntervalMins of 5 only works for the past day, 30 only for the past week, and so on. Here is an example.

PS C:\> Get-Stat -Entity (Get-VM -Name MyVM) -Cpu -Start 12/01/2020 -Finish 01/01/2021 -IntervalMins 5
PS C:\>

The current month is February 2021, so if we run this command now no data is returned: the selected time range is December 2020, and 5-minute samples are no longer available that far back. We have to use the interval that corresponds to past year (1 day). Let's try again.

PS C:\> Get-Stat -Entity (Get-VM -Name MyVM) -Cpu -Start 12/01/2020 -Finish 01/01/2021 -IntervalMins 1440
MetricId                Timestamp                          Value Unit     Instance
--------                ---------                          ----- ----     --------
cpu.usage.average       12/31/2020 6:00:00 PM               4.53 %                
cpu.usage.average       12/30/2020 6:00:00 PM               4.55 %                
cpu.usage.average       12/29/2020 6:00:00 PM               4.54 %                
cpu.usage.average       12/28/2020 6:00:00 PM               4.54 %                
cpu.usage.average       12/27/2020 6:00:00 PM               4.52 %                
cpu.usage.average       12/26/2020 6:00:00 PM               4.78 %                
cpu.usage.average       12/25/2020 6:00:00 PM               4.55 %

Again, I am dropping most of the output; a few lines are enough to make my point. This time we did get data because, as you can see, the interval is one day (1440 minutes or 86400 seconds). That is why it is important to understand intervals and data collection levels.
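
The rule of thumb (the older the data, the coarser the interval) can be captured in a small helper. This is only a sketch: the function name Get-SuggestedIntervalMins is hypothetical, it assumes the default collection intervals listed earlier, and it approximates a month as 30 days and a year as 365 days.

```powershell
# Hypothetical helper: suggests an -IntervalMins value from the age of the start date.
# Assumes vCenter's default collection intervals and retention.
function Get-SuggestedIntervalMins {
    param(
        [Parameter(Mandatory)] [datetime] $Start,
        [datetime] $Now = (Get-Date)
    )
    $AgeDays = ($Now - $Start).TotalDays
    if     ($AgeDays -le 1)   { 5 }      # Past Day: 5 minutes
    elseif ($AgeDays -le 7)   { 30 }     # Past Week: 30 minutes
    elseif ($AgeDays -le 30)  { 120 }    # Past Month: 2 hours
    elseif ($AgeDays -le 365) { 1440 }   # Past Year: 1 day
    else { throw 'Start date is older than the assumed one-year retention.' }
}

# December 2020 data requested in February 2021
Get-SuggestedIntervalMins -Start ([datetime]'12/01/2020') -Now ([datetime]'02/07/2021')   # 1440
```

The result could then be passed to Get-Stat through -IntervalMins, keeping in mind that a vCenter with non-default statistics settings may behave differently.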

Output

The output of our original script is a CSV file with the CPU Ready Time percentage for each CPU instance of the MyVM entity. Since this VM has two vCPUs (instances), the report contains two objects with the same timestamp; in other words, two rows every 30 minutes. Once the data is exported to CSV, we can use a pivot table to summarize and consolidate the instance data.
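
If you would rather stay in PowerShell than build a pivot table, the same consolidation can be sketched with Group-Object and Measure-Object. The rows below are invented sample data shaped like the report; on real data they could come from Import-Csv on the report file instead.

```powershell
# Invented sample rows shaped like the report: two rows (one per vCPU) per timestamp.
$Report = @(
    [pscustomobject]@{ Entity = 'MyVM'; CPUReadyTime = 2.40; TimeStamp = [datetime]'2021-02-07 15:00' }
    [pscustomobject]@{ Entity = 'MyVM'; CPUReadyTime = 2.60; TimeStamp = [datetime]'2021-02-07 15:00' }
    [pscustomobject]@{ Entity = 'MyVM'; CPUReadyTime = 3.00; TimeStamp = [datetime]'2021-02-07 15:30' }
    [pscustomobject]@{ Entity = 'MyVM'; CPUReadyTime = 2.00; TimeStamp = [datetime]'2021-02-07 15:30' }
)

# One row per timestamp with the average and the maximum across instances
$Summary = $Report | Group-Object -Property TimeStamp | ForEach-Object {
    $Measure = $_.Group | Measure-Object -Property CPUReadyTime -Average -Maximum
    [pscustomobject]@{
        TimeStamp = $_.Group[0].TimeStamp
        AvgReady  = [math]::Round($Measure.Average, 2)
        MaxReady  = $Measure.Maximum
    }
}

$Summary | Format-Table -AutoSize
```

Keeping both the average and the maximum is a design choice: the average smooths things out, while the maximum shows whether a single vCPU is waiting noticeably longer than the others.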

Conclusion

Getting CPU Ready Time and being able to analyze its values is key for administrators; it is one of the most important performance metrics in virtualized environments. The Get-Stat PowerCLI cmdlet is a free alternative to GUI-based monitoring tools for collecting CPU Ready Time data, which can then be turned into charts and reports. The same can be done with other stat types.

Valid intervals depend on the selected date range. In addition, the data that can be retrieved depends on the vCenter Server retention period. Therefore, if you want to get information from a year ago but vCenter only keeps three months of metrics, you are out of luck.

Feel free to leave comments and questions below and thanks for reading this far.

The full and commented source code for this post is available in my GitHub repository.