PowerCLI – getting on the road

PowerShell is a tool I have been interested in for a number of years. Unfortunately, in my previous role I was limited to using VBScript for automation, as we ran a purely Windows Server 2003 environment without even PowerShell v1 available.

When I started in my current role, I was excited to begin using PowerShell to replace VBScript as my primary scripting language. I have found a number of resources which are great for learning PowerShell, not least of which is the PowerShell in a Month of Lunches book by Don Jones. I started reading, but ended up learning through just getting my hands dirty.

So I rolled up my sleeves and started using PowerShell to do small things. Once I felt a little more comfortable with it, I decided to download VMware PowerCLI and give that a go. I never realised how badly I needed a tool until I had it; I was blown away by what PowerCLI allowed me to do. Since then I have built up a good library of scripts which allow me to report on, and automate, some of the daily VMware grind I experience in my work. Examples of scripts I have created, which provide functionality not available natively in vSphere, are:

  • A script to go through a vCenter instance, identifying the Path Selection Policy for all disks, flagging any that do not match the desired policy and offering to correct them (a cut-down sketch follows this list)
  • Scripts to list all VMs with RDMs attached, which helped us plan for patching our hosts
  • A daily check script which exports all vCenter alarms to an HTML page to allow alarms across multiple vCenters to be checked quickly
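
For illustration, here is a cut-down sketch along the lines of the first two items (the real scripts do rather more). It assumes an existing connection made with Connect-VIServer, and 'RoundRobin' is simply a stand-in for whatever Path Selection Policy your storage vendor recommends:

# Assumes a connected session: Connect-VIServer vcenter.example.com
# 'RoundRobin' is an example only - use the PSP your storage vendor recommends

# Flag any disk whose Path Selection Policy does not match the desired one
$wrongPsp = Get-VMHost | Get-ScsiLun -LunType disk |
    Where-Object { $_.MultipathPolicy -ne 'RoundRobin' }
$wrongPsp | Select-Object VMHost, CanonicalName, MultipathPolicy
# ...and, once you are happy with the list, correct them:
# $wrongPsp | Set-ScsiLun -MultipathPolicy RoundRobin

# List all VMs with RDMs attached - handy when planning host patching
Get-VM | Get-HardDisk -DiskType RawPhysical, RawVirtual |
    Select-Object @{N='VM';E={$_.Parent.Name}}, Name, DiskType, ScsiCanonicalName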

I decided to list some of the resources which can really help in getting going with PowerCLI; hopefully this will assist anyone looking to learn the toolset:

  • PowerShell in a Month of Lunches book by Don Jones – this book is great for getting the basics of PowerShell down
  • Notepad++ – there are other tools available (some recommend Quest’s PowerGUI), but this is what I use for PowerCLI scripting day to day. It is only available on Windows, but if you are using PowerCLI then you are already on Windows, so this shouldn’t be too much of a problem
  • VMware’s PowerCLI Documentation site – this site is invaluable for the explanations of specific cmdlets and the available parameters
  • VMware vSphere PowerCLI Reference by Luc Dekens and Alan Renouf – this was the first book released on PowerCLI and has some great examples of what you can do with the tool

I was watching an interesting interview of Alan Renouf by Mike Laverick on his Chinwag Reloaded podcast. Alan now works on the PowerCLI team at VMware and has been a key player in the PowerCLI community since its inception. I thought I would include some of the information here to show the developments to PowerCLI which are just around the corner.

In the interview they discuss a new feature relating to PowerCLI (and available here), which started as a VMware fling and has been developed by the internal team responsible for the vSphere Web Client. The primary new feature is called ‘PowerActions’, and it will allow your library of scripts to be stored within vSphere and made accessible to other administrators using the Web Client as you see fit. This was of interest to me, as I have been looking at Git-style repositories to provide version control and peer review, which our company could use to store, share, and manage our ever-growing library of PowerCLI scripts.

Further to this, and exciting for anyone running an OS other than Windows, or working in an environment where installing PowerCLI is not possible, a shell will be available through the Web Client. This will be a great help when troubleshooting and wanting to run some of the more useful PowerCLI commands without having to fire up and connect your usual shell.

There is even more new stuff coming as well: the ability to create new menu items in the Web Client which will run your scripts and return the output in a message box. For common tasks like reporting, or bespoke automation not present in vSphere, this means you can present the functionality simply to other administrators or users of vSphere, and they can make use of it without needing any knowledge of PowerCLI.

Where other vendors have provided only limited API integration through their own PowerShell libraries, it is great to see VMware putting a lot of time and money into ensuring their API delivers what customers want, and that a growing and helpful community has developed around it, allowing a keen administrator to quickly learn and develop their skills with PowerCLI.

If you are a VMware administrator then PowerCLI is definitely something you should get involved with, and there has never been a better time to do this. I will look to publish links to some of my scripts at some point, as well as discuss some useful PowerCLI cmdlets.

Storage I/O Control – what to expect

Storage I/O Control, or SIOC, was introduced back in vSphere 4.1. It provides a way for vSphere to combat what is known as ‘noisy neighbour’ syndrome: the situation where multiple VMs reside on a single datastore, and one or more of these VMs take more than their fair share of bandwidth to the datastore. This could be happening because a VM decides to misbehave, because of poor choices in VM placement, or because workloads have changed.

The guiding principle behind SIOC is one of fairness, allowing all VMs a chance to read and write without being swamped by one or more ‘greedy’ VMs. In the past this would have been controlled by disk shares, and indeed that method can still be used to prioritise certain workloads on a datastore over others. The advantage with SIOC is that, other than the couple of configurable settings described below, no manual tinkering is really required.

Options available for Storage I/O Control

There are only two settings to pick for SIOC:

1) SIOC Enabled/Disabled – turn SIOC on or off at the datastore level. More on the considerations for this further down (a PowerCLI example of both settings follows below)

2) Congestion Threshold – this is the trigger point at which SIOC will kick in and start doing its thing, throttling I/O to the datastore. This can be configured with one of two types of value:

a) Manual – this is set in milliseconds and defaults to 30ms, but the appropriate value varies depending on your storage. VMware have tables on how to calculate this in their SIOC best practice guide, but the default should be fine for most situations. If in doubt, your storage provider should be able to give guidance on the correct value to choose.

b) Percentage of peak throughput – only available through the vSphere Web Client, and added in vSphere 5.1, this takes the guesswork out of setting the threshold by having vSphere analyse the datastore’s I/O capabilities and use this to determine the peak throughput.
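
Both settings can also be driven from PowerCLI. Here is a minimal sketch, assuming a connected session; the datastore name and the 30ms threshold are examples only, and the percentage-based option mentioned above is a Web Client setting so is not shown here:

# Assumes Connect-VIServer has already been run; 'Datastore01' and 30ms are examples only
Get-Datastore 'Datastore01' |
    Set-Datastore -StorageIOControlEnabled $true -CongestionThresholdMillisecond 30

# Quick report of the current SIOC state across all datastores
Get-Datastore |
    Select-Object Name, StorageIOControlEnabled, CongestionThresholdMillisecond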

My experience of using SIOC is described in the following paragraphs: improvements were seen, and no negative performance impact was experienced (as expected), although some unexpected results were received.

Repeated latency warnings similar to the following were seen from multiple hosts, for multiple datastores across different storage systems:

Device naa.5000c5000b36354b performance has deteriorated. I/O latency increased from average value of 1832 microseconds to 19403 microseconds

These warnings report the latency in microseconds, so in the above example the latency is going from roughly 1.8ms to 19.4ms. That is still a workable latency, but the rise is flagged due to the large increase (in this case by a factor of ten). The results seen in the logs were sometimes much worse than this, with latency rising to as much as 20 seconds, mostly in the middle of the night.

After checking the storage configuration, it was identified that Storage I/O Control was turned off across the board. It is disabled by default for all datastores and, as such, had been left that way. Turning SIOC on seemed like a sensible way forward, so the decision was taken to enable it on some of the worst affected datastores.

After turning on SIOC for a handful of datastores, a good reduction in the number of I/O latency increases being reported in the ESXi logs was seen. Unfortunately, a new message began to flag in the host event logs:

Non-VI workload detected on the datastore

This was repeatedly seen against the LUNs for which SIOC had been enabled; VMware have a knowledge base article which describes the issue. In this case, the problem stemmed from the fact that the storage backend providing the LUNs had a single disk pool (or MDisk group, as this was presented by an IBM SVC) which was shared with unmanaged RDMs and other storage presented outside the VMware environment.

The impact of this is that, whilst VMware plays nicely, throttling I/O access when the congestion threshold is reached, other workloads such as non-SIOC datastores, RDMs, or other clients of the storage group will not be so fair in their usage of the available bandwidth, because the spindles presented are shared. One solution would be to present dedicated disk groups to VMware workloads, ensuring that all datastores carved out of these disks have SIOC turned on.

We use EMC VNX and IBM SVC as our storage of choice, and the recommendation from both vendors is to turn SIOC on for all datastores and leave it on. I can only imagine that the reason this is still not the default is that it is not suitable for every storage type. As with all these things, checking your storage vendor’s documentation is probably the best option, but SIOC should provide benefit in most use cases, although as described above you may see some unexpected results. It is worth noting that SIOC is Enterprise Plus only, so anyone running a less feature-packed edition of vSphere will not be able to take advantage of it.

vNUMA CPU Alignment – doing it right

As I’ve said before, the majority of our VMware environment is running on Cisco UCS blade servers, and the majority of these are running dual hex-core CPUs. With the broad spectrum of Operating Systems and applications running across our many hundreds of VMs, there are inevitably many, many VMs with multiple vCPUs.

NUMA, in a nutshell, gives each physical CPU fast access to its own local memory, so workloads running on that specific CPU see lower memory access times. This is particularly important for latency-sensitive workloads. vNUMA is VMware’s implementation, which exposes the host’s NUMA topology to VMs running on ESXi in order to reduce memory latency.

When looking at CPU performance issues with a guest VM, no host-level contention for CPU resources existed. Through the VM performance graphs in the vSphere client, CPU Ready times could be seen to be spiking often. This was on a VM with multiple vCPUs, configured as 2 sockets x 8 cores (16 vCPU).
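
Incidentally, the same CPU Ready figures can be pulled with PowerCLI rather than through the performance graphs. A rough sketch, using a placeholder VM name: cpu.ready.summation is reported in milliseconds per sampling interval, so for 20-second realtime samples the percentage is value / (20 x 1000) x 100.

# 'AppServer01' is a placeholder VM name; assumes a connected PowerCLI session
$vm = Get-VM 'AppServer01'
Get-Stat -Entity $vm -Stat 'cpu.ready.summation' -Realtime -MaxSamples 60 |
    Where-Object { $_.Instance -eq '' } |    # blank instance = aggregate across all vCPUs
    Select-Object Timestamp, Value,
        @{N='ReadyPercent';E={[math]::Round($_.Value / (20 * 1000) * 100, 2)}}
# Note: the aggregate figure sums all vCPUs - divide by the vCPU count for a per-vCPU value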

This shows the CPU Ready/Usage stats before and after re-aligning the vCPU configuration from multiple cores on a single socket to a single core per socket across multiple sockets; a marked difference

This problem is fairly well documented, and is detailed in VMware KB 1026063. When utilising the NUMA features of VMware, it is important to configure your VMs’ vCPU layouts to align with the physical characteristics of your hosts if possible. However, because a VM guest may be vMotioned to other hosts in the cluster with different physical CPU configurations, a better solution, where identical CPU configurations cannot be guaranteed across all your hosts, is to define all vCPUs as x sockets with a single core each.

In this case our physical CPU architecture is 2 sockets x 6 cores, and the VM was configured with 1 socket x 8 cores. This prevents the hypervisor and guest OS from cleanly utilising either one, or both, of the sockets in our physical host, so the VM misses out on the speed benefits which NUMA can bring; this can be exhibited as high or consistently high CPU Ready values for the VM guest.

There is a great VMware blog article about the performance implications of vNUMA design selection here, which echoes the VMware Best Practices guide in stating that the best way to approach this is to set your vCPU configuration ‘flat and wide’. This means that if your VM requires 8 vCPUs, then configure it with 8 sockets of 1 core each, rather than 2 quad-core or 4 dual-core sockets.

This allows the vNUMA technology to balance the load as it best sees fit, and should prevent CPU Ready spike issues. There are of course edge cases where specific software licensing may force you to use as few sockets as possible; in those situations, alignment to the physical host CPU should always be attempted. In the case above, the VM could have been configured with 1 socket x 6 cores, or 2 sockets x 6 cores. Be aware that stepping outside of the ‘flat and wide’ model will prevent vNUMA from doing its job and will bow to your judgement of vCPU configuration; this means you had better have got it right!
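
To round this off, here is a hedged sketch of re-shaping a VM ‘flat and wide’ with PowerCLI. The VM name is a placeholder, the VM needs to be powered off before changing its CPU layout, and the cpuid.coresPerSocket advanced setting is one way of controlling cores per socket (depending on your PowerCLI version there may also be a cores-per-socket parameter directly on Set-VM):

# 'AppServer01' is a placeholder; power the VM off before changing its CPU layout
$vm = Get-VM 'AppServer01'

# Present 8 vCPUs...
Set-VM -VM $vm -NumCpu 8 -Confirm:$false

# ...as 8 sockets x 1 core: cpuid.coresPerSocket sets the number of cores per virtual socket
New-AdvancedSetting -Entity $vm -Name 'cpuid.coresPerSocket' -Value 1 -Force -Confirm:$false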