I am currently automating the build and configuration of a VMware vCenter environment, and the internet has been a great source of material in helping me with this, particularly William Lam and Brian Graf‘s websites. It seems VMware have done a great job with the changes in vSphere 6.0 of enabling automated deployments, this follows the general trends of the industry in driving automation and orchestration with everything we do in the Systems Administration world.
One thing I needed to do which I could not find any information on, was to join my standalone Platform Services Controller (PSC) to an AD domain, this is easy enough in the GUI, and is documented here. It was important for me to automate this however, so I trawled through the CLI on my PSC to figure out how to do this.
I stumbled across the following command which joins you to the AD domain of your choosing.
Once this is completed the PSC will need restarting, to enable the change. This will add the PSC to Active Directory. The next challenge was finding a scripted method to add the identity source. Once the identity source is added, permissions can be set up as normal in vCenter using this identity source.
Again, I had to trawl through the PSC OS to find this, the script is as follows:
Both of these can be carried out through an SSH session to your PSC (or embedded PSC/VCSA server). Assuming you have BASH enabled on your PSC, you can invoke this remotely using the PowerCLI ‘Invoke-VMScript’ cmdlet. This should help in the process of fully automating the process of deploying a vCenter environment.
As an aside, one issue I did have, which is discussed in the VMware forums is that I was getting the error ‘Error while extracting local SSO users’ when enumerating users/groups from my AD in the VMware GUI, this was fixed by creating a PTR record in DNS in my domain for the Domain Controller, it seems this is needed by the new VMware SSO at some point.
I hope this is useful to people, and hopefully VMware will document this sort of automation in the future, in the meantime, as I said above, William Lam and Brian Graf’s sites are a good source of information.
When a VM is powered on, a .vswp (Virtual Machine swap) file is created (note there is also a vmx swap file which gets created in the same location as the VM, this is seperate from this discussion, but is described here), its size is the memory allocation for the VM less any reservation configured. If there is not sufficient space in the configured swap file location to create this file then the VM will not power on. The use of this file for memory pages is a last resort, and will be considerably slower than using normal memory, even if this is compressed or shared. It will always be possible that you get in a situation where memory contention is occurring, and that the use of the swap file begins, to prepare for this a system design should consider the location of swap files for VMs. Below I discuss some of the considerations which should be made when placing a VM swap file:
Default location for swap file is to store it in the same datastore as the VM, this presents the following problems:
Performance – it is unlikely that the datastore the VM sits in is on top tier storage, or limited to a single VM. This means that the difference in speed between memory IO and the swapping IO once contention occurs will be great, and that the additional IO this swapping produces could well impact other workloads on the datastore. If there are multiple VMs sharing this datastore, and all running on the host with memory contention issues, then this will be further compounded and could see the datastore performance plummet
Capacity – inevitably, administrators will keep chucking workloads into a datastore, and unless Storage DRS with datastore clusters is being used, or the administrators are pro-active in balancing storage workloads, there will come a time when a VM will not power up due to insufficient space to create the .vswp file. This is particularly likely after a change to the VM configuration such as adding more disk or memory
VM swap file location can be changed either at the cluster, or host level. When choosing to move this from the default, the following should be considered:
Place swap files on the fastest storage possible – if you can place this on flash storage then fantastic, this will not be as quick as paging to/from memory, but it will be many magnitudes better than placing it on spinning disk
Place swap files as close to the host as possible – the latency incurred by traversing your SAN/IP network to get to shared storage will all impair guest performance when swapping occurs. Be aware that although the default location can be changed to host local storage (which will probably give the best performance of the host has internal flash storage), this will impair vMotion performance massively, as the entire .vswp file would need to be copied from the source host to the destination host’s disk during the vMotion activity
Do not place the .vswp on replicated storage – as with location selection for guest OS swap files, there is no point on placing the file on replicated storage; if the VM is reset or powered off then this file is deleted. If your VMs are on storage which is replicated as part of its standard capability then the .vswp files should definitely be located elsewhere
In terms of configuring the location, as stated above, this is set at either a VM, host or cluster level, if this is inconsistent across hosts in a cluster then again this may impact vMotion times as the VM migrates from a host with one configured location to another with a different location. As with most settings which can be made at the cluster level, consistency should be maintained across the cluster unless this is not possible. Bear in mind though, that having vswp consistent across the cluster, and defined to be a single datastore, could lead to high IOPS on this datastore should cluster wide memory contention occur., especially with large clusters.
As stated at the beginning of this article, swap files are sized based on VM memory allocation less reservations size. By right sizing VMs, and utilising reservations, swap file sizes, and usage, can be kept to a minimum, and these planning considerations should take precedence over all others. Hopefully memory contention will never be so bad that swap will be required, but when the day does come it is good to be prepared, by making informed, and reasoned decisions early on.
Extents allow disk presented to a vSphere system to be added to VMFS datastore to extend the file system, this aggregates multiple disks together and can be useful in a number of scenarios. Recently I saw problems where extents were being used spanning two storage systems; one of the storage systems had a controller failure which caused SCSI reservation issues on one of the LUNs making up the extent and this caused the entire datastore to go offline.
In this article I want to discuss some of the benefits and potential pitfalls in using VMFS extents in vSphere environments. Ultimately this is an available, supported, and sometimes useful feature of vSphere but there are some limitations or weaknesses that using this can bring.
Using extents allows you to create datastores up to the maximum supported by vSphere for pre-VMFS-5 datastores. It can be useful to create large datastores for the following reasons:
There may be a requirement to natively present a VMDK which is larger than the maximum LUN size available on your storage system. For example, if 2TB is the largest LUN you can present, but you need a 4TB disk for the application your VM is hosting, the aggregation of disks will allow the creation of a VMFS datastore large enough to deliver this without the need to span volumes in the guest OS, or the need to fallback to using something like RDMs which may impinge on other vSphere functionality
Datastore management will be simplified with fewer VMFS datastores required. The fewer datastores available, the less an administrator has to keep their eyes on. In addition to this, decisions made in placing VMs is made considerably simpler if there are fewer choices
Adding space to a datastore with capacity issues; in a previous role we were constrained by storage space more than any other resource, this meant that on both the storage system (NetApp FAS2050 with a single shelf of storage), and at the VMFS level, the design left little to no room to extend a VMDK should it be required. If we did need to add space to VMDKs, we had to extend the volume and LUN by the required amount on the filer, and add a small extent to the datastore in vSphere
Introduces a single point of failure; whether you are aggregating disks from one or multiple storage systems, by adding extents to a volume the head extent in the aggregated datastore (the first LUN added to the datastore) becomes a single point of failure, if any of the LUNs should become unavailable then VMs which have any blocks whatsoever on the lost LUN will no longer be available
Management from the storage side can become more difficult, given that there may be multiple LUNs, from multiple storage systems now aggregated to form a single datastore, from a storage side it is harder to identify which LUNs relate to which datastores in vSphere, to combat this it is important to document the addition of extents well, and label LUNs accordingly on the storage system
If extents are combined which span different storage devices then there may well be a loss in performance
The above is all just based on my experiences, but it seems there are legitimate use cases for choosing to use, or not use extents. My personal preference would be to present a new larger LUN where possible, formatting this in VMFS, and using Storage vMotion to migrate VMs to the new datastore. Given that since VMFS-5 introduced GPT as the partitioning method for LUNs, we can now create single extent datastores up to 64TB in size, the requirement for using extents should be diminished. There are often legitimate reasons, especially in older environments, why this is not practical or possible however, and in these cases using extents is perfectly valid.
Storage I/O Control, or SIOC, was introduced into vSphere back in vSphere 4.1, it provides a way for vSphere to combat what is known as the ‘noisy neighbour’ syndrome. This describes the situation where multiple VMs reside on a single datastore, and one or more of these VMs take more than their fair share of bandwidth to the datastore. This could be happening because a VM decides to misbehave, because of poor choices in VM placement, or because workloads have changed.
The reigning principle behind SIOC is one of fairness, allowing all VMs a chance to read and write without being swamped by one or more ‘greedy’ VMs. This is something which, in the past, would have been controlled by disk shares, and indeed this method can still be used to prioritise certain workloads on a datastore over others. The advantage with SIOC is that, other than the couple of configurable settings, described below, no manual tinkering is really required.
There are only two settings to pick for SIOC:
1) SIOC Enabled/Disabled – either turn SIOC on, or off, at the datastore level. More on considerations for this further down
2) Congestion Threshold – this is the trigger point at which SIOC will kick in and start doing its thing, throttling I/O to the datastore. This can be configured with one of two types of value:
a) Manual – this is set in milliseconds and this defaults at 30ms, but is variable depending on your storage. VMware have tables on how to calculate this in their SIOC best practice guide, but the default should be fine for most situations. If in doubt then your storage provider should be able to give guidance on the correct value to choose.
b) Percentage of peak throughput – this is only available through the vSphere Web Client, and was added in vSphere 5.1, this takes the guess work out of setting the threshold, replacing it with an automated method for vSphere to analyse the datastore I/O capabilities and use this to determine the peak throughput.
My experience of using SIOC is described in the following paragraphs, improvements were seen, and no negative performance experienced (as expected), although some unexpected results were received.
Repeated latency warnings similar to the following from multiple hosts were seen, for multiple datastores across different storage systems:
Device naa.5000c5000b36354b performance has deteriorated. I/O latency increased from average value of 1832 microseconds to 19403 microseconds
These warnings report the latency time in microseconds, so in the above example, the latency is going from 1.8ms to 19ms, still a workable latency, but the rise is flagged due to the large increase (in this case by a factor of ten). The results seen in the logs were much worse than this though, sometimes latency was rising to as much as 20 seconds, this was happening mostly in the middle of the night
After checking out the storage configuration, it was identified that Storage I/O Control was turned off across the board. This is set to disabled by default for all datastores and as such, had been left as was. Turning SIOC on seemed like a sensible way forward so the decision was taken to proceed in turning it on for some of the worst affected datastores.
After turning on SIOC on a handful of datastores, a good reduction in the number of I/O latency doublings being reported in the ESXi logs was seen. Unfortunately a new message began to flag in the host events logs:
Non-VI workload detected on the datastore
This was repeatedly seen against the LUNs for which SIOC had been enabled, VMware have a knowledge base article for this which describes the issue. In this case, the problem stemmed from the fact that the storage backend providing the LUNs had a single disk pool (or mDisk Group, as this was presented by an IBM SVC) which was shared with unmanaged RDMs, and other storage presented outside the VMware environment.
The impact of this is that, whilst VMware plays nicely, throttling I/O access when threshold congestion is reached, other workloads such as non-SIOC datastores, RDMs, or other clients of the storage group, will not be so fair in their usage of the available bandwidth. This is due to the spindles presented being shared, one solution to this would be to present dedicated disk groups to VMware workloads, ensuring that all datastore carved out of these disks have SIOC turned on.
We use EMC VNX, and IBM SVC as our storage of choice, recommendations from both these vendors is to turn SIOC on for all datastores, and to leave it on. I can only imagine that the reason this is still not a default is because it is not suitable for every storage type. As with all these things, checking storage vendor documentation is probably the best option, but SIOC should provide benefit in most use cases, although as described above, you may see some unexpected results. It is worth noting that this feature is Enterprise Plus only, so anyone running a less feature packed version of vSphere will not be able to take advantage of this feature.
As I’ve said before, the majority of our VMware environment is running on Cisco UCS blade servers, and the majority of these are running dual hex-core CPUs. With the broad spectrum of Operating Systems and applications running across our many hundreds of VMs, there are inevitably many, many VMs with multiple vCPUs.
This shows the CPU Ready/Usage stats before and after re-aligning vCPU configuration from single core-multiple sockets, to multiple core-single socket
NUMA, in a nutshell, utilises the host CPU’s local memory bus to allow faster memory access time for workloads on that specific CPU. This is particularly important for latency sensitive workloads. vNUMA is VMware’s implementation of utilising NUMA to reduce memory latency for VMs running on ESXi.
When looking at CPU performance issues with a guest VM, no host level contention for CPU resources existed. Through the VM performance graphs in the vSphere client, CPU Ready times could be seen to be spiking often, this was on a VM with multiple CPUs set to 2 x socket and 8 x cores (16 vCPU).
This problem is fairly well documented, and is detailed in VMware KB 1026063; when utilising the NUMA features of VMware, it is important to configure the vCPU layouts for your VMs to align with the physical characteristics of your hosts if possible; this will help to guarantee that the VM guest can be vMotioned to other hosts in the cluster, with different physical CPU configurations. A better solution, where identical CPU configurations can not be guaranteed across all your hosts, is to define all vCPUs as x sockets with a single core.
In this case our physical CPU architecture is 2 sockets x 6 cores, and the VM was configured with 1 socket x 8 cores. This prevents the hypervisor and guest OS from completely utilising either one, or both sockets in our physical host, and is therefore missing out on the speed benefits which NUMA can bring. And can be exhibited as high or consistently high ready States for this VM guest.
There is a great VMware blog article about the performance implications of vNUMA design selection here, which echoes the VMware Best Practices guide, stating that the best way to approach this is to set your vCPU configuration ‘flat and wide’. this means that if your VM requires 8 cores, then configure your vCPU with 8 sockets with 1 core each, rather than 2 quad-core or 4 dual-core sockets.
This allows the vNUMA technology to balance the load as it best sees fit and should prevent CPU Ready spike issues. There are of course edge cases, where specific software licensing may force you to use as few sockets as possible, in which case alignment to physical host CPU should always be attempted, in the case above, the VM could have been configured to 1 socket x 6 cores, or 2 sockets x 6 cores. Be aware, that stepping outside of the ‘flat and wide’ model will prevent vNUMA from doing its job, and will bow to your judgement of vCPU configuration; this means you had better have got it right!