FlexPod 101 – What is a FlexPod?

I haven’t posted for a while: I started a new job, getting out of IT support and into the area I want to be in, designing and implementing infrastructure solutions as a FlexPod consultant. I had not worked with FlexPod as a concept before, but I have worked with the technologies that make up the stack. So far so good; it seems like a robust solution which strikes a balance between scalability, performance and cost. I have decided to do a set of blog posts going through the concept and technology behind FlexPod, hopefully highlighting what sets it apart from the competition.

FlexPod: what the hell is that?

Over the last few years, the IT industry has moved from disparate silos for storage, compute and network towards the converged (and later hyper-converged) dream. One of the players in this market is FlexPod.

A collaboration between NetApp and Cisco, at a basic level FlexPod comprises the following enterprise-class hardware:

  • Cisco Unified Computing System (UCS)
  • Cisco Nexus switching/routing
  • NetApp FAS Storage Arrays

This forms the hardware basis, and, as is the industry’s wont, there is a swathe of virtualisation solutions and business-critical applications which can be run on top of the hardware:

  • VMware vSphere
  • Microsoft Hyper-V
  • OpenStack
  • Citrix XenDesktop
  • SAP
  • Oracle
  • VMware Horizon View

I have been a fan of NetApp storage and Cisco UCS compute for a while; they both offer simplicity and power in equal measure. My preferred hypervisor is ESXi, but Hyper-V becomes a more compelling solution with every release.

Throw an automation product like vRealize Automation or UCS Director on top of the stack and you have a powerful, modern private cloud solution which takes you beyond what a standard virtualised environment will deliver.

Why not just buy <enter converged/hyper-converged vendor here>?

But you can run this on any hardware, right? So what sets FlexPod apart from VCE’s Vblock, from hyper-converged systems like Nutanix or SimpliVity, or from just rolling your own infrastructure?

The answer is the Cisco Validated Design (CVD). This is, as the name suggests, a validated and documented build blueprint which details proven hardware and software configurations for a growing number of solutions. It gives you confidence, when implementing the solution, that it will work, and it goes some way towards putting the ‘Flex’ in FlexPod.

The other advantage of FlexPod over other converged/hyper-converged solutions is that you can tweak the scale of the individual hardware components (compute/storage/network), making the solution larger in the areas where you need a capacity boost while keeping it the same in the areas where you don’t. Need 100 TB of storage? Just buy more shelves. Need 100 hosts? Just buy more UCS chassis and blades. This non-linear scalability, and flexibility, separates FlexPod from rival solutions.

As far as software and general protocol usage go, FlexPod is fairly agnostic. You can use FCoE, NFS, FC or iSCSI as your storage protocols, and you can use whatever hypervisor you want as long as there is a CVD for it; chances are you can find one to suit your use case.

Where can I find more information on FlexPod?

The NetApp and Cisco sites have information about what a FlexPod consists of:

http://www.netapp.com/ca/solutions/cloud/flexpod/

http://www.cisco.com/c/en/us/solutions/data-center-virtualization/flexpod/index.html

The Cisco site also has links to the CVDs; these give a good overview of what FlexPod is about.

What’s next?

Part 2 of my FlexPod 101 series will go over the physical components of a FlexPod.

Storage I/O Control – what to expect

Storage I/O Control, or SIOC, was introduced back in vSphere 4.1. It provides a way for vSphere to combat what is known as the ‘noisy neighbour’ syndrome: the situation where multiple VMs reside on a single datastore and one or more of them take more than their fair share of bandwidth to the datastore. This could happen because a VM decides to misbehave, because of poor choices in VM placement, or because workloads have changed.

The guiding principle behind SIOC is fairness, allowing all VMs a chance to read and write without being swamped by one or more ‘greedy’ VMs. This is something which, in the past, would have been controlled with disk shares, and that method can still be used to prioritise certain workloads on a datastore over others. The advantage of SIOC is that, beyond the couple of configurable settings described below, no manual tinkering is really required.

Options available for Storage I/O Control

There are only two settings to configure for SIOC:

1) SIOC Enabled/Disabled – turn SIOC either on or off at the datastore level. More on the considerations for this further down.

2) Congestion Threshold – this is the trigger point at which SIOC kicks in and starts throttling I/O to the datastore. It can be configured with one of two types of value:

a) Manual – this is set in milliseconds and defaults to 30 ms, but the appropriate value varies depending on your storage. VMware have tables on how to calculate this in their SIOC best-practice guide, but the default should be fine for most situations. If in doubt, your storage vendor should be able to give guidance on the correct value to choose.

b) Percentage of peak throughput – this is only available through the vSphere Web Client and was added in vSphere 5.1. It takes the guesswork out of setting the threshold, replacing it with an automated method whereby vSphere analyses the datastore’s I/O capabilities and uses this to determine the peak throughput (a scripted example of both settings follows this list).
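For anyone who would rather script this than click through the Web Client, the following is a minimal pyVmomi sketch of turning SIOC on for a datastore with the threshold left in automatic (percentage of peak throughput) mode. The vCenter address, credentials and datastore name are placeholders, and the spec and field names are as I understand the vSphere API, so treat it as a starting point rather than a finished tool.

```python
# Rough sketch: enable SIOC on a single datastore via pyVmomi.
# The hostname, credentials and datastore name below are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab use only; verify certificates in production
si = SmartConnect(host='vcenter.example.com',
                  user='administrator@vsphere.local',
                  pwd='VMware1!',
                  sslContext=ctx)
content = si.RetrieveContent()

# Locate the datastore by name.
view = content.viewManager.CreateContainerView(content.rootFolder, [vim.Datastore], True)
datastore = next(ds for ds in view.view if ds.name == 'Datastore01')
view.Destroy()

# Build the SIOC spec: enabled, with the congestion threshold in 'automatic'
# (percentage of peak throughput) mode. For a manual threshold you would set
# congestionThresholdMode = 'manual' and congestionThreshold = 30 (milliseconds).
spec = vim.StorageResourceManager.IORMConfigSpec()
spec.enabled = True
spec.congestionThresholdMode = 'automatic'

task = content.storageResourceManager.ConfigureDatastoreIORM_Task(datastore, spec)
print('Reconfiguring SIOC on %s, task: %s' % (datastore.name, task.info.key))

Disconnect(si)
```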

My experience of using SIOC is described in the following paragraphs: improvements were seen and no negative performance impact was experienced (as expected), although there were some unexpected results.

Repeated latency warnings similar to the following were seen from multiple hosts, for multiple datastores across different storage systems:

Device naa.5000c5000b36354b performance has deteriorated. I/O latency increased from average value of 1832 microseconds to 19403 microseconds

These warnings report latency in microseconds, so in the above example latency is going from roughly 1.8 ms to 19.4 ms: still a workable latency, but flagged because of the size of the increase (in this case around a factor of ten). Some of the results seen in the logs were much worse than this, though; latency sometimes rose to as much as 20 seconds, and this was happening mostly in the middle of the night.

After checking out the storage configuration, it was identified that Storage I/O Control was turned off across the board. It is disabled by default for all datastores and, as such, had been left as it was. Turning SIOC on seemed like a sensible way forward, so the decision was taken to enable it on some of the worst-affected datastores.

After turning SIOC on for a handful of datastores, a good reduction in the number of I/O latency warnings reported in the ESXi logs was seen. Unfortunately, a new message began to appear in the host event logs:

Non-VI workload detected on the datastore

This was repeatedly seen against the LUNs for which SIOC had been enabled. VMware have a knowledge base article which describes the issue. In this case, the problem stemmed from the fact that the storage backend providing the LUNs had a single disk pool (or MDisk group, as this was presented by an IBM SVC) which was shared with unmanaged RDMs and other storage presented outside the VMware environment.

The impact of this is that, whilst VMware plays nicely, throttling I/O when the congestion threshold is reached, other workloads such as non-SIOC datastores, RDMs, or other clients of the storage group will not be so fair in their usage of the available bandwidth. This is because the underlying spindles are shared; one solution would be to present dedicated disk groups to VMware workloads, ensuring that all datastores carved out of those disks have SIOC turned on (a quick way to audit this is sketched below).
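As a quick way of spotting datastores that are missing SIOC, something like the following pyVmomi loop can report the SIOC state of every datastore vCenter knows about. It assumes an existing connection (the 'si' ServiceInstance from the earlier sketch) and is only a rough audit sketch, not a remediation script.

```python
# Rough audit sketch: report SIOC state for every datastore in the inventory.
# Assumes 'si' is an existing ServiceInstance connection, as in the earlier sketch.
from pyVmomi import vim

content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(content.rootFolder, [vim.Datastore], True)
for ds in view.view:
    iorm = ds.iormConfiguration  # StorageIORMInfo; may be absent on some datastore types
    if iorm is None:
        print('%s: SIOC not applicable' % ds.name)
        continue
    print('%s: SIOC enabled=%s, threshold mode=%s, threshold=%s ms'
          % (ds.name, iorm.enabled, iorm.congestionThresholdMode, iorm.congestionThreshold))
view.Destroy()
```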

We use EMC VNX and IBM SVC as our storage of choice, and the recommendation from both vendors is to turn SIOC on for all datastores and leave it on. I can only imagine the reason this is still not the default is that it is not suitable for every storage type. As with all these things, checking your storage vendor’s documentation is probably the best option, but SIOC should provide benefit in most use cases, although, as described above, you may see some unexpected results. It is worth noting that SIOC is an Enterprise Plus feature, so anyone running a less feature-packed edition of vSphere will not be able to take advantage of it.

vNUMA CPU Alignment – doing it right

As I’ve said before, the majority of our VMware environment runs on Cisco UCS blade servers, and the majority of these are running dual hex-core CPUs. With the broad spectrum of operating systems and applications running across our many hundreds of VMs, there are inevitably many, many VMs with multiple vCPUs.


NUMA, in a nutshell, gives each physical CPU fast access to its own local bank of memory, so workloads running on that CPU see lower memory access times than they would reaching across to another socket’s memory. This is particularly important for latency-sensitive workloads. vNUMA is VMware’s implementation of NUMA for virtual machines, exposing the underlying topology to the guest so that VMs running on ESXi can benefit from the same reduction in memory latency.

When looking into CPU performance issues with a guest VM, there was no host-level contention for CPU resources. Through the VM performance graphs in the vSphere client, CPU Ready times could be seen spiking frequently; this was on a VM with its vCPUs set to 2 sockets x 8 cores (16 vCPU).

This shows the CPU Ready/Usage stats before and after re-aligning the vCPU configuration from multiple cores per socket to a single core per socket (‘flat and wide’); a marked difference.
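If you prefer to pull the same numbers from a script rather than the performance graphs, the snippet below is a small pyVmomi sketch that queries the ‘cpu.ready.summation’ counter for a VM over the real-time (20-second) interval and converts the values to a rough CPU Ready percentage. It assumes the 'si' and 'vm' objects from the earlier sketches and is only an illustration of the PerformanceManager query, not a monitoring tool.

```python
# Rough sketch: pull recent CPU Ready samples for a VM via the PerformanceManager.
# Assumes 'si' (ServiceInstance) and 'vm' (vim.VirtualMachine) already exist,
# as in the earlier sketches.
from pyVmomi import vim

perf = si.RetrieveContent().perfManager

# Find the counter ID for 'cpu.ready.summation' (ms of ready time per sample interval).
counter_id = next(c.key for c in perf.perfCounter
                  if '%s.%s.%s' % (c.groupInfo.key, c.nameInfo.key, c.rollupType)
                  == 'cpu.ready.summation')

query = vim.PerformanceManager.QuerySpec(
    entity=vm,
    metricId=[vim.PerformanceManager.MetricId(counterId=counter_id, instance='')],
    intervalId=20,   # real-time stats, 20-second samples
    maxSample=15)    # roughly the last five minutes

for result in perf.QueryPerf(querySpec=[query]):
    for series in result.value:
        # ready time is reported in ms per 20,000 ms sample, so divide by 200 for a percentage
        ready_pct = [round(v / 200.0, 1) for v in series.value]
        print('CPU Ready %% over recent samples: %s' % ready_pct)
```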


This problem is fairly well documented and is detailed in VMware KB 1026063. When utilising the NUMA features of VMware, it is important to configure the vCPU layout of your VMs to align with the physical characteristics of your hosts where possible. Where identical CPU configurations cannot be guaranteed across all of your hosts, a better solution is to define the vCPUs as multiple sockets with a single core each, which also helps to ensure that the VM guest can be vMotioned to other hosts in the cluster with different physical CPU configurations.

In this case our physical CPU architecture is 2 sockets x 6 cores, and the VM was configured with 1 socket x 8 cores. This prevents the hypervisor and guest OS from cleanly mapping the VM onto either one or both sockets in the physical host, so the VM misses out on the speed benefits which NUMA can bring. This can show up as high, or consistently high, CPU Ready values for the VM guest.
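When checking for this kind of mismatch, it can help to pull the host’s physical socket/core layout and the VM’s vCPU layout side by side. The snippet below is a small pyVmomi sketch of that comparison; it assumes 'host' and 'vm' are managed objects you have already looked up (for example via a container view, as in the SIOC sketches).

```python
# Rough sketch: compare a host's physical CPU layout with a VM's vCPU layout.
# Assumes 'host' (vim.HostSystem) and 'vm' (vim.VirtualMachine) have already been
# retrieved, e.g. via a container view as in the earlier SIOC examples.
cpu = host.hardware.cpuInfo
cores_per_socket = cpu.numCpuCores // cpu.numCpuPackages
print('Host: %d sockets x %d cores (%d logical threads)'
      % (cpu.numCpuPackages, cores_per_socket, cpu.numCpuThreads))

hw = vm.config.hardware
virtual_sockets = hw.numCPU // hw.numCoresPerSocket
print('VM:   %d vCPUs as %d sockets x %d cores per socket'
      % (hw.numCPU, virtual_sockets, hw.numCoresPerSocket))
```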

There is a great VMware blog article about the performance implications of vNUMA design selection here, which echoes the VMware Best Practices guide in stating that the best way to approach this is to set your vCPU configuration ‘flat and wide’. This means that if your VM requires 8 cores, you configure the vCPU layout as 8 sockets with 1 core each, rather than 2 quad-core or 4 dual-core sockets.

This allows the vNUMA technology to balance the load as it sees fit and should prevent CPU Ready spike issues. There are, of course, edge cases where specific software licensing may force you to use as few sockets as possible, in which case alignment to the physical host CPU should always be attempted; in the case above, the VM could have been configured as 1 socket x 6 cores or 2 sockets x 6 cores. Be aware that stepping outside of the ‘flat and wide’ model will prevent vNUMA from doing its job, and ESXi will bow to your judgement of the vCPU configuration; this means you had better have got it right!
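To round the example off, this is a minimal pyVmomi sketch of reconfiguring a VM to the ‘flat and wide’ layout discussed above (8 vCPUs as 8 sockets x 1 core). As before, the 'vm' object lookup is assumed, and in most cases the VM needs to be powered off for CPU changes unless CPU hot-add is enabled.

```python
# Rough sketch: reconfigure a VM to a 'flat and wide' layout of 8 sockets x 1 core.
# Assumes 'vm' (vim.VirtualMachine) has already been retrieved, as in the earlier sketches.
from pyVmomi import vim

spec = vim.vm.ConfigSpec()
spec.numCPUs = 8            # total number of vCPUs
spec.numCoresPerSocket = 1  # one core per virtual socket, i.e. 8 virtual sockets

task = vm.ReconfigVM_Task(spec=spec)
# Wait for 'task' to complete before powering the VM back on; the change normally
# requires the VM to be powered off unless CPU hot-add is enabled.
```

Leaving numCoresPerSocket at 1 keeps the layout ‘flat and wide’ and lets ESXi present the virtual NUMA topology it thinks best, which is the point of the recommendation above.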