Deploying NX-OSv 9000 on vSphere

Cisco have recently released (1st March 2017) an updated virtual version of their Nexus 9K switch, and the good news is that this is now available as an OVA for deployment onto ESXi. We used to use VIRL in a lab, which was fine until a buggy earlier version of the virtual 9K was introduced that broke core functionality like port channels. This new release doesn’t require the complex environment that VIRL brings, and lets you quickly deploy a pair of appliances in vPC to test code against.

The download is available here, and while there are some instructions available, I did not find them particularly useful in deploying the switch to my ESXi environment. As a result, I decided to write up how I did this to hopefully save people spending time smashing their face off it.

Getting the OVA

NOTE: you will need a Cisco login to download the OVA file. My login has access to a number of downloads, so I am not sure exactly what the entitlement requirements are around this.

There are a few versions available from the above link, including a qcow2 (KVM) image, a .vmdk file (for rolling your own VM), a VirtualBox image (for use with VirtualBox and/or Vagrant), and an OVA (for use with Fusion, Workstation, ESXi).

Once downloaded we are ready to deploy the appliance. There are a few things to bear in mind here:

  1. This appliance can be used to pass traffic between virtual machines: there are 6 connected vNICs on deployment, 1 of which simulates the mgmt0 port on the 9K, while the other 5 can pass VM traffic.
  2. vNICs 2-6 should not be attached to the management network (best practice).
  3. We will initially need to connect over a virtual serial port through the host, which requires temporarily opening up the ESXi host firewall.

Deploying the OVA

You can deploy the OVA through the vSphere Web Client or the new vSphere HTML5 Web Client, but I’ve detailed how to do it via PowerShell here, because who’s got time for clicking buttons?



# Simulator is available at:
# https://software.cisco.com/download/release.html?mdfid=286312239&softwareid=282088129&release=7.0(3)I5(1)&relind=AVAILABLE&rellifecycle=&reltype=latest
# Filename: nxosv-final.7.0.3.I5.2.ova
# Documentation: http://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus9000/sw/7-x/nx-osv/configuration/guide/b_NX-OSv_9000/b_NX-OSv_chapter_01.html

Function New-SerialPort {
  # stolen from http://access-console-port-virtual-machine.blogspot.co.uk/2013/07/add-serial-port-to-vm-through-gui-or.html
  Param(
     [string]$vmName,
     [string]$hostIP,
     [string]$prt
  ) #end
  $dev = New-Object VMware.Vim.VirtualDeviceConfigSpec
  $dev.operation = "add"
  $dev.device = New-Object VMware.Vim.VirtualSerialPort
  $dev.device.key = -1
  $dev.device.backing = New-Object VMware.Vim.VirtualSerialPortURIBackingInfo
  $dev.device.backing.direction = "server"
  $dev.device.backing.serviceURI = "telnet://"+$hostIP+":"+$prt
  $dev.device.connectable = New-Object VMware.Vim.VirtualDeviceConnectInfo
  $dev.device.connectable.connected = $true
  $dev.device.connectable.StartConnected = $true
  $dev.device.yieldOnPoll = $true

  $spec = New-Object VMware.Vim.VirtualMachineConfigSpec
  $spec.DeviceChange += $dev

  $vm = Get-VM -Name $vmName
  $vm.ExtensionData.ReconfigVM($spec)
}

# Variables - edit these...
$ovf_location = '.\nxosv-final.7.0.3.I5.1.ova'
$n9k_name = 'NXOSV-N9K-001'
$target_datastore = 'VBR_MGTESX01_Local_SSD_01'
$target_portgroup = 'vSS_Mgmt_Network'
$target_cluster = 'VBR_Mgmt_Cluster'

$vi_server = '192.168.1.222'
$vi_user = 'administrator@vsphere.local'
$vi_pass = 'VMware1!'

# set this to $true to remove non-management network interfaces, $false to leave them where they are
$remove_additional_interfaces = $true

# Don't edit below here
Import-Module VMware.PowerCLI

Connect-VIServer $vi_server -user $vi_user -pass $vi_pass

# Pick the first host in the target cluster to deploy to
$vmhost = $((Get-Cluster $target_cluster | Get-VMHost)[0])

$ovfconfig = Get-OvfConfiguration $ovf_location

# Map all six interfaces to the chosen port group (the extra ones can be removed below)
$ovfconfig.NetworkMapping.mgmt0.Value = $target_portgroup
$ovfconfig.NetworkMapping.Ethernet1_1.Value = $target_portgroup
$ovfconfig.NetworkMapping.Ethernet1_2.Value = $target_portgroup
$ovfconfig.NetworkMapping.Ethernet1_3.Value = $target_portgroup
$ovfconfig.NetworkMapping.Ethernet1_4.Value = $target_portgroup
$ovfconfig.NetworkMapping.Ethernet1_5.Value = $target_portgroup
$ovfconfig.DeploymentOption.Value = 'default'

# Deploy the OVA with the network mappings defined above
Import-VApp $ovf_location -OvfConfiguration $ovfconfig -VMHost $vmhost -Datastore $target_datastore -DiskStorageFormat Thin -Name $n9k_name

# Optionally strip the data-plane vNICs, leaving just the mgmt0 adapter
if ($remove_additional_interfaces) {
  Get-VM $n9k_name | Get-NetworkAdapter | ?{$_.Name -ne 'Network adapter 1'} | Remove-NetworkAdapter -Confirm:$false
}

# Add a virtual serial port, exposed via telnet on the host's vmk0 IP, port 2000
New-SerialPort -vmName $n9k_name -hostIP $($vmhost | Get-VMHostNetworkAdapter -Name vmk0 | Select -ExpandProperty IP) -prt 2000

# Temporarily allow remote serial port connections through the ESXi host firewall
$vmhost | Get-VMHostFirewallException -Name 'VM serial port connected over network' | Set-VMHostFirewallException -Enabled $true

Get-VM $n9k_name | Start-VM

This should start the VM. We will be able to telnet to the host on port 2000 to reach the VM console, but it will not be ready for us to do that until this screen is reached:

[Screenshot: VM console output showing the NX-OSv switch has finished booting]

Now when we connect we should see:

[Screenshot: NX-OS console prompt shown over the telnet session]

At this point we can enter ‘n’ and go through the normal Nexus 9K setup wizard. Once the management IP and SSH are configured you should be able to connect via SSH; the virtual serial port can then be removed via the vSphere Client, and the ‘VM serial port connected over network’ rule disabled on the host firewall.
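If you would rather do that tidy-up from PowerCLI as well, something along these lines should work. This is only a sketch using the same variables as the deployment script above, and the VM may need to be powered off before the serial port can be removed:

# Find and remove the virtual serial port added earlier (power the VM off first if the reconfigure is rejected)
$vm = Get-VM $n9k_name
$serial = $vm.ExtensionData.Config.Hardware.Device | ?{$_ -is [VMware.Vim.VirtualSerialPort]} | Select -First 1

$dev = New-Object VMware.Vim.VirtualDeviceConfigSpec
$dev.operation = "remove"
$dev.device = $serial

$spec = New-Object VMware.Vim.VirtualMachineConfigSpec
$spec.DeviceChange += $dev
$vm.ExtensionData.ReconfigVM($spec)

# Close the firewall hole we opened for the serial port
$vmhost | Get-VMHostFirewallException -Name 'VM serial port connected over network' | Set-VMHostFirewallException -Enabled $false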

Pimping things up

Add more NICs

Obviously we removed the additional NICs from the VM above, which leaves it talking only over the single management port. We can add more NICs, and the virtual switch will let the appliance pass traffic over them, which could be an interesting way to push actual VM traffic through the 9K.
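For example, something like the following should add a couple of extra adapters. The ‘vSS_Lab_Network’ port group name is made up for this example, and e1000 is assumed as the adapter type to match what the OVA deploys with:

# Add two extra vNICs to carry data-plane traffic (do this while the VM is powered off)
$vm = Get-VM $n9k_name
1..2 | ForEach-Object {
  New-NetworkAdapter -VM $vm -NetworkName 'vSS_Lab_Network' -Type e1000 -StartConnected -Confirm:$false
}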

Set up vPC

The switch is fully vPC (Virtual Port Channel) capable, so we can spin up another virtual N9K and pair the two in a vPC domain; this is useful for experimenting with that feature.

Bring the API!

The switch is NX-API capable, which was the main reason I wanted to deploy it, so that I could test REST calls against it. Enable NX-API by entering the ‘feature nxapi’ command.
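Once that is done, a quick way to check it is working is to post a ‘show’ command at the NX-API endpoint from PowerShell. This is only a sketch: the management IP and credentials are placeholders, and it assumes NX-API has been left on its default HTTP port:

# NX-API 'ins_api' request - switch IP and credentials below are placeholders
$switch = '192.168.1.50'
$pair = 'admin:Password123'
$headers = @{ Authorization = 'Basic ' + [Convert]::ToBase64String([Text.Encoding]::ASCII.GetBytes($pair)) }

$body = @{
  ins_api = @{
    version       = '1.0'
    type          = 'cli_show'
    chunk         = '0'
    sid           = '1'
    input         = 'show version'
    output_format = 'json'
  }
} | ConvertTo-Json

# Post the request and pull the structured output out of the response
$result = Invoke-RestMethod -Uri "http://$switch/ins" -Method Post -Body $body -ContentType 'application/json' -Headers $headers
$result.ins_api.outputs.output.body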

Conclusion

Hopefully this post will help people struggling to deploy this OVA, or wanting to test out NX-OS in a lab environment. I found the Cisco documentation a little confusing, so thought I would share my experiences.


FlexPod and UCS – where are we now?

I have been working on FlexPod for about a year now, and on UCS for a couple of years; in my opinion it’s great tech that takes a lot of the pain and guesswork out of designing new data center infrastructure, bringing inherent simplicity, reliability and resilience to the space. Over the last year or so there has been a shift in the Cisco Validated Designs (CVDs) coming out of Cisco, and with it a noticeable change in the direction FlexPod is heading. I should note that some of what I discuss here is already clear, and some is mere conjecture. I think the direction FlexPod is taking is a symptom of changes in the industry, but it is clear that converged solutions are becoming more and more desirable as the enterprise technology world moves forward.

So what are the key changes we are seeing?

The SDN elephant in the room 

Cisco’s ACI (Application Centric Infrastructure) has taken a while to get moving; the replacement of existing 3-tier network architecture with a leaf-spine network is not something that is going to happen overnight. The switched fabric is arguably a great solution for modern data centers, where east-west traffic is the bulk of network activity, and Spanning Tree Protocol continues to be the bane of network admins, but its implementation often requires either green-field deployments, or forklift replacement of large portions of the existing core networking of a data center.

That’s not to say ACI is not doing OK in terms of sales; Cisco’s figures and case studies seem to show that there is uptake, with some large customers taking it on. So how does this fit in with FlexPod? Well, the Cisco Validated Designs (CVDs) released over the last 12 months have all included the new Nexus 9000 series switches, rather than the previous stalwart Nexus 5000 series. These are ACI-based switches which are also able to operate in the ‘legacy’ NX-OS mode. Capability-wise, in the context of FlexPod, there is not a lot of difference: they now have FCoE support, and can do vPC, QoS, Layer 3, and all the good stuff we have come to expect from Nexus switches.

So from what I can gather, the inclusion of 9K switches in the FlexPod line (outside of the FlexPod with ACI designs) is there to enable FlexPod customers to more easily move into the leaf/spine ACI network architecture at a later date, should they wish to do so. This makes sense, and the pricing on the 9Ks being used looks favourable against the 5Ks, so this is a win-win for customers, even if they don’t eventually decide to go with ACI.

40GbE standard 

Recent announcements around the Gen 3 UCS Fabric Interconnects have revealed that 40GbE is now going to be the standard for UCS connectivity, and the new chassis designs show 4 x 40GbE QSFP connections per fabric, totaling 320Gbps of bandwidth per chassis. This is an incredible amount of throughput, and although I can’t see 99% of customers going anywhere near these levels, it does help to strengthen the UCS platform’s use cases for even the most high-performance environments, and reduces the requirement for InfiniBand-type solutions for high-throughput workloads.

Another interesting point, following on from the ACI ramblings above, is that the new 6300 series Fabric Interconnects are now based on the Nexus 9300 switching line, rather than the Nexus 5K-based 6200 series. This positions them perfectly to act as leaves in an ACI fabric one day, should this become the eventual outcome of Cisco’s strategy.

HTML5 FTW! 

With the announcements about the new UCS generation came the news that, from UCS firmware version 3.1, the software would be unified across UCS classic, UCS Mini, and the newish M-Series systems. This simplifies things for people looking at version numbers and how they relate to the relevance of a release, and means there should now be relative feature parity across all footprints of UCS systems.

The most exciting news, if you have experienced long-running pain with Java, is that the new release incorporates the HTML5 interface that has been seen on UCS Mini since its release. I’m sure this will bring new challenges with it, but for now at least it is something fresh to look forward to for those running UCS classic.

FlexPod Mini – now not so mini 

 

FlexPod Mini is based on the UCS Mini release, which came out around 18 months ago. In UCS Mini, the I/O Modules (or FEXs) in the UCS 5108 chassis are replaced with UCS 6324 pocket-sized Fabric Interconnects, ultimately cramming a single chassis of UCS and its attached network equipment into just 6 rack units. This could be expanded with C-Series servers, but the scale for UCS blades was strictly limited to the 8-blade limit of a standard chassis. With the announcement of the new Fabric Interconnect models came news of a new ‘QSFP Scalability Port License’, which allows the 40GbE port on the UCS 6324 FI to be used with a 1 x QSFP to 4 x SFP+ breakout cable to add another chassis to the UCS Mini.

Personally I haven’t installed a UCS Mini, but the form factor is a great fit for certain use cases, and the more flexible this is, the more desire there will be to use this solution. For FlexPod, this ultimately means more suitable use cases, particularly in the ROBO (remote office/branch office) type scenario.

What’s Next? 

So with the FlexPod now having new switches, and new UCS hardware, it seems something new from NetApp is next on the list for FlexPod. The FAS8000 series is a couple of years old now, so we will likely see a refresh on this at some point, probably with 40GbE on board, more flash options, and faster CPUs. The recent purchase of SolidFire by NetApp will also quite probably see some kind of SolidFire based FlexPod CVD coming out of the Cisco/NetApp partnership in the near future.

We are also seeing the release of some exciting (or at least as exciting as these things can be!) new software releases this year: Windows Server 2016, vSphere 6.5 (assuming this will be revealed at VMworld), and OpenStack Mitaka, all of which will likely bring new CVDs.

In the zone…basic zoning on Cisco Nexus switches

In this post I look to go over some basic FCoE zoning concepts for Cisco Nexus switches. Although FCoE has not really captured the imagination of the industry, it is used in a large number of Cisco infrastructure deployments, particularly around Nexus and UCS technologies. My experience is mostly based on FlexPods, where we have this kind of design (the diagram shows FCoE connectivity only, to keep things simple):

[Diagram: FlexPod FCoE connectivity]

Zoning in a FlexPod is simple enough: we may have a largish number of hosts, but we are only zoning on two switches, and only have 4 or 8 FCoE targets, depending on our configuration. In fact, the zoning configuration can be fairly simply automated using PowerShell by tapping into the NetApp, Cisco UCS, and NX-OS APIs. The purpose of this post, though, is to describe the configuration steps required to complete the zoning.

The purpose of zoning is to restrict access to a LUN (Logical Unit Number), essentially a Fibre Channel block device on our storage, to one or more accessing devices, or hosts. This is useful in the case of boot disks, where we only ever want a single host accessing that device, and in the case of shared data devices, like cluster shared disks in Microsoft clustering, or VMFS datastores in the VMware world, where we only want a subset of hosts to be able to access the device.

I found configuring zoning on a Cisco switch took a bit of getting my head around, so hopefully the explanation below will help to make this simpler for someone else.

From the Cisco UCS (or whatever server infrastructure you are running), you will need to gather a list of the WWPNs for the initiators wanting to connect to the storage. These will be in the format 50:02:77:a4:10:0c:4e:21, an 8-byte (16 hex digit) number. Likewise, you will need to gather the WWPNs from your storage (in the case of FlexPod, your NetApp storage system).
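If you have the Cisco UCS PowerTool module installed, pulling the initiator WWPNs out of UCS Manager can be done with something like the sketch below; the UCS Manager address is a placeholder, and the module name can vary between PowerTool versions. The storage-side WWPNs still need to be collected from the NetApp system itself.

# Requires Cisco UCS PowerTool (module name may differ between PowerTool versions)
Import-Module Cisco.UCSManager

# Placeholder UCS Manager address; prompts for credentials
Connect-Ucs -Name 192.168.1.10 -Credential (Get-Credential)

# List each service profile's vHBAs with their WWPNs
Get-UcsServiceProfile -Type instance | Get-UcsVhba | Select-Object Dn, Name, Addr | Format-Table -AutoSize

Disconnect-Ucs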

Once we have these, we are ready to do our zoning on the switch. When doing the zoning configuration there are three main elements of configuration we need to understand:

  1. Aliases – these match a WWPN to a friendly name, and sit in the device alias database on the switch. You can get away without using these and just use the raw WWPNs later, but that will make things far more difficult should something go wrong. So basically these just match WWPNs to devices.
  2. Zones – these logically group initiators and targets together, meaning that only the device aliases listed in the zone are able to talk to one another. This provides security and ease of management; a device can exist in more than one zone.
  3. Zonesets – these group zones together, allowing the administrator to bring all the zones online or offline together. Only one zoneset can be active at a time.

On top of this, there is one more thing to understand when creating zoning on our Nexus switch, and that is the concept of a VSAN. A VSAN, or Virtual Storage Area Network, is the Fibre Channel equivalent of a VLAN. It is a logical collection of ports which together form a single discrete fabric.

So let’s create a fictional scenario and build the zoning configuration for it. We have a FlexPod with two Nexus 5K switches forming separate fabrics, as shown in the diagram above, meaning that the 0a ports on the servers only go to the Nexus on fabric A, and the 0b ports only go to the Nexus on fabric B. Both e0c ports on our NetApp storage go to NexusA, and both e0d ports go to NexusB:

NAFAS01 – e0c, e0d

NAFAS02 – e0c, e0d

And we have 3 Cisco UCS service profiles, each with two vHBAs, wanting to access storage on these targets; these are created as follows:

UCSServ01 – 0a, 0b

UCSServ02 – 0a, 0b

UCSServ03 – 0a, 0b

So on NexusA, we need the following aliases in our database:

Device      Port   WWPN                      Alias Name
NAFAS01     e0c    35:20:01:0c:11:22:33:44   NAFAS01_e0c
NAFAS02     e0c    35:20:02:0c:11:22:33:44   NAFAS02_e0c
UCSServ01   0a     50:02:77:a4:10:0c:0a:01   UCSServ01_0a
UCSServ02   0a     50:02:77:a4:10:0c:0a:02   UCSServ02_0a
UCSServ03   0a     50:02:77:a4:10:0c:0a:03   UCSServ03_0a

And on NexusB, we need the following:

Device      Port   WWPN                      Alias Name
NAFAS01     e0d    35:20:01:0d:11:22:33:44   NAFAS01_e0d
NAFAS02     e0d    35:20:02:0d:11:22:33:44   NAFAS02_e0d
UCSServ01   0b     50:02:77:a4:10:0c:0b:01   UCSServ01_0b
UCSServ02   0b     50:02:77:a4:10:0c:0b:02   UCSServ02_0b
UCSServ03   0b     50:02:77:a4:10:0c:0b:03   UCSServ03_0b

And the zones we need on each switch are, firstly for NexusA:

Zone Name     Members
UCSServ01_a   NAFAS01_e0c, NAFAS02_e0c, UCSServ01_0a
UCSServ02_a   NAFAS01_e0c, NAFAS02_e0c, UCSServ02_0a
UCSServ03_a   NAFAS01_e0c, NAFAS02_e0c, UCSServ03_0a

And for Nexus B:

Zone Name     Members
UCSServ01_b   NAFAS01_e0d, NAFAS02_e0d, UCSServ01_0b
UCSServ02_b   NAFAS01_e0d, NAFAS02_e0d, UCSServ02_0b
UCSServ03_b   NAFAS01_e0d, NAFAS02_e0d, UCSServ03_0b

This gives us a zone for each server to boot from, allowing that vHBA on the server to boot from either of the NetApp interfaces it can see on its fabric. The boot order itself is controlled from within UCS; by zoning the server to boot over either fabric we build in resilience. All of this is just to demonstrate how we construct the zoning configuration, so things will no doubt be different in your environment.

So now that we know what we should have in our populated alias database and our zone configuration, we just need to create our zonesets. We will have one zoneset per fabric, so first the one for NexusA:

Zoneset Name   Members
UCSZonesetA    UCSServ01_a, UCSServ02_a, UCSServ03_a

And the zoneset for NexusB:

Zoneset Name   Members
UCSZonesetB    UCSServ01_b, UCSServ02_b, UCSServ03_b

Now we are ready to put this into some NX-OS CLI and enter it on our switches. The general commands for creating new aliases are:
device-alias database
device-alias name <alias_name> pwwn <device_wwpn>
exit
device-alias commit

So for our NexusA, we do the following:
device-alias database
device-alias name NAFAS01_e0c pwwn 35:20:01:0c:11:22:33:44
device-alias name NAFAS02_e0c pwwn 35:20:02:0c:11:22:33:44
device-alias name UCSServ01_0a pwwn 50:02:77:a4:10:0c:0a:01
device-alias name UCSServ02_0a pwwn 50:02:77:a4:10:0c:0a:02
device-alias name UCSServ03_0a pwwn 50:02:77:a4:10:0c:0a:03
exit
device-alias commit

And for Nexus B, we do:
device-alias database
device-alias name NAFAS01_e0d pwwn 35:20:01:0d:11:22:33:44
device-alias name NAFAS02_e0d pwwn 35:20:02:0d:11:22:33:44
device-alias name UCSServ01_0b pwwn 50:02:77:a4:10:0c:0b:01
device-alias name UCSServ02_0b pwwn 50:02:77:a4:10:0c:0b:02
device-alias name UCSServ03_0b pwwn 50:02:77:a4:10:0c:0b:03
exit
device-alias commit

So that’s our alias database taken care of; now we can create our zones. The command set for creating a zone is:
zone name <zone_name> vsan <vsan_id>
member device-alias <device_1_alias>
member device-alias <device_2_alias>
member device-alias <device_3_alias>
exit

I will use VSAN ID 101 for fabric A and 102 for fabric B (this assumes the VSANs already exist and the relevant interfaces have been placed into them). So here we will create our zones for NexusA:
zone name UCSServ01_a vsan 101
member device-alias NAFAS01_e0c
member device-alias NAFAS02_e0c
member device-alias UCSServ01_0a
exit
zone name UCSServ02_a vsan 101
member device-alias NAFAS01_e0c
member device-alias NAFAS02_e0c
member device-alias UCSServ02_0a
exit
zone name UCSServ03_a vsan 101
member device-alias NAFAS01_e0c
member device-alias NAFAS02_e0c
member device-alias UCSServ03_0a
exit

And for NexusB:
zone name UCSServ01_b vsan 102
member device-alias NAFAS01_e0d
member device-alias NAFAS02_e0d
member device-alias UCSServ01_0b
exit
zone name UCSServ02_b vsan 102
member device-alias NAFAS01_e0d
member device-alias NAFAS02_e0d
member device-alias UCSServ02_0b
exit
zone name UCSServ03_b vsan 102
member device-alias NAFAS01_e0d
member device-alias NAFAS02_e0d
member device-alias UCSServ03_0b
exit

So that is all of our zones created; now we just need to create and activate our zonesets and we have our completed zoning configuration. The commands to create and activate a zoneset are:
zoneset name <zoneset_name> vsan <vsan_id>
member <zone_1_name>
member <zone_2_name>
exit
zoneset activate name <zoneset_name> vsan <vsan_id>
exit

So now we have our NexusA configuration:
zoneset name UCSZonesetA vsan 101
member UCSServ01_a
member UCSServ02_a
member UCSServ03_a
exit
zoneset activate name UCSZonesetA vsan 101
exit

And our NexusB configuration:
zoneset name UCSZonesetB vsan 102
member UCSServ01_b
member UCSServ02_b
member UCSServ03_b
exit
zoneset activate name UCSZonesetB vsan 102
exit

So that’s how we compose our zoning configuration, and apply it to our Nexus switch. Hopefully this will be a useful reference on how to do this.