Incorrectly Reported Separated Network Partitions in VSAN Cluster

I’ve been playing around with VSAN, automating the build of a 3-node management cluster using ESXi 6.0 Update 1. I came across an issue where I moved one of my hosts to another cluster and then back into the VSAN cluster; when it came back, it showed as a separate network partition and had its own VSAN datastore.

The VSAN Disk Management page under my cluster in the Web Client showed a different Network Partition Group for this host than for my other two hosts, despite the network being absolutely fine.

It turned out that the host had not rejoined the VSAN cluster but had created its own 1-node cluster. I resolved this by running the following commands:

On the partitioned host:

esxcli vsan cluster get

Cluster Information

   Enabled: true

   Current Local Time: 2016-09-21T10:23:35Z

   Local Node UUID: 57e0040c-83a9-add9-ec1f-0cc47ab46218

   Local Node Type: NORMAL

   Local Node State: MASTER

   Local Node Health State: HEALTHY

   Sub-Cluster Master UUID: 57e0040c-83a9-add9-ec1f-0cc47ab46218

   Sub-Cluster Backup UUID:

   Sub-Cluster UUID: 3451e257-cedd-8772-4b31-0cc47ab460e8

   Sub-Cluster Membership Entry Revision: 0

   Sub-Cluster Member Count: 1

   Sub-Cluster Member UUIDs: 57e0040c-83a9-add9-ec1f-0cc47ab46218

   Sub-Cluster Membership UUID: 9c5fe257-e053-7716-ca0a-0cc47ab46218

This shows the host in a single-node cluster: the Sub-Cluster Member Count is 1 and the host is its own master.

On a surviving host:

esxcli vsan cluster get

Cluster Information

   Enabled: true

   Current Local Time: 2016-09-21T11:14:55Z

   Local Node UUID: 57e006b6-71ab-c8f6-7d1d-0cc47ab460e8

   Local Node Type: NORMAL

   Local Node State: MASTER

   Local Node Health State: HEALTHY

   Sub-Cluster Master UUID: 57e006b6-71ab-c8f6-7d1d-0cc47ab460e8

   Sub-Cluster Backup UUID: 57e0f22f-3071-fe1a-fd8e-0cc47ab460ec

   Sub-Cluster UUID: 57e0040c-83a9-add9-ec1f-0cc47ab46218

   Sub-Cluster Membership Entry Revision: 0

   Sub-Cluster Member Count: 2

   Sub-Cluster Member UUIDs: 57e0f22f-3071-fe1a-fd8e-0cc47ab460ec, 57e006b6-71ab-c8f6-7d1d-0cc47ab460e8

   Sub-Cluster Membership UUID: 3451e257-cedd-8772-4b31-0cc47ab460e8

This showed me there were only 2 nodes in the cluster. We will use the Sub-Cluster UUID from this output in a moment.
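If you would rather not copy the UUID by hand, the field can be pulled out of the command output with a small shell helper. This is only a minimal sketch: `vsan_field` is a hypothetical name of my own, not an esxcli feature, and it assumes the `Key: value` layout shown in the output above. It only parses text on stdin, so the sample here feeds it a few lines copied from the surviving host's output rather than calling esxcli.

```shell
# vsan_field is a hypothetical helper (not part of esxcli): it prints the
# value of one "Key: value" line from `esxcli vsan cluster get` output on stdin.
vsan_field() {
    awk -F': ' -v key="$1" '
        # Trim leading and trailing whitespace so the key compares cleanly.
        { sub(/^[ \t]+/, ""); sub(/[ \t\r]+$/, "") }
        $1 == key { print $2; exit }
    '
}

# Sample lines copied from the surviving host's output above.
sample='   Sub-Cluster Master UUID: 57e006b6-71ab-c8f6-7d1d-0cc47ab460e8
   Sub-Cluster UUID: 57e0040c-83a9-add9-ec1f-0cc47ab46218
   Sub-Cluster Member Count: 2'

cluster_uuid=$(printf '%s\n' "$sample" | vsan_field "Sub-Cluster UUID")
echo "$cluster_uuid"
```

On a real surviving host the sample would presumably be replaced by piping `esxcli vsan cluster get` into `vsan_field "Sub-Cluster UUID"`, and the result is the value to pass to `esxcli vsan cluster join -u` on the partitioned host.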

On the partitioned host:

esxcli vsan cluster leave

esxcli vsan cluster join -u 57e0040c-83a9-add9-ec1f-0cc47ab46218

esxcli vsan cluster get

Cluster Information

   Enabled: true

   Current Local Time: 2016-09-21T10:24:26Z

   Local Node UUID: 57e0040c-83a9-add9-ec1f-0cc47ab46218

   Local Node Type: NORMAL

   Local Node State: AGENT

   Local Node Health State: HEALTHY

   Sub-Cluster Master UUID: 57e006b6-71ab-c8f6-7d1d-0cc47ab460e8

   Sub-Cluster Backup UUID: 57e0f22f-3071-fe1a-fd8e-0cc47ab460ec

   Sub-Cluster UUID: 57e0040c-83a9-add9-ec1f-0cc47ab46218

   Sub-Cluster Membership Entry Revision: 1

   Sub-Cluster Member Count: 3

   Sub-Cluster Member UUIDs: 57e0f22f-3071-fe1a-fd8e-0cc47ab460ec, 57e006b6-71ab-c8f6-7d1d-0cc47ab460e8, 57e0040c-83a9-add9-ec1f-0cc47ab46218

   Sub-Cluster Membership UUID: 3451e257-cedd-8772-4b31-0cc47ab460e8

Now we see all three nodes back in the cluster. The data will take some time to rebuild on this node, but once done, the VSAN health check should show as Healthy, and there should be a single VSAN datastore spanning all hosts.
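Since I am automating the build, the final check can be scripted as well. The following is a rough sketch under my own assumptions: `vsan_has_members` is a hypothetical function name, and it relies on the `Sub-Cluster Member Count` line appearing exactly as in the output above. It reads the command output on stdin, so the sample uses lines copied from the rejoined host instead of calling esxcli.

```shell
# vsan_has_members is a hypothetical check (not an esxcli command): it reads
# `esxcli vsan cluster get` output on stdin and succeeds only when the
# Sub-Cluster Member Count equals the expected number of hosts.
vsan_has_members() {
    expected="$1"
    count=$(awk -F': ' '
        { sub(/^[ \t]+/, ""); sub(/[ \t\r]+$/, "") }
        $1 == "Sub-Cluster Member Count" { print $2; exit }
    ')
    [ "$count" = "$expected" ]
}

# Sample lines copied from the rejoined host's output above.
sample='   Local Node State: AGENT
   Sub-Cluster Member Count: 3'

if printf '%s\n' "$sample" | vsan_has_members 3; then
    echo "all hosts present"
else
    echo "cluster still partitioned"
fi
```

A member count of 1 on a host that should belong to a 3-node cluster is the same symptom seen at the start of this post, so the check fails in that case.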
