Tuesday 16 February 2016

Cluster 101: Some Hyper-V and SOFS Cluster Basics

Over the last eight years or so, our focus here at MPECS Inc. has grown into providing cluster-based solutions to clients near and far, as well as cluster infrastructure solutions for small to medium I.T. shops.

There were so many misconceptions when we started the process of building out our first Hyper-V cluster in 2008.

The call came in from a large food manufacturing company that had a very specific requirement: their SQL, ColdFusion, and mail workloads had to stay available. The platform of choice was the Intel Modular Server with an attached Promise VTrak E310sD for extra storage.

So, off we went.

We procured all of the hardware through the Intel and Promise demo program. There was _no_ way we were going to purchase close to $100K of hardware on our own!

Back then, there was a dearth of documentation … though that hasn’t changed all that much! ;)

It took six months of trial and error plus working with the Intel, Promise, LSI, and finally contacts at Microsoft to figure out the right recipe for standing up a Hyper-V cluster.

Once we had everything down, we deployed the Intel Modular Server with three nodes and the Promise VTrak E310sD for the extra storage.

Node Failure

One of the first discoveries: A cluster setup does not mean the workload stays up if the node it’s on spontaneously combusts!

What does that mean? It means that when a node suddenly goes offline because of a hardware failure, the guest virtual machines get moved over to an available node in a powered-off state and are then started back up there.

To the guest OS it is as if someone hit the reset button on the front of a physical server. And, as anyone who has experienced a failed node knows, the first prompt when logging in to the VM is the “What caused the spontaneous restart?” prompt.

Shared Storage

Every node in a Hyper-V cluster needs identical access to the storage the VHD(x) files are going to reside on.

In the early days, there really was not a lot of information indicating exactly what this meant, especially since we decided right from day one to avoid any solution set based on iSCSI. Direct Attached Storage (DAS) via SAS was the way we were going to go. The bandwidth was vastly superior, with virtually no latency, and no other shared storage option in a cluster setting could match the numbers. To this day, the other options still can't match DAS-based SAS solutions.

It took some time to figure out, but in the end we needed a Shared Storage License (SharedLUNKey) for the Intel Modular Server setup and a storage shelf that supported LUN Sharing and/or LUN Masking, depending on our needs.

We had our first Hyper-V cluster!
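
For reference, on today's Windows Server versions the same storage plumbing gets checked and brought into the cluster with a few PowerShell commands. This is just a minimal sketch with placeholder node and disk names, not the exact recipe we ran back then:

  # Validate that every node sees the shared SAS storage the same way
  Test-Cluster -Node "HV-N1","HV-N2","HV-N3" -Include "Storage"

  # Hand the shared disks to the cluster and convert the first one to a Cluster Shared Volume
  Get-ClusterAvailableDisk | Add-ClusterDisk
  Add-ClusterSharedVolume -Name "Cluster Disk 1"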

Storage Spaces

When Storage Spaces came along with Windows Server 2012 RTM, we decided to venture into Clustered Storage Spaces via two nodes and a shared JBOD. That process took about two to three months to figure out.
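
The PowerShell side of that recipe boils down to something like the following. It is a sketch only: the pool and disk names are placeholders, and we are assuming the shared JBOD is already cabled to and visible from both nodes:

  # Pool the shared JBOD disks that both nodes can see
  $disks = Get-PhysicalDisk -CanPool $true
  $subsys = Get-StorageSubSystem -FriendlyName "*Cluster*"
  New-StoragePool -FriendlyName "Pool01" -StorageSubSystemFriendlyName $subsys.FriendlyName -PhysicalDisks $disks

  # Carve out a fixed, mirrored space for the VMs
  New-VirtualDisk -StoragePoolFriendlyName "Pool01" -FriendlyName "VMStore01" `
    -ResiliencySettingName Mirror -ProvisioningType Fixed -UseMaximumSize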

Our least expensive cluster option based on this setup (blog post) is deployed at a 15-seat accounting firm. The cost versus highly available workload benefit ratio is really attractive. :)

We have also ventured into providing backend storage via Scale-Out File Server clusters for Hyper-V cluster frontends. The fabric between the two starts with 10GbE and SMB Multichannel.
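
As a rough sketch of what the SOFS end of that looks like (role, share, path, and account names below are placeholders): the Scale-Out File Server role gets added on the storage cluster, a continuously available share is published on a CSV for the Hyper-V nodes, and SMB Multichannel is simply verified since it is on by default:

  # On the storage cluster: add the Scale-Out File Server role
  Add-ClusterScaleOutFileServerRole -Name "SOFS01"

  # Publish a continuously available share on a CSV (the folder must exist first)
  New-Item -ItemType Directory -Path "C:\ClusterStorage\Volume1\Shares\VMs01"
  New-SmbShare -Name "VMs01" -Path "C:\ClusterStorage\Volume1\Shares\VMs01" `
    -FullAccess "DOMAIN\HV-N1$","DOMAIN\HV-N2$","DOMAIN\Hyper-V Admins" -ContinuouslyAvailable $true

  # From a Hyper-V node: confirm SMB Multichannel is lighting up both 10GbE paths
  Get-SmbMultichannelConnection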

Networking

All Broadcom and vendor-rebranded Broadcom NICs require VMQ to be disabled on each network port!
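
Taking care of that across a node is quick work in PowerShell. A sketch; adjust the wildcard to match the actual driver description on the ports in question:

  # Turn VMQ off on every Broadcom port on this node, then confirm it took
  Disable-NetAdapterVmq -InterfaceDescription "*Broadcom*"
  Get-NetAdapterVmq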

A best practice for setting up each node is to have a minimum of four ports available: two for the management and Live Migration networks and two for the virtual switch team. Our preference is for a pair of Intel Server Adapter i350-T4s set up as follows (a PowerShell sketch follows the list):

  • Port 0: Management Team (both NICs)
  • Ports 1 and 2: vSwitch (no host OS access, both NICs)
  • Port 3: Live Migration networks (LM0 and LM1)
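
A rough PowerShell sketch of that layout. The adapter and team names are placeholders, we are assuming the in-box LBFO teaming on Windows Server 2012 R2, and Port 3 on each adapter stays un-teamed for LM0 and LM1:

  # Management team across Port 0 on both i350-T4s
  New-NetLbfoTeam -Name "MGMT-Team" -TeamMembers "NIC1-Port0","NIC2-Port0" `
    -TeamingMode SwitchIndependent -LoadBalancingAlgorithm Dynamic

  # vSwitch team across Ports 1 and 2 on both adapters, no host OS access on the switch
  New-NetLbfoTeam -Name "vSwitch-Team" -TeamMembers "NIC1-Port1","NIC1-Port2","NIC2-Port1","NIC2-Port2" `
    -TeamingMode SwitchIndependent -LoadBalancingAlgorithm Dynamic
  New-VMSwitch -Name "vSwitch01" -NetAdapterName "vSwitch-Team" -AllowManagementOS $false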

For higher-end setups, we install at least one Intel Server Adapter X540-T2 and bind our Live Migration networks to each port. In a two-node Clustered Storage Spaces setting, the 10GbE ports are direct connected.
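
Something along these lines points Live Migration at those links and lets SMB Multichannel do the heavy lifting (the subnets are placeholders for the direct-connect networks):

  # Allow Live Migration, run it over SMB, and constrain it to the 10GbE subnets
  Enable-VMMigration
  Set-VMHost -VirtualMachineMigrationPerformanceOption SMB
  Add-VMMigrationNetwork "10.10.10.0/24"
  Add-VMMigrationNetwork "10.10.11.0/24"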

Enabling Jumbo Frames is mandatory on any network switch and NIC carrying storage I/O or Live Migration traffic.
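
On the NIC side that usually looks like the following sketch. The port names and test address are placeholders, and the exact registry value varies a bit between drivers:

  # Bump the Live Migration/storage ports up to jumbo frames
  Set-NetAdapterAdvancedProperty -Name "LM0","LM1" -RegistryKeyword "*JumboPacket" -RegistryValue 9014

  # Verify end to end with a don't-fragment ping (9000 bytes minus 28 bytes of ICMP/IP overhead)
  ping 10.10.10.2 -f -l 8972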

Hardware

In our experience, GHz is king over cores.

Install the maximum amount of memory per socket/NUMA node that can be afforded.
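
A quick way to see what the hypervisor thinks of the NUMA layout it has been handed, and whether NUMA spanning is enabled (a read-only check, nothing more):

  Get-VMHostNumaNode
  Get-VMHost | Select-Object NumaSpanningEnabled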

All components that can be run in pairs should be, to eliminate as many single points of failure (SPOFs) as possible.

  • Two NICs for the networking setup
  • Two 10GbE NICs at the minimum for storage access (Hyper-V <—> SOFS)
  • Two SAS HBAs per SOFS node
  • Two power supplies per node

On the Scale-Out File Server cluster and Clustered Storage Spaces side of things, one could scale up the number of JBODs to provide enclosure resilience, thus protecting against a failed JBOD.

The new DataON DNS-2670 70-bay JBOD supports eight SAS ports per controller for a total of 16 SAS ports. This would allow us to scale out to eight SOFS nodes and eight JBODs using two pairs of LSI 9300-16e (PCIe x8) or the higher-performance LSI 9302-16e (PCIe x16) SAS HBAs per node! Would we do it? Probably not. Three or four SOFS nodes would be more than enough to serve the eight direct attached JBODs. ;)
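
When there are enough JBODs behind the pool, enclosure resilience is essentially a flag on the virtual disk. A sketch on 2012 R2 or later, with placeholder names, assuming at least three enclosures behind a two-way mirror:

  # Confirm the cluster sees the JBODs as distinct enclosures
  Get-StorageEnclosure

  # Create the space enclosure-aware so each data copy lands in a different JBOD
  New-VirtualDisk -StoragePoolFriendlyName "Pool01" -FriendlyName "VMStore02" `
    -ResiliencySettingName Mirror -ProvisioningType Fixed -UseMaximumSize -IsEnclosureAware $true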

Know Your Workloads

And finally, _know your workloads_!

Never, ever rely on a vendor for performance data on their LoB or database backend. Always make a point of watching, scanning, and testing an already in-place solution set for performance metrics or the lack thereof. And, once baselines have been established in testing, the results remain private to us.
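
A hypothetical starting point for that kind of baseline is to watch the in-place server during the busy part of the day with Performance Monitor counters. The counter paths are standard; the interval, sample count, and output path here are arbitrary:

  # Sample the key counters on the existing LoB/database server every 15 seconds for an hour
  $counters = "\PhysicalDisk(*)\Disk Transfers/sec",
              "\PhysicalDisk(*)\Avg. Disk sec/Transfer",
              "\Processor(_Total)\% Processor Time",
              "\Memory\Available MBytes"
  Get-Counter -Counter $counters -SampleInterval 15 -MaxSamples 240 |
      Export-Counter -Path "C:\Baseline\LoB-Baseline.blg" -FileFormat blg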

The three key ingredients in any standalone or cluster virtualization setting are:

  1. IOPS
  2. Threads
  3. Memory

A balance must be struck among those three relative to the budget involved. It is our job to make sure our solution meets the workload requirements that have been placed before us.

Conclusion

We’ve seen a progression in the technologies we are using to deploy highly available virtualization and storage solutions.

While the technology does indeed change over time, the above guidelines have stuck with us since the beginning.

Philip Elder
Microsoft High Availability MVP
MPECS Inc.
Co-Author: SBS 2008 Blueprint Book
