Tuesday, 22 November 2016

Something to be Thankful For

There are many things to be grateful for. For us, it's our family, friends, business, and so much more.

This last weekend we were reminded, in a not-so-subtle way, just how fragile life can be.

image

The other driver, two of my kids, and I were all very fortunate to walk away with no bones broken or blood spilled. We had a big group hug later in the day when we were all finally back together at home.

Dealing with the soreness and migraines since the accident is a small price to pay for the fact that we are all okay.

And fortunately, the other driver took full responsibility for the critical error in judgement that caused the accident, so there will be no insurance scramble to deal with.

We are truly thankful to be alive today.

Happy Thanksgiving to our US neighbours. And for everyone, give those special folks in your life a hug. I sure have been! ;)

Philip Elder
Microsoft High Availability MVP
MPECS Inc.
Co-Author: SBS 2008 Blueprint Book
Our Cloud Service

Tuesday, 15 November 2016

What’s in a Lab? Profit!

Our previous post, Server Hardware: The Data Bus is Playing Catch-Up, has had a lot of traction.

Our tweets about I.T. companies not having a lab for their solutions sales engineers and technicians have also had a lot of traction.

So, let’s move forward with a rather blunt opinion piece, shall we? ;)

What client wants to drop $25K on an 800bhp blown 454CID engine, then shovel it into that Vega/Monza, only to find the car twisted into a pretzel on the first run and, very possibly, the driver with serious injuries or worse?

image

Image credit

Seriously, why wouldn’t the same question be asked by a prospect or client that is about to drop $95K or more on a Storage Spaces Direct (S2D) cluster that the I.T. provider has _never_ worked with? Does the client or prospect even think of asking that question? Are there any references with that solution in production? If the answer is “No” then get the chicken out of that house!

In the automotive industry, folks ask those questions, especially when they have some serious coin tied up in the project … at least we believe they would, based on previous experience.

Note that there are a plethora of videos on YouTube and elsewhere showing the results of so-called “tuners” blowing the bottom end out of an already expensive engine. :P

In all seriousness though, how can an I.T. company sell a solution to a client that they’ve never worked with, put together, tested, or even _seen_ before?

It really surprised me, while chatting with a technical architect who works for a large I.T. provider, to hear that their company doesn’t believe there is any value in providing a lab for them.

S2D Lab Setup

A company that keeps a lab, and refreshes it every so often, stands to gain far more than the folks counting the beans may see.

For S2D, the following is a good, and inexpensive, place to start (a rough cost roll-up sketch follows the list):

  • Typical 4-node S2D lab based on Intel Server Systems
    • R2224WTTYSR Servers: $15K each
    • Storage
      • Intel 750 Series NVMe $1K/Node
      • Intel 3700 Series SATA $2K/Node
      • Seagate/HGST Spindles $3K/Node
    • Mellanox RDMA Networking: $18K (MSX1012X + 10GbE CX-3 Adapters)
    • NETGEAR 10GbE Networking: $4K (XS716T + X540-T2 or X550-T2)
    • Cost: ~$75K to $85K
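
As a quick sanity check, here is a minimal Python sketch rolling up the figures quoted above (the dollar amounts are the approximate numbers from the list, not vendor quotes; with both networking options included it lands right around the "$100K" the bean counters ask about below, while choosing a single networking option and street pricing brings it closer to the quoted ~$75K to $85K range):

    NODES = 4

    per_node = {
        "R2224WTTYSR server": 15_000,
        "Intel 750 Series NVMe": 1_000,
        "Intel 3700 Series SATA SSD": 2_000,
        "Seagate/HGST spindles": 3_000,
    }

    networking = {
        "Mellanox RDMA (MSX1012X + CX-3)": 18_000,
        "NETGEAR 10GbE (XS716T + X540/X550-T2)": 4_000,
    }

    node_cost = sum(per_node.values())
    print(f"Per node: ${node_cost:,}")                  # $21,000
    print(f"Four nodes: ${NODES * node_cost:,}")        # $84,000
    print(f"With all networking: ${NODES * node_cost + sum(networking.values()):,}")  # $106,000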

The setup should look something like this:

image

S2D Lab (Front)

image

S2D Lab (Rear)

Note that we have two extra nodes for a Hyper-V cluster setup to work with S2D as a SOFS-only solution.

Okay, so the bean counters are saying, "What do we get for our $100K, hmmm?"

Point 1: We’ve Done It

The above racked-systems images go into any S2D proposal, with an explanation that we’ve been building these hyper-converged clusters since Windows Server 2016 was in its early technical preview days. The section outlining our efforts to fine-tune our solutions on our own dime puts our competitors at a huge disadvantage in the prospect’s eyes.

Point 2: References

Because we dug in and tested from the outset, we can bid on deals with these solutions today. As a result, we are one of the few with go-to-market-ready solutions, and we will have deployed them before most others out there even know what S2D is!

Point 3: Killer and Flexible Performance

Most solutions we would be bidding against are traditional SAN-style configurations. Our hyper-converged S2D platform provides a huge step up over these solutions in so many ways:

  1. IOPS: NVMe at the cache layer delivers real IOPS gains over a traditional SAN, whether connected via Fibre Channel or, especially, iSCSI.
  2. Throughput: Our storage can be set up to run huge amounts of data through the pipe if required.
  3. Scalability: We can start off small and scale out up to 16 nodes per cluster.
    • 2-8 nodes @ 10GbE RDMA via Mellanox and RoCEv2
    • 8-16 nodes @ 40GbE RDMA via Mellanox and RoCEv2
      • Or, 100GbE RDMA via Mellanox and RoCEv2

This raises the question: How does one know how one’s solution is going to perform if one has never deployed it before?

Oh, we know: “I’ve read it in Server’s Reports”, says the lead sales engineer. ;)

Point 4: Point of Principle

It has been mentioned here before: We would never, _ever_, deploy a solution that we’ve not worked with directly.

Why?

For one, because we want to make sure our solution fulfils the promises we’ve made around it. We don’t want to be called to come and pick up our high availability solution because it does not do what it was supposed to do. We’ve heard of that happening with some rather expensive solutions from other vendors.

Point 5: Reputation

Our prospects can see that we have a history, and a rather long one at that, of digging in quite deep, both into our own pockets and on our own time, to develop our solution sets. That also tells them that we are passionate about the solutions we propose.

We _are_ Server’s Reports so we don’t need to rely on any third party for a frame of reference! ;)

Conclusion

Finally, an I.T. company that invests in its crew, in lab kit, time, training, and mentorship, will find that crew quite passionate about the solutions they are selling and working with. That translates into sales, but also into happy clients who can see for themselves that they are getting great value for their I.T. dollars.

I.T. services companies: get and maintain a lab! It is worth it!

Philip Elder
Microsoft High Availability MVP
MPECS Inc.
Co-Author: SBS 2008 Blueprint Book
Our Cloud Service

Saturday, 12 November 2016

Server Hardware: The Data Bus is Playing Catch-Up

After seeing the Mellanox ConnectX-6 200Gb announcement, the following image came to mind:

image

Image credit

The Vega/Monza was a small car that some folks found the time to stuff a 454CID Chevy engine into, then drop a 6-71 or 8-71 series Roots blower on top (they came off trucks back in the day). The driveline and the "frame" were then tweaked to accommodate it.

The moral of the story? It was great to have all of that power, but putting it down to the road was always a problem. Check out some of the "tubbed" Vega images out there to see a few of the ways folks dealt with it.

Our server hardware today does not, unfortunately, have the ability to be "tubbed" to allow us to get things moving.

PCI Express

The PCI Express (PCIe) v3 spec (Wikipedia), at a little over 15GB/second (that's gigabytes, not gigabits) across a 16-lane connector, falls far short of the bandwidth needed for a dual-port 100Gb ConnectX-5 part.

As a point of reference, the theoretical throughput of one 100Gb port is about 12.5GB/second. That essentially renders the dual-port ConnectX-5 adapter's second port moot, as there is very little bandwidth left for it to use. So, it becomes essentially a "passive" port to a second switch for redundancy.

A quick search for "Intel Server Systems PCIe Gen 4" yields very little in the way of results. We know we are about due for a hardware step, as the "R" code (meaning refresh, as in R2224WTTYSR) is coming into its second or third year in 2017.

Note that the current Intel Xeon Processor E5-2600 v4 series has a grand total of 40 PCI Express Generation 3 lanes available per processor. Toss two ConnectX-4 100Gb adapters into two x16 wired slots and that's going to be about it for real throughput.
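
For those who like the arithmetic spelled out, here is a minimal Python sketch of the numbers behind the last few paragraphs (approximate, ignoring protocol overhead):

    # PCIe v3: 8 GT/s per lane with 128b/130b encoding
    pcie3_lane_GBps = 8 * (128 / 130) / 8          # ~0.985 GB/s per lane
    pcie3_x16_GBps = pcie3_lane_GBps * 16          # ~15.75 GB/s per x16 slot

    port_100gb_GBps = 100 / 8                      # ~12.5 GB/s per 100Gb port
    dual_port_GBps = 2 * port_100gb_GBps           # 25 GB/s if both ports run flat out

    print(f"PCIe v3 x16 slot: ~{pcie3_x16_GBps:.2f} GB/s")
    print(f"One 100Gb port:   ~{port_100gb_GBps:.1f} GB/s")
    print(f"Dual-port demand: ~{dual_port_GBps:.1f} GB/s (the slot cannot feed it)")

    # Lane budget on an E5-2600 v4 (40 PCIe lanes per socket)
    lanes_left = 40 - 2 * 16                       # two x16 slots for two NICs
    print(f"Lanes left over:   {lanes_left}")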

Connectivity fabric bandwidth outside the data bus is increasing in leaps and bounds. Storage technologies such as NVMe, and now NVDIMM-N, 3D XPoint, and other memory-bus-direct storage technologies, are either centre stage or coming onto the stage.

The current PCIe v3 pipe is way too small. The fourth-generation PCI Express pipe that is not even in production yet is _already_ too small! It's either time for an entirely new bus fabric or a transition of the memory bus into a full or intermediate storage bus, which is what NVDIMM-N and 3D XPoint are hinting at.

Oh, and one more tiny point: Drawing storage into the memory bus virtually eliminates latency ... almost.

Today's Solutions

Finally, one needs to keep in mind that the server platforms we are deploying on today have very specific limitations. We've already hit some limits in our performance testing (blog post: Storage Configuration: Know Your Workloads for IOPS or Throughput).

With our S2D solutions looking at three, five, or more years of service life, these limitations _must_ be at the forefront of our thought process during discovery and then solution planning.

If not, we stand to have an unhappy customer calling us to take the solution back soon after we deploy, or a year or two down the road when they hit those limits.

***

Author's Note: I was just shy of my Journeyman's ticket as a mechanic, headed in the direction of high performance, when the computer bug bit me. ;)

Philip Elder
Microsoft High Availability MVP
MPECS Inc.
Co-Author: SBS 2008 Blueprint Book
Our Cloud Service

Monday, 24 October 2016

Windows Server 2016 Feature Comparison Summary

This is a direct snip of the differences between Windows Server Standard and Datacenter from this PDF:

image
image
image
image

Game Changing

A _lot_ of folks use the words “game changing” to somehow set their products and/or services apart from others. Most of the time when we hear those words, the products and/or services are “ho-hum”, with little to no real impact on a company’s or user’s day-to-day work or personal life.

We’ve heard those words in our industry _a lot_ over the last number of years. The reality has been somewhat different for most of us.

We believe, based on our experience deploying inexpensive storage (clustered Scale-Out File Server/Storage Spaces) and compute (clustered Hyper-V) high availability solutions, that Windows Server 2016 is indeed _game changing_ on three fronts:
  1. Compute
  2. Storage
  3. Networking

The Software Defined Data Centre (SDDC) story in Windows Server 2016 has traditional cluster, storage, and networking vendors concerned. In our opinion, deeply concerned. Just watch key vendors’ stocks, as we’ve been doing these last four or five years, to see just how much of an impact the Server 2016 SDDC story will have over the next two to five years. Some stocks already reflect the inroads Hyper-V and, more recently (the last two years), Storage Spaces have made into their markets. We’re a part of that story! :)

Using true commodity hardware, we are able to set up an entire SDDC for a fraction of the cost of traditional data centre solutions. Not only that, with the hyper-converged Storage Spaces Direct platform we can provide lots of IOPS and compute in a small footprint without all of the added complexity and expense of traditional data centre solutions.

First Server 2016 Deployment

We’ve already deployed our first Windows Server 2016 Clustered Storage Spaces cluster on the General Availability (GA) bits while in Las Vegas two weeks ago:

image

That’s two Intel Server Systems R1208JP4OC 1U servers and a Quanta QCT JB4602 JBOD outfitted with (6) 200GB HGST SAS SSDs and (54) 8TB Seagate NearLine SAS drives. We are setting up a Parity space to provide maximum usable capacity, as this client produces lots of 4K video. (EDIT NOTE: Updated the drive size)
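
For a rough sense of scale, here is a minimal Python sketch of the raw capacity in that JBOD. The parity efficiency figure is an illustrative assumption only; actual usable capacity depends on the parity layout and column count chosen:

    ssd_tb = 6 * 0.2          # 6 x 200GB SAS SSDs (journal tier)
    hdd_tb = 54 * 8           # 54 x 8TB NearLine SAS drives
    print(f"SSD (journal) capacity: {ssd_tb:.1f} TB")
    print(f"Raw HDD capacity:       {hdd_tb} TB")        # 432 TB raw

    # Illustrative only: single parity behaves roughly like (columns - 1) / columns;
    # assume 8 columns here.
    assumed_efficiency = 7 / 8
    print(f"Ballpark usable:        ~{hdd_tb * assumed_efficiency:.0f} TB")
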
Cost of the highly available storage solution is a fraction of the cost we’d see from Tier 1 storage or hardware vendors.

Going Forward

It’s no secret that we are excited about the Server 2016 story. We plan on posting a lot more about why we believe Windows Server 2016 is a game changer, with specifics around the three areas mentioned above. We may even mention some of the vendors’ stock tickers to add to your watch list too! ;)

Who is MPECS Inc.?

A bit of a shameless plug.

Since 2008/2009, we’ve been building SDDC setups for small-to-medium hosting companies, along with SMB/SME consultants and clients who are concerned about “the Cloud being their data on someone else’s computer”.

Our high availability solutions are designed with SMB (think sub-$12K for a 2-node cluster) and SME (think sub-$35K cluster) in mind, along with SMB/SME-focused hosting providers (think sub-$50K to start). Our solutions are flexible and can be designed with the 3-year and 5-year stories, or more, in mind.

We deliver Cloud on-premises, hybrid, or in _our_ Cloud, which runs on _our_ highly available solutions.

Curious how we can help you? Then please feel free to ask!

Have a great day and thanks for reading.

Philip Elder
Microsoft High Availability MVP
MPECS Inc.
Co-Author: SBS 2008 Blueprint Book
Our Cloud Service

Tuesday, 18 October 2016

Some Thoughts on the New Windows Server Per Core Licensing Model

This is a post sent to the SBS2K Yahoo Group.

***

Let’s take things from this perspective: Nothing has changed at our level.

How’s that?

We deploy four-, six-, and eight-core single-socket and dual-socket servers.

We have not seen the need for four-socket servers, since one can set things up today with eight processors in one 2U space (4 nodes) without too much finagling and end up with way more performance.

Until Intel conquers the GHz-to-core-count ratio, we will be deploying high-GHz, low-core-count CPUs for the time being. Not that eight cores is a “low” count in our world. ;)

Most of us will never see the need to set up a dual-socket server with more than 12 cores, with 16 cores being a less common setup for us.

Our sweet spot right now in dual-socket server cluster nodes is the E5-2643v4 at 6 cores and 3.4 GHz. For higher-intensity workloads we run with the E5-2667v4 at 8 cores and 3.2 GHz. Price-wise, these are the best bang for the buck relative to core count versus GHz.

With the introduction of the two-node configuration for Storage Spaces Direct (S2D), we have an option to provide high availability (HA) for our smaller clients using two single-socket 1U servers (R1208SPOSHOR or Dell R330) at a very reasonable cost. A Datacenter license is required for each node. Folks may balk at that, but keep this in mind:

image

What does that mean? It means that we can SPLA the Datacenter license whether the client leases the equipment, which was standard fare back in the day, or owns it while we SaaS the Windows Server licenses. Up here in Canada those licenses are about $175 per 16 cores per month. That’s $350/month for an HA setup. We see a _huge_ market for this setup in SMB and SME. Oh, and keep in mind that we can then be very flexible about our VM layout. ;)
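
A minimal sketch of that SPLA math (the $175 figure is the approximate Canadian per-node monthly amount quoted above; actual SPLA pricing varies by provider and agreement):

    DATACENTER_SPLA_PER_NODE_MONTHLY = 175     # CAD, covers 16 cores on one node
    nodes = 2                                  # two-node S2D cluster

    monthly = nodes * DATACENTER_SPLA_PER_NODE_MONTHLY
    print(f"Monthly licensing for the 2-node HA setup: ${monthly}")   # $350/month
    print(f"Per year: ${monthly * 12:,}")                             # $4,200/year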

The licensing change reminds me of VMware’s changes a number of years back, where they received so much backpressure that the “Core Tax” changes got reverted. So far, there’s not been a lot of backpressure that we’ve seen about this change. But then, the bulk of the on-premises SMB/SME world, where we spend most of our time, doesn’t deploy servers with more than 16 cores.

In the end, as I stated at the beginning, nothing has changed for us.

We’re still deploying “two” Server Standard licenses, now with 16 cores per server, for our single-box solutions with four VMs. And we’re deploying “four” Server Standard licenses for our two-node Clustered Storage Spaces and Hyper-V via shared storage, which also yields four VMs in that setting.
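
For anyone who wants to see the arithmetic, here is a minimal Python sketch of the per-core Standard licensing model as we read it (assumptions: core licenses come in 2-core packs, with minimums of 8 core licenses per processor and 16 per server, and each full licensing of a host's cores grants rights to two VMs):

    import math

    def standard_packs(cores: int, sockets: int, vms: int) -> int:
        """2-core license packs needed on one host for the given VM count."""
        cores_to_license = max(cores, 8 * sockets, 16)
        sets_needed = math.ceil(vms / 2)               # each full set covers 2 VMs
        return sets_needed * math.ceil(cores_to_license / 2)

    # Single-box solution: dual 8-core host, four VMs
    print(standard_packs(16, 2, 4))        # 16 packs = two full 16-core licenses

    # Two-node cluster, four VMs that can fail over to either node:
    # each node is licensed for all four VMs, i.e. "four" licenses total
    print(2 * standard_packs(16, 2, 4))    # 32 packs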

If, and when, we broach into the higher density core counts for our cluster setups, or even standalone boxes, we will cross that bridge when we come to it.

Have a great day everyone and thanks for reading. :)

Philip Elder
Microsoft High Availability MVP
MPECS Inc.
Co-Author: SBS 2008 Blueprint Book
Our Cloud Service

Tuesday, 20 September 2016

Windows Server 2016: Storage Spaces Direct (S2D) and VMware Virtual SAN IOPS?

We’re not VMware experts here. Let’s make that very clear. 

However, we do have the ear of many a VMware expert, so we are able to ask: Is the Windows Server 2016 hyper-converged Storage Spaces Direct (S2D) setup similar to the VMware Virtual SAN platform? Apples to apples, if you will?

The resounding answer has been, “Yes!”

That leads us to the following articles about Intel NVMe all-flash hyper-converged setups:

  1. Record Performance of All Flash NVMe Configuration – Windows Server 2016 and Storage Spaces Direct - IT Peer Network
    • An Intel based four node S2D setup
  2. VMware Virtual SAN 6.2 Sets New Record - IT Peer Network
    • A VMware based 8 node Virtual SAN 6.2 setup

What’s interesting to note is the following:

  1. Intel/S2D setup at 4 nodes
    • 3.2M IOPS at 4K Read
    • 930K IOPS at 70/30 Read/Write
  2. VMware setup at 8 nodes
    • Capacity Mode: 1.2M IOPS
      • 500K IOPS at 70/30 Read/Write
    • Balance Mode: ~830K IOPS
      • 800K IOPS at 70/30 Read/Write
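
A quick per-node normalization of those published numbers follows; treat it as a rough illustration only, since the two setups differ in node count, hardware generation, and drive mix:

    results = {
        "S2D, 4 nodes, 4K Read":            (3_200_000, 4),
        "S2D, 4 nodes, 70/30 R/W":          (  930_000, 4),
        "VSAN 6.2, 8 nodes, Capacity Mode": (1_200_000, 8),
        "VSAN 6.2, 8 nodes, Balance Mode":  (  830_000, 8),
    }

    for label, (iops, node_count) in results.items():
        print(f"{label:34s} ~{iops / node_count:>9,.0f} IOPS/node")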

What is starting to become apparent, at least to us, is that the folks over at the newly minted Dell EMC _really_ need to pay attention to Microsoft’s feature depth in Windows Server 2016. In fact, the same goes for all vendors in virtualization, high-performance computing (HPC), storage, and data centre products.

We’re pretty excited about the hyper-converged S2D feature set in Windows Server 2016. So much so that we have invested quite heavily in our Proof-of-Concept (PoC).

image

The bill of materials (BoM) for the setup so far:

  • S2D Nodes (4)
    • Intel Server Systems R2224WTTYS with dual E5-2640, 256GB ECC, Intel JBOD HBA, Intel X540-T2 10GbE, Mellanox 56GbE NICs, Intel PCIe 750 NVMe, Intel SATA SSDs, and Seagate 10K SAS spindles
  • Hyper-V Nodes (2)
    • Intel Server Systems R2208GZ4GC with dual E5-2650, 128GB ECC, Intel X540-T2 10GbE, and Mellanox 56GbE NICs
  • Lab DC
    • An Intel Server setup with an S1200BT series board and an Intel Xeon E3 processor.

Our testing includes using the S2D setup as a hyper-converged platform, but also as a Scale-Out File Server (SOFS) cluster destination for a Hyper-V cluster on the two Hyper-V nodes. Then, some testing of various configurations beyond that.

We believe Windows Server 2016 is looking to be one of the best server operating systems Microsoft has ever released. Hopefully we won’t be seeing any major bugs in the production version!

Philip Elder
Microsoft High Availability MVP
MPECS Inc.
Our SMB/SME HA Solution
Our Cloud Service

Tuesday, 26 July 2016

Some Disaster Recovery Planning On-Premises, Hybrid, and Cloud Thoughts

This was a post to the SBS2K Yahoo list in response to a comment about the risks of encrypting all of our domain controllers (which we have been moving towards for a year or two now). It’s been tweaked for this blog post.

***

We’ve been moving to 100% encryption in all of our standalone and cluster settings.

Encrypting a setup does not change _anything_ as far as Disaster Recovery Plans go. Nothing. Period.

The “something can go wrong there” attitude should apply to everything from on-premises storage (we’ve been working with a firm that lost gigabytes to terabytes of data due to the previous MSP’s failures) and services to Cloud-resident data and services.

No stone should be left unturned when it comes to backing up data and Disaster Recovery Planning. None. Nada. Zippo. Zilch.

The new paradigm from Microsoft and others has migrated to “Hybrid” … for the moment. Do we have a backup of the cloud data and services? Is that backup air-gapped?

Google lost over 150K mailboxes a number of years back; we worked with one panicked caller who lost everything, with no way to get it back. What happens then?

Recently, a UK VPS provider had a serious crash and, as it turns out, lost _a lot_ of data. Where are their clients now? Where’s their clients’ business after such a catastrophic loss?

Some on-premises versus cloud based backup experiences:

  • Veeam/ShadowProtect On-Premises: Air-gapped (no user access to avoid *Locker problems), encrypted, off-site rotated, and high performance recovery = Great.
  • Full recovery from the Cloud = Dismal.
  • Partial recovery of large files/numerous files/folders from the Cloud = Dismal.
  • Garbage In = Garbage Out = Cloud backup gets the botched bits in a *Locker event.
  • Cloud provider’s DC goes down = What then?
  • Cloud provider’s Services hit a wall and failover fails = What then (this was part of Google’s earlier-mentioned problem, methinks)?
    • ***Remember, we’re talking Data Centers on a grand scale where failover testing has been done?!?***
  • At Scale:
    • Cloud/Mail/Services providers rely on a myriad of systems to provide resilience
      • Most Cloud providers rely on those systems to keep things going
    • Backups?
      • Static, air-gapped backups?
      • “Off-Site” backups?
        • These do not, IMO, exist at scale
  • The BIG question: Does the Cloud service provider have a built-in backup facility?
    • Back up the data to local drive or NAS either manually or via schedule
    • Offer a virtual machine backup off their cloud service

There is an assumption, and we all know what that means, right?, that seems to be prevalent among top-tier cloud providers: that their resiliency systems will be enough to protect them from that next big bang. But will they be? We already seem to have examples of the “not”.

In conclusion to this rather long-winded post I can say this: It is up to us, our clients’ trusted advisors, to make bl**dy well sure our clients’ data and services are properly protected and that a down-to-earth backup exists of their cloud services/data.

We really don’t enjoy being on the other end of a phone call: “OMG, my data’s gone, the service is offline, and I can’t get anywhere without it!” :(

Oh, and BTW, our SBS 2003/2008/2011 Standard/Premium sites all had 100% Uptime across YEARS of service. :P

We did have one exception in there, due to an inability to cool the server closet as the A/C panel was full. Plus, the building’s HVAC had a bunch of open primary push ports (hot in winter, cold in summer) above the ceiling tiles, which is where the return air is supposed to happen. In the winter the server closet would hit +40C for long periods of time as the heat would settle into that area. ShadowProtect played a huge role in keeping this firm going, plus technology changes over server refreshes helped (cooler-running processors and our move to SAS drives).

*** 

Some further thoughts and references in addition to the above forum post.

The moral of this story is quite simple. Make sure _all_ data is backed up and air-gapped. Period.

Philip Elder
Microsoft High Availability MVP
MPECS Inc.
Co-Author: SBS 2008 Blueprint Book
Our Cloud Service