Tuesday, 10 January 2017

Server 2016 January 10 Update: KB3213986 – Cluster Service May Not Start Automatically Post Reboot

The January 10, 2017 update package (KB3213986) has a _huge_ caveat for those updating clusters, especially with Cluster Aware Updating:

Known issues in this update:

Symptom
The Cluster Service may not start automatically on the first reboot after applying the update.

Workaround
The workaround is to either start the Cluster Service with the Start-ClusterNode PowerShell cmdlet or reboot the node.
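For reference, here is a minimal PowerShell sketch of that workaround, assuming it is run on (or targeted at) the affected node; adjust names to suit your environment:

```powershell
# Check whether the node rejoined the cluster after the post-update reboot
Import-Module FailoverClusters
Get-ClusterNode -Name $env:COMPUTERNAME

# If the node shows as Down, start the Cluster Service on it
Start-ClusterNode -Name $env:COMPUTERNAME
```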

For those managing large cluster deployments, this situation means the update procedure for this particular update needs to be evaluated ahead of time.

Please keep this in mind when scheduling this update, and have resources on hand to mitigate the problem.
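For larger deployments, a quick post-update sweep like the following (a sketch only; the cluster name is a placeholder) can confirm every node came back and nudge any that did not:

```powershell
# List every node in the cluster and start any that are not Up
Import-Module FailoverClusters

$cluster = "CLUSTER01"   # placeholder cluster name
Get-ClusterNode -Cluster $cluster |
    Where-Object { $_.State -ne "Up" } |
    ForEach-Object { Start-ClusterNode -Name $_.Name -Cluster $cluster }
```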

Note that as of this writing, the cluster service stall on reboot appears to be a one-time issue as far as we know: once the update has completed and the node has successfully rejoined the cluster, there should be no further problems.

Philip Elder
Microsoft High Availability MVP
MPECS Inc.
Co-Author: SBS 2008 Blueprint Book
Our Cloud Service

Wednesday, 7 December 2016

AutoDiscover “Broken” in Outlook 2016

We have a client that has their services spread across a number of different cloud systems.

Recently, users started “losing” their connection to their hosted Exchange mailboxes, with Outlook coughing up some really strange errors that were not very helpful.

We ran the gamut trying to figure things out, since calls to the hosted Exchange provider, and eventually to the company hosting their web site, always came back with “There’s a problem with Outlook”.

Indeed.

What we’ve managed to figure out is that Outlook 2016 will _always_ run an AutoDiscover check even if we’re manually setting up the mailbox for _any_ Exchange ActiveSync (EAS) connection. It must be some sort of new “security” feature in Outlook 2016.

What does that mean?

It means that when something changes unbeknownst to us, things break. :(

In this case, it was the AutoDiscover behaviour in Outlook for EAS connections combined with the web host changing something on their end, since things had been working for a _long_ time before the recent problems. Or, a recent update to Outlook 2016 changed things on the AutoDiscover side and revealed what was happening on the www hosting side.

Okay, back to the problem at hand. This is the prompt we would get when setting up a new mailbox, and eventually the prompt that all users with existing mailbox connections started getting:

image

Internet E-mail Name@Domain.com

Enter your user name and password for the following server:

Server: gator3146.hostgator.com

Well, our mailboxes are with a third party, not HostGator. So, it was on to chatting with and eventually phoning HostGator, after opening a ticket with the Exchange host and hearing back that the problem was elsewhere.

Unfortunately, HostGator was not very helpful via chat or phone when we initially reached out. Outlook, they claimed, was always the problem.

So, we set up a test mailbox on the hosted Exchange platform and went to our handy Microsoft tool: Microsoft Remote Connectivity Analyzer.

We selected the Outlook Autodiscover option and ran through the steps setting up the mailbox information, then the CAPTCHA a few times ;-), and received the following results:

image

We now had concrete evidence that HostGator was not honouring the AutoDiscover.domain.com DNS setup we had for this domain, which pointed to a host that was not on their system.

A question was sent out to a fellow Exchange MVP, and the reply that came back was that HostGator had a URLRewrite rule in place on their IIS/Apache setup that was grabbing the AutoDiscover polls from Outlook and sending them to HostGator’s own servers.
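For anyone wanting to see this behaviour for themselves, here is a rough sketch of the kind of checks involved (domain.com is a placeholder for the affected domain, and the property names in the catch block assume Windows PowerShell 5.1):

```powershell
# Where do the bare domain and the AutoDiscover host actually resolve?
Resolve-DnsName -Name "domain.com" -Type A
Resolve-DnsName -Name "autodiscover.domain.com" -Type A

# Outlook polls https://domain.com/autodiscover/autodiscover.xml before it
# tries the autodiscover host. Whoever answers this URL is in the loop.
try {
    $r = Invoke-WebRequest -Uri "https://domain.com/autodiscover/autodiscover.xml" -UseBasicParsing
    $r.StatusCode
    $r.Headers["Server"]
} catch {
    # A 401 prompt or redirect served by the web host shows up here
    $_.Exception.Response.Server
}
```

The Microsoft Remote Connectivity Analyzer does the same legwork in a friendlier way, which is why it remains our go-to.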

In the meantime, we created the /AutoDiscover folder on the web site and put a test file in it. The problem persisted.

Okay, back on the phone with HostGator support. The first call had two “escalations” associated with it, unfortunately with no results. After seeing the MVP’s response, a second call was made with a specific request for HostGator: delete the URLRewrite rule that was set up on this client’s site within the last month.

They could not do it. Nothing. Nada. Zippo. :(

So, for now, our workaround was to move the DNS A record for @ (Domain.com) to the same IP as the hosted Exchange service’s AutoDiscover IP, to at least get Outlook to fail on the initial domain poll.

Moral of the story?

We’re moving all of our client’s web properties off HostGator to a hosting company that will honour the setup we implement, and we’re using the Microsoft Remote Connectivity Analyzer to test things out thoroughly.

Philip Elder
Microsoft High Availability MVP
MPECS Inc.
Co-Author: SBS 2008 Blueprint Book
Our Cloud Service

Tuesday, 22 November 2016

Something to be Thankful For

There are many things to be grateful for. For us, it's our family, friends, business, and so much more.

This last weekend we were reminded in a not so subtle way just how fragile life can be.

image

The other driver, two of my kids, and I were all very fortunate to walk away with no bones broken or blood spilled. We had a big hug later in the day when we were all finally back together at home.

Dealing with the soreness and migraines since the accident is a small price to pay for the fact that we are all okay.

And fortunately, the other driver took full responsibility for the critical error in judgement that caused the accident, so there will be no insurance scrambles to deal with.

We are truly thankful to be alive today.

Happy Thanksgiving to our US neighbours. And for everyone, give those special folks in your life a hug. I sure have been! ;)

Philip Elder
Microsoft High Availability MVP
MPECS Inc.
Co-Author: SBS 2008 Blueprint Book
Our Cloud Service

Tuesday, 15 November 2016

What’s in a Lab? Profit!

Our previous post on Server Hardware: The Data Bus is Playing Catch-Up has had a lot of traction.

Our tweets on I.T. companies not having a lab for their solutions sales engineers and technicians have also had a lot of traction.

So, let’s move forward with a rather blunt opinion piece, shall we? ;)

What client wants to drop $25K on an 800bhp blown 454CID engine, then shovel it into that Vega/Monza, only to find the car twisted into a pretzel on the first run and, very possibly, the driver with serious injuries or worse?

image

Image credit

Seriously, why wouldn’t the same question be asked by a prospect or client that is about to drop $95K or more on a Storage Spaces Direct (S2D) cluster that the I.T. provider has _never_ worked with? Does the client or prospect even think of asking that question? Are there any references with that solution in production? If the answer is “No” then get the chicken out of that house!

In the automotive industry, folks ask those questions, especially when they have some serious coin tied up in the project … at least we believe they would, based on previous experience.

Note that there are a plethora of videos on YouTube and elsewhere showing the results of so-called “tuners” blowing the bottom end out of an already expensive engine. :P

In all seriousness though, how can an I.T. company sell a solution to a client that they’ve never worked with, put together, tested, or even _seen_ before?

It really surprised me, while chatting with a technical architect who works for a large I.T. provider, to be told that their company doesn’t believe there is any value in providing a lab for them.

S2D Lab Setup

A company that keeps a lab, and refreshes it every so often, stands to gain so much more than the folks who count the beans may see.

For S2D, the following is a good, and relatively inexpensive, place to start:

  • Typical 4-node S2D lab based on Intel Server Systems
    • R2224WTTYSR Servers: $15K each
    • Storage
      • Intel 750 Series NVMe $1K/Node
      • Intel 3700 Series SATA $2K/Node
      • Seagate/HGST Spindles $3K/Node
    • Mellanox RDMA Networking: $18K (MSX1012X + 10GbE CX-3 Adapters)
    • NETGEAR 10GbE Networking: $4K (XS716T + X540-T2 or X550-T2)
    • Cost: ~$75K to $85K

The setup should look something like this:

image

S2D Lab (Front)

image

S2D Lab (Rear)

Note that we have two extra nodes for a Hyper-V cluster setup to work against S2D as a SOFS-only solution.
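Once the tin is racked and the OS is on, standing the lab cluster up is straightforward. A minimal sketch, assuming hypothetical node names S2D-N1 through S2D-N4 that are already domain joined with networking in place:

```powershell
# Validate the candidate nodes, build the cluster, then enable Storage Spaces Direct.
# Node names, cluster name, and IP address are placeholders for illustration.
$nodes = "S2D-N1","S2D-N2","S2D-N3","S2D-N4"

Test-Cluster -Node $nodes -Include "Storage Spaces Direct","Inventory","Network","System Configuration"
New-Cluster -Name "S2D-LAB" -Node $nodes -NoStorage -StaticAddress 192.168.10.50

# Claims the local NVMe/SSD/HDD in each node: NVMe lands in the cache tier,
# SSD and spindles become capacity
Enable-ClusterStorageSpacesDirect -CimSession "S2D-LAB"

# Carve out a CSV volume to run test workloads against
New-Volume -FriendlyName "LabVol01" -FileSystem CSVFS_ReFS `
    -StoragePoolFriendlyName "S2D*" -Size 2TB -CimSession "S2D-LAB"
```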

Okay, so the bean counters are saying, “What do we get for our $100K, hmmm?”

Point 1: We’ve Done It

The above racked-systems images go into any S2D proposal with an explanation that we’ve been building these hyper-converged clusters since Windows Server 2016 was in its early technical preview days. When a prospect sees the section outlining our efforts to fine-tune our solutions on our own dime, our competitors are placed at a huge disadvantage.

Point 2: References

Because we dug in and tested from the outset, we can bid on deals with these solutions today. As a result, we are one of the few with go-to-market-ready solutions, and we will have deployed them before most others out there even know what S2D is!

Point 3: Killer and Flexible Performance

Most solutions we would be bidding against are traditional SAN style configurations. Our hyper-converged S2D platform provides a huge step up over these solutions in so many ways:

  1. IOPS: NVMe at the cache layer delivers real IOPS gains over a traditional SAN, whether connected via Fibre Channel or, especially, iSCSI.
  2. Throughput: Our storage can be set up to run huge amounts of data through the pipe if required.
  3. Scalability: We can start off small and scale out up to 16 nodes per cluster.
    • 2-8 nodes @ 10GbE RDMA via Mellanox and RoCEv2
    • 8-16 nodes @ 40GbE RDMA via Mellanox and RoCEv2
      • Or, 100GbE RDMA via Mellanox and RoCEv2

This raises the question: How does one know how one’s solution is going to perform if one has never deployed it before?

Oh, we know: “I’ve read it in Server’s Reports”, says the lead sales engineer. ;)
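Our actual answer is to measure it in the lab before quoting numbers. A hedged sketch of the kind of DiskSpd run we would use to characterize a volume (the path and parameters are illustrative only, not a recommendation for every workload):

```powershell
# DiskSpd: 8 KB random I/O, 70/30 read/write, 8 threads, 8 outstanding I/Os,
# 60 seconds, against a 10 GB test file on the cluster shared volume under test.
# -Sh disables software/hardware caching, -L captures latency statistics.
.\diskspd.exe -b8K -d60 -o8 -t8 -r -w30 -Sh -L -c10G C:\ClusterStorage\Volume1\test.dat
```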

Point 4: Point of Principle

It has been mentioned here before: We would never, _ever_, deploy a solution that we’ve not worked with directly.

Why?

For one, because we want to make sure our solution would fulfil the promises we’ve made around it. We don’t want to be called to come and pick up our high availability solution because it does not do what it was supposed to do. We’ve heard of that happening for some rather expensive solutions from other vendors.

Point 5: Reputation

Our prospects can see that we have a history, and a rather long one at that, of digging deep into both our own pockets and our own time to develop our solution sets. That also tells them that we are passionate about the solutions we propose.

We _are_ Server’s Reports so we don’t need to rely on any third party for a frame of reference! ;)

Conclusion

Finally, an I.T. company that invests in its crew with lab kit, time, training, and mentorship will find that crew quite passionate about the solutions they are selling and working with. That translates into sales, but also into happy clients who can see for themselves that they are getting great value for their I.T. dollars.

I.T. services companies: get and maintain a lab! It is worth it!

Philip Elder
Microsoft High Availability MVP
MPECS Inc.
Co-Author: SBS 2008 Blueprint Book
Our Cloud Service

Saturday, 12 November 2016

Server Hardware: The Data Bus is Playing Catch-Up

After seeing the Mellanox ConnectX-6 200Gb announcement, the following image came to mind:

image

Image credit

The Vega/Monza was a small car that some folks found the time to stuff a 454CID Chevy engine into, then drop a 6-71 or 8-71 series Roots blower on top (they came off trucks back in the day). The driveline and the "frame" were then tweaked to accommodate it all.

The moral of the story? It was great to have all of that power, but putting it down to the road was always a problem. Check out some of the "tubbed" Vega images out there to see a few of the ways folks tried to do so.

Our server hardware today does not, unfortunately, have the ability to be "tubbed" to allow us to get things moving.

PCI Express

The PCI Express (PCIe) v3 spec (Wikipedia), at a little over 15GB/second (that's gigabytes, not gigabits) across a 16-lane connector, falls far short of the bandwidth needed for a dual-port 100Gb ConnectX-5 part.

As a point of reference, the theoretical throughput of one 100Gb port is about 12.5GB/second. That essentially renders the second port on a dual-port ConnectX-5 adapter moot, as there is very little bus bandwidth left for it to use. It becomes, essentially, a "passive" port to a second switch for redundancy.
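The back-of-the-envelope math, as a sketch (this assumes PCIe v3 at 8 GT/s per lane with 128b/130b encoding, and takes 100GbE at raw line rate, ignoring protocol overhead on both sides):

```powershell
# Rough effective bandwidth of a PCIe v3 x16 slot versus one 100GbE port
$laneGTs  = 8            # PCIe v3 gigatransfers per second, per lane
$encoding = 128 / 130    # 128b/130b encoding efficiency
$lanes    = 16

$pcieGBs  = $laneGTs * $encoding * $lanes / 8   # ~15.75 GB/s for x16
$portGBs  = 100 / 8                             # ~12.5 GB/s per 100Gb port

"PCIe v3 x16: {0:N2} GB/s vs one 100GbE port: {1:N2} GB/s" -f $pcieGBs, $portGBs
```

One port alone consumes roughly 80% of the slot's usable bandwidth before any protocol overhead is counted.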

A quick search for "Intel Server Systems PCIe Gen 4" yields very little in the way of results. We know we are about due for a hardware step, as the "R" code (meaning a refresh, as in R2224WTTYSR) is coming into its second to third year in 2017.

Note that the current Intel Xeon Processor E5-2600 v4 series only has a grand total of 40 PCI Express Generation 3 lanes available per processor. Toss in two x16-wired PCIe slots with two ConnectX-4 100Gb adapters and that's going to be about it for real throughput.

Connectivity fabric bandwidth outside the data bus is increasing by leaps and bounds. Storage technologies such as NVMe, and now NVDIMM-N, 3D XPoint, and other memory-bus-direct storage technologies, are either centre stage or coming onto the stage.

The current PCIe v3 pipe is way too small. The fourth-generation PCI Express pipe, which is not even in production yet, is _already_ too small! It's either time for an entirely new bus fabric, or for a transition of the memory bus into a full or intermediate storage bus, which is what NVDIMM-N and 3D XPoint are hinting at.

Oh, and one more tiny point: Drawing storage into the memory bus virtually eliminates latency ... almost.

Today's Solutions

Finally, one needs to keep in mind that the server platforms we are deploying on today have very specific limitations. We've already hit some limits in our performance testing (blog post: Storage Configuration: Know Your Workloads for IOPS or Throughput).

With our S2D solutions looking at three, five, or more years of service life, these limitations _must_ be at the forefront of our thought process during discovery and then solution planning.

If not, we stand to have an unhappy customer calling us to take the solution back right after we deploy it, or calling a year or two down the road when they hit those limits.

***

Author's Note: I was just shy of my journeyman's ticket as a mechanic, headed in the direction of high performance, when the computer bug bit me. ;)

Philip Elder
Microsoft High Availability MVP
MPECS Inc.
Co-Author: SBS 2008 Blueprint Book
Our Cloud Service

Monday, 24 October 2016

Windows Server 2016 Feature Comparison Summary

This is a direct snip of the differences between Windows Server Standard and Datacenter from this PDF: image
image
image
image

Game Changing

A _lot_ of folks use the words “game changing” to somehow set their products and/or services apart from others. Most of the time when we hear those words, the products and/or services are “ho-hum”, with little to no real impact on a company’s or user’s day-to-day work/personal life.

We’ve heard those words in our industry _a lot_ over the last number of years. The reality has been somewhat different for most of us.

We believe, based on our experience deploying inexpensive storage (clustered Scale-Out File Server/Storage Spaces) and compute (clustered Hyper-V) high availability solutions, that Windows Server 2016 is indeed _game changing_ on three fronts:

  1. Compute
  2. Storage
  3. Networking

The Software Defined Data Centre (SDDC) story in Windows Server 2016 has traditional cluster, storage, and networking vendors concerned. In our opinion, deeply concerned. Just watch key vendors’ stocks, as we’ve been doing these last four or five years, to see just how much of an impact the Server 2016 SDDC story will have over the next two to five years. Some stocks have already reflected the inroads Hyper-V and, more recently (the last two years), Storage Spaces have made into their markets. We’re a part of that story! :)

Using true commodity hardware we are able to set up an entire SDDC for a fraction of the cost of traditional data centre solutions. Not only that, with the hyper-converged Storage Spaces Direct platform we can provide lots of IOPS and compute in a small footprint without all of the added complexity and expense of traditional data centre solutions.

First Server 2016 Deployment

We’ve already deployed our first Windows Server 2016 Clustered Storage Spaces cluster on the General Availability (GA) bits while in Las Vegas two weeks ago:

image

That’s two Intel Server Systems R1208JP4OC 1U servers and a Quanta QCT JB4602 JBOD outfitted with (6) 200GB HGST SAS SSDs and (54) 8TB Seagate NearLine SAS drives. We are setting up a Parity space to provide maximum storage availability, as this client produces lots of 4K video. (EDIT NOTE: Updated the drive size)

The cost of this highly available storage solution is a fraction of what we’d see from Tier 1 storage or hardware vendors.
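As a rough illustration only (the pool and virtual disk names are placeholders, not the production configuration), building a clustered pool and a parity space over JBOD disks looks something like this:

```powershell
# Pool the JBOD disks behind the clustered storage subsystem,
# then carve out a parity virtual disk for capacity-focused storage
$disks = Get-PhysicalDisk -CanPool $true

New-StoragePool -FriendlyName "VideoPool" -PhysicalDisks $disks `
    -StorageSubSystemFriendlyName "Clustered Windows Storage*"

New-VirtualDisk -StoragePoolFriendlyName "VideoPool" -FriendlyName "Video01" `
    -ResiliencySettingName Parity -UseMaximumSize
```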

Going Forward

It’s no secret that we are excited about the Server 2016 story. We plan on posting a lot more about why we believe Windows Server 2016 is a game changer, with specifics around the three areas mentioned above. We may even mention some of the vendors’ stock tickers to add to your watch list too! ;)

Who is MPECS Inc.?

A bit of a shameless plug.

We’ve been building SDDC setups since 2008/2009 for small to medium hosting companies, along with SMB/SME consultants and clients that are concerned about “Cloud being their data on someone else’s computer”.

Our high availability solutions are designed with SMB (think sub-$12K for a 2-node cluster) and SME (think sub-$35K cluster) in mind, along with SMB/SME-focused hosting providers (think sub-$50K to start). Our solutions are flexible and can be designed with the 3-year and 5-year stories, or more, in mind.

We run our Cloud on-premises, hybrid, or in _our_ Cloud, which runs on _our_ highly available solutions.

Curious how we can help you? Then please feel free to ask!

Have a great day and thanks for reading.

Philip Elder
Microsoft High Availability MVP
MPECS Inc.
Co-Author: SBS 2008 Blueprint Book
Our Cloud Service

Tuesday, 18 October 2016

Some Thoughts on the New Windows Server Per Core Licensing Model

This is a post sent to the SBS2K Yahoo Group.

***

Let’s take things from this perspective: Nothing has changed at our level.

How’s that?

We deploy four, six, and eight core single socket and dual socket servers.

We have not seen the need for four-socket servers, since one can set up eight processors in one 2U space (four nodes) today without too much finagling and end up with way more performance.

Until Intel has conquered the GHz-to-core-count ratio, we will be deploying high-GHz, low-core-count CPUs. Not that eight cores is a “low” count in our world. ;)

Most of us will never see the need to set up a dual-socket server with more than 12 cores, with 16 cores being a less common setup for us.

Our sweet spot right now in dual-socket server cluster nodes is the E5-2643v4 at 6 cores and 3.4 GHz. For higher-intensity workloads we run with the E5-2667v4 at 8 cores and 3.2 GHz. Price-wise, these are the best bang for the buck relative to core count versus GHz.

With the introduction of the two-node configuration for Storage Spaces Direct (S2D), we have an option to provide high availability (HA) for our smaller clients using two single-socket 1U servers (R1208SPOSHOR or Dell R330) at a very reasonable cost. A Datacenter license is required for each node. Folks may balk at that, but keep this in mind:

clip_image001

What does that mean? It means that we can SPLA the Datacenter license whether the client leases the equipment, which was standard fare back in the day, or owns it while we SaaS the Windows Server licenses. Up here in Canada those licenses are about $175 per month per 16 cores. That’s $350/month for an HA setup. We see a _huge_ market for this setup in SMB and SME. Oh, and keep in mind that we can then be very flexible about our VM layout. ;)

The licensing change reminds me of VMware’s changes a number of years back, where they received so much backpressure that the “Core Tax” changes got reverted. So far, there has not been a lot of backpressure that we’ve seen about this change. But then, the bulk of the on-premises SMB/SME world, where we spend most of our time, doesn’t deploy servers with more than 16 cores.

In the end, as I stated at the beginning, nothing has changed for us.

We’re still deploying “two” Server Standard licenses, now in this case with 16 cores per server, for our single-box solutions with four VMs. And we’re deploying “four” Server Standard licenses for our two-node Clustered Storage Spaces and Hyper-V via shared storage setups, which also yield four VMs in that setting.
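As a back-of-the-envelope sketch of that math (assuming the 2016 rules of a 16-core minimum per host and two OSEs per full set of Standard core licenses; the numbers are illustrative):

```powershell
# How many 16-core Standard license "sets" does a host need for N VMs?
$physicalCores = 12    # e.g. dual E5-2643v4; still licensed at the 16-core minimum
$vmCount       = 4

$coreSets      = [math]::Ceiling($vmCount / 2)            # 2 OSEs per full core set
$licensedCores = [math]::Max($physicalCores, 16) * $coreSets

"License $licensedCores cores of Standard on this host ($coreSets x 16-core sets)"
```

Four VMs on a 12-core host still works out to two 16-core Standard licenses, which is exactly the “two licenses per box” we have been quoting all along.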

If and when we move into the higher-density core counts for our cluster setups, or even standalone boxes, we will cross that bridge when we come to it.

Have a great day everyone and thanks for reading. :)

Philip Elder
Microsoft High Availability MVP
MPECS Inc.
Co-Author: SBS 2008 Blueprint Book
Our Cloud Service