Wednesday, 15 March 2017

Windows Server 2016 March 2017 Update: Full & Delta Available

We can download either the full March Cumulative Update or the new, smaller Delta update that is now available.

image

Delta Update Windows Server 2016

The update is quite critical for those of us that run clusters on Windows Server 2016.

  • Addresses an issue that could cause ReFS metadata corruption
  • Several fixes for the Enable-ClusterS2D cmdlet used to set up Storage Spaces Direct
  • Addresses an issue with the Update-ClusterFunctionalLevel cmdlet during rolling upgrades if any of the default resource types are not registered
  • Optimizes the drain ordering when an S2D node is placed into Storage Maintenance Mode (see the sketch below this list)
  • Addresses a servicing issue where the Cluster Service may not start automatically on the first reboot after applying an update
  • Improves the bandwidth of SSD/NVMe drives available to application workloads during S2D rebuild operations
  • Addresses an issue on all-flash S2D systems with cache devices where unnecessary reads from both tiers would degrade performance
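
The Storage Maintenance Mode item matters because of the sequence used to take an S2D node down for patching. A hedged sketch of that drain/maintenance sequence, with a hypothetical node name:

  # Pause the node and drain its roles (run from any node in the cluster)
  Suspend-ClusterNode -Name "S2D-N1" -Drain -Wait

  # Put the node's drives into Storage Maintenance Mode so the pool does not
  # start repair jobs while the node is being patched and rebooted
  Get-StorageFaultDomain -Type StorageScaleUnit |
      Where-Object FriendlyName -eq "S2D-N1" |
      Enable-StorageMaintenanceMode

  # ... patch and reboot the node ...

  # Bring the drives and the node back into service
  Get-StorageFaultDomain -Type StorageScaleUnit |
      Where-Object FriendlyName -eq "S2D-N1" |
      Disable-StorageMaintenanceMode
  Resume-ClusterNode -Name "S2D-N1" -Failback Immediate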

A full list is here: March 14, 2017—KB4013429 (OS Build 14393.953)

The Delta Update can be used to update our .WIM files for our Windows Server 2016 flash drive based installers (Blog post How-To).
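
For reference, slipstreaming the update into an Install.WIM follows the same mount/commit pattern described in that How-To. The .msu file name below is a placeholder for whatever the Update Catalog download is actually called, and the full Cumulative Update is the safer choice for image servicing since the Delta generally only applies if the previous month's CU is already in the image:

  1. Dism /Mount-Wim /WimFile:D:\sources\install.wim /Index:1 /MountDir:C:\Mount
    • Use Dism /Get-WimInfo /WimFile:D:\sources\install.wim to pick the right index first
  2. Dism /Image:C:\Mount /Add-Package /PackagePath:C:\Updates\KB4013429.msu
  3. Dism /Unmount-Wim /MountDir:C:\Mount /Commit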

Note that the last Cumulative Update took a good hour to run on our VMs and nodes. This one sounds like it may take as long or longer depending on whether the Delta or the full Cumulative Update gets installed.

Here’s a direct link to the Microsoft Update Page for KB4013429.

Happy Patching! ;)

Philip Elder
Microsoft High Availability MVP
MPECS Inc.
Co-Author: SBS 2008 Blueprint Book
Our Cloud Service

Monday, 13 February 2017

Installing Windows: Updating Drivers for Boot.WIM and Install.WIM

We have a number of clients that are going to be on Windows 7 Enterprise 64-bit for the foreseeable future.

Most, if not all, of the laptops we are deploying today require a few BIOS tweaks to turn off UEFI booting and Secure Boot prior to beginning the setup process.

Once that is done, we hit the next hurdle: these laptops have wired and wireless network adapters, chipsets, and USB 3 ports that have no drivers in the Windows 7 Boot.WIM or Install.WIM files. So, we would be left stranded when trying to deploy an operating system (OS) via flash drive!

Given the number of application updates most of our clients have, we decided against using MDT or other imaging software to deploy a new laptop. The time savings would be negligible since we’d be stuck running all of the application updates or new installs after the OS deployment anyway.

Also, when installing an operating system from a USB 3 flash drive with decent read speeds, it only takes a few minutes to get through the base OS install. Windows 10 can have the entire OS install done in a few minutes.

The following instructions assume the files have already been extracted to a bootable flash drive to use for installing an OS on a new machine.

Here’s a simple step-by-step for updating the drivers for a Windows 7 Professional Install.WIM:

  1. Dism /Get-WimInfo /WimFile:D:\sources\install.wim
    • Where D: = the flash drive letter
    • Note the image name or index to use in the next step
  2. Dism /Mount-Wim /WimFile:D:\sources\install.wim /Name:"Windows 7 PROFESSIONAL" /MountDir:C:\Mount
    • Change C: to another drive/partition if required
    • NOTE: Do not browse the contents of this folder!
  3. Dism /Image:C:\Mount /Add-Driver /Driver:C:\Mount_Driver /Recurse
    • Again, change C: to the required drive letter
    • We extract all drivers to be installed or updated to this folder
  4. Dism /Unmount-Wim /MountDir:C:\Mount /Commit
    • This step commits all of the changes to the .WIM file
    • NOTE: Make sure there are _no_ Windows/File Explorer, CMD, or PowerShell sessions sitting in the C:\Mount folder or the dismount will fail!
  5. Dism /Unmount-Wim /MountDir:C:\Mount /Discard
    • Run this instead of step 4 to dismount the .WIM and discard the changes

The following is the process for updating the WinPE Boot.WIM:

  1. Dism /Get-WimInfo /WimFile:D:\sources\boot.wim
  2. Dism /Mount-Wim /WimFile:D:\sources\boot.wim /Name:"Microsoft Windows Setup (x64)" /MountDir:C:\Mount
  3. Dism /Image:C:\mount /Add-Driver /Driver:C:\Mount_Driver /Recurse
  4. Dism /unmount-Wim /mountdir:C:\mount /commit

The above process can be used to install the newest RAID driver into a Windows Server Boot.WIM and Install.WIM to facilitate a smoother install via flash drive.
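
For those working from a Windows 8.1 or Windows 10 technician machine, the same driver injection can be scripted with the DISM PowerShell module. A hedged sketch using the paths from the steps above:

  # Mount the image, inject every driver under the folder, then commit
  Mount-WindowsImage -ImagePath "D:\sources\install.wim" -Name "Windows 7 PROFESSIONAL" -Path "C:\Mount"
  Add-WindowsDriver -Path "C:\Mount" -Driver "C:\Mount_Driver" -Recurse
  Dismount-WindowsImage -Path "C:\Mount" -Save
  # Use -Discard instead of -Save to throw the changes away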

Currently, we are using Kingston DTR3.0 G2 16GB flash drives as they have good read and decent write speeds. Please feel free to comment with your suggestions on reasonably priced 16GB and 32GB USB 3 flash drives that have good read and write speeds.

Good, to us, is roughly 75 MB/second read and 35 MB/second write.
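
A rough way we might sanity-check a candidate drive (a hedged sketch; E: stands in for the flash drive, and Windows caching can flatter the read number, so eject and re-insert the drive before the read pass):

  # Time a large file copy to and from the flash drive and report MB/s
  $file = "C:\Temp\testfile.bin"              # a multi-GB file on a fast local disk
  $sizeMB = (Get-Item $file).Length / 1MB

  $write = Measure-Command { Copy-Item $file "E:\testfile.bin" }
  "{0:N0} MB/s write" -f ($sizeMB / $write.TotalSeconds)

  # Eject and re-insert the drive here so the read test is not served from RAM cache
  $read = Measure-Command { Copy-Item "E:\testfile.bin" "C:\Temp\readback.bin" }
  "{0:N0} MB/s read" -f ($sizeMB / $read.TotalSeconds)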

Thanks for reading! :)

Philip Elder
Microsoft High Availability MVP
MPECS Inc.
Co-Author: SBS 2008 Blueprint Book
Our Cloud Service

Saturday, 4 February 2017

Hyper-V Compute, Storage Spaces Storage, and S2D Hyper-Converged Solutions

Lately, we here at MPECS Inc. have been designing, implementing, servicing, and supporting highly available solutions for hyper-converged, Hyper-V compute, Storage Spaces storage, and lab environments along with standalone Hyper-V server solutions.

Here are some of the things we have been working on recently or have deployed within the last year.

Cluster Solutions

As has been posted here on our blog previously, we have invested heavily in the Windows Server 2016 story especially in Storage Spaces Direct (S2D) (S2D blog posts):

image

Proof-of-Concept (PoC) Storage Spaces Direct (S2D) Hyper-Converged or S2D SOFS cluster solution

The above Storage Spaces Direct PoC, based on the Intel Server System R2224WTTYSR, gives us the hands-on experience needed to deliver the hundreds of thousands of real, every-day IOPS that our client solutions require. We can tailor a solution for a graphics firm that needs multiple 10GbE paths feeding data to its users’ systems, or for an engineering and architectural firm that requires high-performance storage for its rendering farms.
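
For context, the core of bringing up an S2D cluster on a set of validated nodes is only a handful of cmdlets (node, cluster, and volume names below are hypothetical); the PoC work is really about getting the hardware, firmware, and fabric right before those cmdlets ever run:

  # Validate, build the cluster, enable S2D, and carve out a CSV volume
  Test-Cluster -Node "S2D-N1","S2D-N2","S2D-N3","S2D-N4" `
      -Include "Storage Spaces Direct","Inventory","Network","System Configuration"
  New-Cluster -Name "S2D-CL01" -Node "S2D-N1","S2D-N2","S2D-N3","S2D-N4" -NoStorage
  Enable-ClusterS2D
  New-Volume -FriendlyName "VD01" -FileSystem CSVFS_ReFS -StoragePoolFriendlyName "S2D*" -Size 2TB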

Another Storage Spaces Direct PoC we are working on is the Kepler-47 (TechNet Blog Post):

image

Proof-of-Concept Storage Spaces Direct 2-Node for under $8K with storage!

Our goal for Kepler-47 is to deploy this solution to clients where we would normally have deployed a single or dual Hyper-V server setup with Hyper-V Replica. Our recipe includes Intel 3700, 3600, and 3500 series SSDs, a SuperMicro Mini-ITX Intel Xeon Processor E3-1200 v5 series board, the 8-bay chassis, and Mellanox ConnectX-3 for direct-connected RDMA East-West traffic. Cost for the 2-node cluster is about the same as one bigger Intel Server System that would run a client’s entire virtualization stack.

image

2016: Deployed 2 Node SOFS via Quanta JB4602 with ~400TB Storage Spaces Parity

In the late summer of 2016 we deployed the above SOFS cluster with the ultimate aim of adding three more JBODs for over 1.6PB (Petabytes) of very cost efficient Storage Spaces Parity storage for our client’s video and image files archive. The solution utilizes four 10GbE paths per node and SMB Multichannel to provide robust access to the files on the cluster. Six HGST SAS SSDs provide the needed high-speed cache for writes to the cluster.
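
For a sense of how the SAS SSD write cache and the parity layout come together on a pool like this, here is a hedged sketch (pool name, redundancy level, and sizes are illustrative, not the production values):

  # Dedicate the SAS SSDs as journal disks so they absorb writes for the parity space
  Get-StoragePool -FriendlyName "ArchivePool" | Get-PhysicalDisk |
      Where-Object MediaType -eq SSD | Set-PhysicalDisk -Usage Journal

  # Dual-parity virtual disk with a fixed write-back cache carved from the journal SSDs
  New-VirtualDisk -StoragePoolFriendlyName "ArchivePool" -FriendlyName "Archive01" `
      -ResiliencySettingName Parity -PhysicalDiskRedundancy 2 -Size 100TB `
      -ProvisioningType Fixed -WriteCacheSize 20GB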

Our smallest cluster client is a 15-seat accounting firm with a two-node Clustered Storage Spaces and Hyper-V cluster (our blog post). Some of our largest clients are SME hosting companies with hundreds of tenants and VMs running on SOFS storage and Hyper-V compute clusters.

We can deploy highly available solutions starting at $6K in hardware and up, rendering standalone Hyper-V or VMware solutions moot!

Server and Storage Hardware

We primarily utilize Intel Server Systems and Storage, as Intel’s support is second to none and our solution price points become more than competitive with equivalent Tier 1 solutions. When required, we utilize Dell Server Systems and Storage for solutions that require a 4-hour on-site warranty over 3 to 5 years or more.

Our primary go-tos for disaggregated SOFS cluster storage (storage nodes + direct-attached storage JBOD(s)) are Quanta QCT JBODs and DataON Storage JBODs. We’ve had great success with both companies’ storage products.

For drives we deploy Intel NVMe (PCIe add-in and 2.5”), Intel SATA SSDs, and HGST SAS SSDs. We advise being very aware of each JBOD vendor’s Hardware Compatibility List (HCL) before jumping on just any SAS SSD listed in the Windows Server Catalog Storage Spaces Approved list (Microsoft Windows Server Catalog Site for Storage Spaces).

Important: Utilizing just any vendor’s drive in a Storage Spaces or Storage Spaces Direct setting can be a _costly_ error! One needs to do a lot of homework before deploying any solution into production. BTDT (Been There Done That)

The spinning media we use depends on the hyper-converged or storage solution we are deploying and the results of our thorough testing.

Network Fabrics

In a Storage Spaces Direct (S2D) setting, our East-West (node-to-node) fabric is 10Gb to 100Gb Mellanox with RDMA via RoCE v1 or v2 (RDMA over Converged Ethernet) (our blog post), depending on the Mellanox ConnectX NIC generation. We also turn to RoCE for North-South (compute-to-storage) traffic in our disaggregated cluster solutions.
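
RoCE only behaves if the Data Center Bridging (DCB) and Priority Flow Control (PFC) settings on the hosts match the switch configuration. A hedged sketch of the typical host-side settings (priority 3 for SMB Direct is the common convention; adapter names are hypothetical):

  # Tag SMB Direct traffic with priority 3 and enable lossless PFC for it only
  Install-WindowsFeature -Name Data-Center-Bridging
  New-NetQosPolicy "SMB" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3
  Enable-NetQosFlowControl -Priority 3
  Disable-NetQosFlowControl -Priority 0,1,2,4,5,6,7
  New-NetQosTrafficClass "SMB" -Priority 3 -BandwidthPercentage 50 -Algorithm ETS
  Enable-NetAdapterQos -Name "SLOT 2 Port 1","SLOT 2 Port 2"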

For 10GbE starter solutions, for both the storage and compute networks, the NETGEAR XS716T is the go-to switch. We always deploy the switches in pairs, whether carrying storage-to-compute traffic or the workload Hyper-V virtual switch uplinks, to provide network resilience (see the sketch below). Their switches are very well priced for the entry-level to mid-level solutions we deploy.
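
On the Hyper-V side, a Switch Embedded Teaming (SET) virtual switch with one port uplinked to each physical switch is how that pair turns into actual resilience. A minimal sketch with hypothetical adapter names:

  # SET vSwitch spanning two 10GbE ports, one to each physical switch
  New-VMSwitch -Name "vSwitch-Workload" -NetAdapterName "10GbE-1","10GbE-2" `
      -EnableEmbeddedTeaming $true -AllowManagementOS $false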

Cluster Lab

It’s no secret that we invest a lot in our client solution labs and network shadow solutions (our blog post). It is a point of principle that we make sure our solutions work as promised _before_ we would even consider selling them to our clients.

One does not need to look far for five figure, six figure, seven figure, or more solution failures. Recent catastrophic failures at the Australian Tax Office (Bing search) or 123-reg (Bing search) come to mind. It’s not difficult to find stories of very expensive solutions failing to deliver on their big promises with a big price tag.

The onus is on us to make sure we can under promise and over deliver on every solution!

Our Solutions

We can deliver a wide variety of solutions with the following being a partial list.

  • Storage Spaces Direct (S2D)
    • 2 nodes to 16 Nodes
    • Hyper-Converged running both compute and storage
    • SOFS mode to provide high IOPS storage
    • Hardware agnostic solution sets
    • Host User Profile Disks (UPDs) in Azure, on VMware, or on our solutions
  • Scale-Out File Server Clusters (2 to 5 nodes with 1 JBOD or more)
    • IOPS tuned for intended workload performance
    • Large Volume backup and archival storage
    • Multiple Enclosure Resilience for additional redundancy
  • Hyper-V Compute Clusters (2 nodes to 64)
    • Tuned to workload type
    • Tuned for workload density
  • Clustered Storage Spaces (2 nodes + 1 JBOD)
    • Our entry-level go-to for small to medium business
    • Kepler-47 fits in this space too
  • RDMA RoCE via Mellanox Ethernet Fabrics for Storage <—> Compute
    • We can deploy 10Gb to 100Gb of RDMA fabric
    • High-Performance storage to compute
    • Hyper-converged East-West fabrics
  • Shadow Lab for production environments
    • Test those patches or application updates on a shadow lab
  • Learning Lab
    • Our lab solutions are very inexpensive
    • Four node S2D cluster that fits into a carry-on
    • Can include an hour or more for direct one-on-one or small-group learning
      • Save _lots_ of time sifting through all the chaff to build that first cluster
  • Standalone Hyper-V Servers
    • We can tailor and deliver standalone Hyper-V servers
    • Hyper-V Replica setups to provide some resilience
    • Ready to greenfield deploy, migrate from existing, or side-by-side migrate

Our solutions arrive at our client’s door ready to deploy production or lab workloads. Just ask us!

Or, if you need help with an existing setup we’re here. Please feel free to reach out.

Philip Elder
Microsoft High Availability MVP
MPECS Inc.
Co-Author: SBS 2008 Blueprint Book

Tuesday, 10 January 2017

Server 2016 January 10 Update: KB3213986 – Cluster Service May Not Start Automatically Post Reboot

The January 10, 2017 update package (KB3213986) has a _huge_ caveat for those updating clusters especially with Cluster Aware Updating:

Known issues in this update:

Symptom
The Cluster Service may not start automatically on the first reboot after applying the update.

Workaround
Workaround is to either start the Cluster Service with the Start-ClusterNode PowerShell cmdlet or to reboot the node.

For those managing large cluster deployments, this definitely calls for a review of the update procedure for this particular update.

Please keep this in mind when scheduling this particular update and have update resources set up to mitigate the problem.
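
For example, a quick post-update pass (a hedged sketch, run from any node that is up) will catch any node whose Cluster Service did not come back on its own:

  # Find cluster nodes that are down after patching and start their cluster service
  Get-ClusterNode | Where-Object State -eq Down | ForEach-Object {
      Start-ClusterNode -Name $_.Name
  }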

Note that as of this writing, the cluster service stall on reboot is a one-time deal as far as we know. Meaning, once the update has been completed and the node has successfully joined the cluster there should be no further issues.

Philip Elder
Microsoft High Availability MVP
MPECS Inc.
Co-Author: SBS 2008 Blueprint Book
Our Cloud Service

Wednesday, 7 December 2016

AutoDiscover “Broken” in Outlook 2016

We have a client that has their services spread across a number of different cloud systems.

Recently, users had started “losing” their connection to their hosted Exchange mailbox with Outlook coughing up some really strange errors that were not very helpful.

We ran the gamut trying to figure things out, since calls to the hosted Exchange provider, and eventually to the company hosting their web site, always came back with “There’s a problem with Outlook”.

Indeed.

What we’ve managed to figure out is that Outlook 2016 will _always_ run an AutoDiscover check even if we’re manually setting up the mailbox for _any_ Exchange ActiveSync (EAS) connection. It must be some sort of new “security” feature in Outlook 2016.

What does that mean?

It means that when something changes unbeknownst to us things break. :(

In this case, it was the combination of Outlook’s AutoDiscover behaviour for EAS connections and the web host changing something on their end, since things had been working for a _long_ time before the recent problems. Or, a recent update to Outlook 2016 changed things on the AutoDiscover side and revealed what was already happening on the web hosting side.
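
For anyone chasing something similar, the documented Outlook AutoDiscover policy registry values are one way to see, and restrict, which lookup steps Outlook attempts; the root-domain HTTPS query is the step that lands on the web host. A hedged sketch (per-user, Outlook 2016 hive; test carefully before rolling it out):

  # Tell Outlook 2016 to skip the https://domain.com root-domain AutoDiscover query
  $key = "HKCU:\Software\Microsoft\Office\16.0\Outlook\AutoDiscover"
  New-Item -Path $key -Force | Out-Null
  Set-ItemProperty -Path $key -Name "ExcludeHttpsRootDomain" -Value 1 -Type DWord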

Okay, back to the problem at hand. This is the prompt we would get when setting up a new mailbox, and that all users who already had mailbox connections eventually started getting:

image

Internet E-mail Name@Domain.com

Enter your user name and password for the following server:

Server: gator3146.hostgator.com

Well, our mailboxes are on a third party and not HostGator. So, on to chatting and eventually phoning them after opening a ticket with the Exchange host and hearing back that the problem was elsewhere.

Unfortunately, HostGator was not very helpful via chat or phone when we initially reached out. Outlook was always the problem they claimed.

So, we set up a test mailbox on the hosted Exchange platform and went to our handy Microsoft tool: Microsoft Remote Connectivity Analyzer.

We selected the Outlook Autodiscover option and ran through the steps setting up the mailbox information, then the CAPTCHA a few times ;-), and received the following results:

image

We now had concrete evidence that HostGator was not honouring the AutoDiscover.domain.com DNS record we had for this domain, which pointed at a system that was not theirs.

A question was sent out to a fellow MVP on Exchange and their reply back was “HostGator had a URLReWrite rule in place for IIS/Apache that was grabbing the AutoDiscover polls from Outlook and sending them to their own servers.”

During that time we created the /AutoDiscover folder and put a test file in it. The problem still happened.
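
A quick outside-in check (a hedged sketch; domain.com stands in for the client’s domain) also shows who is actually answering the root-domain AutoDiscover poll:

  # Where do the AutoDiscover and root-domain records point?
  Resolve-DnsName "autodiscover.domain.com"
  Resolve-DnsName "domain.com" -Type A

  # Outlook's root-domain poll goes to https://domain.com/AutoDiscover/AutoDiscover.xml;
  # the Server header in the response gives away who is intercepting it
  try {
      $r = Invoke-WebRequest "https://domain.com/AutoDiscover/AutoDiscover.xml" -Method Head -UseBasicParsing
  } catch {
      $r = $_.Exception.Response   # a 401 from the real Exchange endpoint is expected
  }
  $r.Headers["Server"]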

Okay, back on the phone with HostGator support. The first call had two “escalations” associated with it unfortunately with no results. A second call was made after seeing the MVP response with a specific request to HostGator: Delete the URLReWrite rule that was set up on this client’s site within the last month.

They could not do it. Nothing. Nada. Zippo. :(

So, for now our workaround was to move the DNS A record for @ (Domain.com) to the same IP as the hosted Exchange service’s AutoDiscover IP to at least get Outlook to fail on the initial domain poll.

Moral of the story?

We’re moving all of our clients’ web properties off HostGator to a hosting company that will honour the setup we implement, and we’ll use the Microsoft Remote Connectivity Analyzer to test things out thoroughly.

Philip Elder
Microsoft High Availability MVP
MPECS Inc.
Co-Author: SBS 2008 Blueprint Book
Our Cloud Service

Tuesday, 22 November 2016

Something to be Thankful For

There are many things to be grateful for. For us, it's our family, friends, business, and so much more.

This last weekend we were reminded in a not so subtle way just how fragile life can be.

image

The other driver, two of my kids, and myself were all very fortunate to walk away with no bones broken or blood spilled. We had a big hug later in the day when we were all finally back together at home.

Dealing with the soreness and migraines since the accident is a small price to pay for the fact that we are all okay.

And fortunately, the other driver took full responsibility for the critical error in judgement that caused the accident, so there will be no insurance scrambles to deal with.

We are truly thankful to be alive today.

Happy Thanksgiving to our US neighbours. And for everyone, give those special folks in your life a hug; I sure have been! ;)

Philip Elder
Microsoft High Availability MVP
MPECS Inc.
Co-Author: SBS 2008 Blueprint Book
Our Cloud Service

Tuesday, 15 November 2016

What’s in a Lab? Profit!

Our previous post on Server Hardware: The Data Bus is Playing Catch-Up has had a lot of traction.

Our tweets on I.T. companies not having a lab for their solutions sales engineers and technicians have had a lot of traction.

So, let’s move forward with a rather blunt opinion piece shall we? ;)

What client wants to drop $25K on an 800bhp blown 454CID engine and then shovel it into that Vega/Monza, only to find the car twisted into a pretzel on the first run and, very possibly, the driver with serious injuries or worse?

image

Image credit

Seriously, why wouldn’t the same question be asked by a prospect or client that is about to drop $95K or more on a Storage Spaces Direct (S2D) cluster that the I.T. provider has _never_ worked with? Does the client or prospect even think of asking that question? Are there any references with that solution in production? If the answer is “No” then get the chicken out of that house!

In the automotive industry folks ask those questions especially when they have some serious coin tied up in the project … at least we believe they would based on previous experience.

Note that there are a plethora of videos on YouTube and elsewhere showing the results of so-called “tuners” blowing the bottom end out of an already expensive engine. :P

In all seriousness though, how can an I.T. company sell a solution to a client that they’ve never worked with, put together, tested, or even _seen_ before?

It really surprised me, while chatting with a technical architect who works for a large I.T. provider, to hear that their company doesn’t believe there is any value in providing a lab for them.

S2D Lab Setup

A company that keeps a lab, and refreshes it every so often, stands to gain so much more than the folks who count the beans may see.

For S2D, the following is a good, and relatively inexpensive, place to start:

  • Typical 4-node S2D lab based on Intel Server Systems
    • R2224WTTYSR Servers: $15K each
    • Storage
      • Intel 750 Series NVMe $1K/Node
      • Intel 3700 Series SATA $2K/Node
      • Seagate/HGST Spindles $3K/Node
    • Mellanox RDMA Networking: $18K (MSX1012X + 10GbE CX-3 Adapters)
    • NETGEAR 10GbE Networking: $4K (XS716T + X540-T2 or X550-T2)
    • Cost: ~$75K to $85K

The setup should look something like this:

image

S2D Lab (Front)

image

S2D Lab (Rear)

Note that we have two extra nodes for a Hyper-V cluster setup to work with S2D as a SOFS only solution.
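
Part of the lab’s value is being able to put our own numbers behind the proposals. A hedged example of how a CSV on the lab cluster might be baselined with Microsoft’s DiskSpd (path, test file size, and I/O mix are illustrative; tune them to the workload being modelled):

  # 4K random, 70/30 read/write, 8 threads, 32 outstanding I/Os per thread,
  # caching disabled, latency capture on, 50GB test file, 60 second run
  .\diskspd.exe -b4K -d60 -t8 -o32 -r -w30 -Sh -L -c50G C:\ClusterStorage\Volume1\iotest.dat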

Okay, so the bean counters are saying, “what do we get for our $100K hmmm?”

Point 1: We’ve Done It

The above racked systems images go into any S2D proposal with an explanation that we’ve been building these hyper-converged clusters since Windows Server 2016 was in its early technical preview days. A prospect who sees the section outlining our efforts to fine-tune our solutions on our own dime puts our competitors at a huge disadvantage.

Point 2: References

Because we dug in and tested from the outset, we can bid on deals with these solutions. As a result, we are one of the few with go-to-market-ready solutions, and we will have deployed them before most others out there even know what S2D is!

Point 3: Killer and Flexible Performance

Most solutions we would be bidding against are traditional SAN style configurations. Our hyper-converged S2D platform provides a huge step up over these solutions in so many ways:

  1. IOPS: NVMe utilized at the cache layer for real IOPS gains over traditional SAN either via Fibre Channel or especially iSCSI.
  2. Throughput: Our storage can be set up to run huge amounts of data through the pipe if required.
  3. Scalability: We can start off small and scale out up to 16 nodes per cluster.
    • 2-8 nodes @ 10GbE RDMA via Mellanox and RoCEv2
    • 8-16 nodes @ 40GbE RDMA via Mellanox and RoCEv2
      • Or, 100GbE RDMA via Mellanox and RoCEv2

This raises the question: how does one know how one’s solution is going to perform if one has never deployed it before?

Oh, we know: “I’ve read it in Server’s Reports”, says the lead sales engineer. ;)

Point 4: Point of Principle

It has been mentioned here before: We would never, _ever_, deploy a solution that we’ve not worked with directly.

Why?

For one, because we want to make sure our solution would fulfil the promises we’ve made around it. We don’t want to be called to come and pick up our high availability solution because it does not do what it was supposed to do. We’ve heard of that happening for some rather expensive solutions from other vendors.

Point 5: Reputation

Our prospects can see that we have a history, and a rather long one at that, of digging in quite deep, both into our own pockets and our own time, to develop our solution sets. That also tells them that we are passionate about the solutions we propose.

We _are_ Server’s Reports so we don’t need to rely on any third party for a frame of reference! ;)

Conclusion

Finally, an I.T. company that invests in its crew, in lab kit, time, training, and mentorship, will find that crew quite passionate about the solutions they are selling and working with. That translates into sales, but also into happy clients who can see for themselves that they are getting great value for their I.T. dollars.

I.T. services companies: get and maintain a lab! It is worth it!

Philip Elder
Microsoft High Availability MVP
MPECS Inc.
Co-Author: SBS 2008 Blueprint Book
Our Cloud Service