Thursday 9 November 2017

Intel Server System R2224WFTZS Integration & Server Building Thoughts

We have a brand new Intel Server System R2224WFTZS that is the foundation for a mid to high performance virtualization platform.

Intel Server System R2224WFTZS 2U

Below it sits one of our older lab servers, an Intel Server System SR2625URLX 2U. Note the difference in the drive caddies.

That change is welcome as the caddy no longer requires a screwdriver to set the drive in place:

Intel 2.5" Tool-less Drive Caddy

What that means is the time required to get 24 drives installed in the caddies went from half an hour or more to five or ten minutes. That, in our opinion, is a great leap ahead!

The processors for this setup are Intel Xeon Gold 6134s: eight cores running at 3.2GHz with a peak of 3.7GHz. We chose the Gold 6134 as a starting place because most of the other SKUs in this class have more than eight cores per socket, which pushes up the cost of licensing Microsoft Windows Server Standard or Datacenter. With per-core licensing, a pair of eight-core processors lands exactly on the 16-core per-server minimum that the base license covers.
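To put rough numbers behind that, here is a minimal sketch of the core-licensing math, assuming the standard Windows Server 2016 rules (every physical core licensed, a minimum of 8 core licenses per processor, and a minimum of 16 per server); the 12-core comparison is just an example, not a SKU we priced:

# Rough sketch of Windows Server 2016 per-core licensing math, showing why we
# stopped at eight cores per socket. Assumed rules: license every physical core,
# minimum 8 core licenses per processor, minimum 16 core licenses per server.
def core_licenses_required(sockets: int, cores_per_socket: int) -> int:
    per_socket = max(cores_per_socket, 8)   # 8-core-per-processor minimum
    return max(sockets * per_socket, 16)    # 16-core-per-server minimum

base = core_licenses_required(2, 8)          # 2x Xeon Gold 6134 (8 cores each)
print(base)                                  # 16 -> covered by the base license
print(core_licenses_required(2, 12) - base)  # 8 extra core licenses for a 2x 12-core box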

Intel Xeon Gold 6134, Socket, Heatsink, and Canadian Loonie $1 Coin

The new processors are huge!

The new package dwarfs both the E3-1200 series and the E5-2600 series. The jump in size reminds me of the Pentium Pro's girth next to the lesser desktop/server processors of its day.

Intel Xeon Processor E3-1270 sits on the Intel Xeon Gold 6134

The server is nearly complete.

Intel Server System R2224WFTZS Build Complete

Bill of Materials

In this setup the server's Bill of Materials (BoM) is as follows:

  • (2) Intel Xeon Gold 6134
  • 384GB via 12x 32GB Crucial DDR4 LRDIMM
  • Intel Integrated RAID Module RMSP3CD080F with 7 Series Flash Cache Backup
  • Intel 12Gbps RAID Expander Module RES3TV360
  • (2) 150GB Intel DC S3520 M.2 SSDs for OS
  • (5) 1.9TB Intel DC S4600 SATA SSDs for high IOPS tier
  • (19) 1.8TB Seagate 10K SAS for low to mid IOPS tier
  • Second Power Supply, TPM v2, and RMM4 Module

It's important to note that when setting up a RAID controller, as opposed to a Host Bus Adapter (HBA) that does JBOD only, we require the flash cache backup module. In this particular unit the module's mounting bracket also needs to be ordered: AWTAUXBBUBKT.

I'm not sure why we missed that, but we've updated our build guides to reflect the need for it going forward.

One other point of note: the rear 2.5" hot-swap drive bay kit (A2UREARHSDK2) does not come installed from the factory in the R2224WFTZS as it did in the R2224WTTYS. I'm still not sold on M.2 for the host operating system, as the modules are not hot-swap capable. That means if one dies we have to down the node in order to change it. With the rear hot-swap bay we can do just that: swap out the 2.5" SATA SSD that's being used for the host OS.

For the second set of two 10GbE ports we used an Intel X540-T2 PCIe add-in card as the I/O modules are not in the distribution channel as of this writing.

NOTE: A T30 Torx driver is required for the heatsinks! After installing the processor, please make sure to start all four nuts prior to tightening any of them. As a suggestion, from there snug each one up gradually, starting with the two middle nuts and then the outer nuts, similar to the process for torquing down a cylinder head on an engine block. This provides an even amount of pressure from the middle of the heatsink outwards.

Firmware Notes

Finally, make sure to update the firmware on all components before installing an operating system. There are some key fixes in the motherboard firmware updates as of this writing (BIOS 00.01.0009 ReadMe). Please make sure to read through the ReadMe to verify any caveats associated with the update process or the updates themselves.

Next up on our build process will be to update all firmware in the system, install the host operating system and drivers, and finally run a burn-in process. From there, we'll run some tests to get a feel for the IOPS and throughput we can expect from the two RAID arrays.
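As a preview of that testing step, here is a minimal sketch of how the DiskSpd runs could be scripted and captured from Python. The drive letters, test file sizes, and I/O mix below are placeholder assumptions rather than our actual test plan.

# Minimal sketch: run a 4K random, 30% write DiskSpd pass against each array
# and save the raw output for later comparison. Targets below are assumptions.
import subprocess

targets = {
    "s4600-ssd-tier": r"E:\diskspd-test.dat",   # assumed volume for the SSD array
    "sas-10k-tier": r"F:\diskspd-test.dat",     # assumed volume for the SAS array
}

for label, test_file in targets.items():
    cmd = [
        "diskspd.exe",
        "-b4K",   # 4K block size
        "-d60",   # 60 second run
        "-t8",    # 8 threads
        "-o8",    # 8 outstanding I/Os per thread
        "-r",     # random I/O
        "-w30",   # 30% writes
        "-Sh",    # disable software caching and hardware write caching
        "-L",     # capture latency statistics
        "-c20G",  # create a 20GB test file if it does not exist
        test_file,
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    with open(f"{label}-4k-random.txt", "w") as out:
        out.write(result.stdout)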

Why Build Servers?

That's got to be the burning question on some minds. Why?

The long and the short of it is that we've been doing this for so many years it's a hard habit to kick. ;)

Actually, the reality is much more mundane. We continue to be actively involved in building out our own server solutions for a number of reasons:

  • We can fine tune our solutions to specific customer needs
    • Need more IOPS? We can do that.
    • Need more throughput? We can do that.
    • Need a blend of the two, as is the case here? We can do that too.
  • Direct contact with firmware issues, interoperability, and stability
    • Making the various firmware bits play nice together can be a challenge
  • Driver issues, interoperability, and stability
    • Drivers can be quite finicky about what's in the box with them
  • Hardware interoperability
    • Our parts bin is chock full of parts that refused to work with one another
    • On the other hand our solution sets are known good configurations
  • Cost
    • Our server systems are a fraction of the cost of Tier 1
  • Overall system configuration
    • As-designed stability out of the box
  • "He said, she said"
    • Since we test our systems extensively prior to deploying, we know them well
    • Software vendors that point the finger have no leg to stand on, as we have plenty of charts and graphs
    • Performance issues are easier to pinpoint in software vendors' products
    • We remove the guesswork around an already configured Tier 1 box

Business Case

The business case is fairly simple: there are _a lot_ of folks out there that do not want to cloud their business. We help those customers with a highly available solution set, plus our business cloud, to give them all of the cloud goodness while keeping their data on-premises.

We also help I.T. professional shops that may not have the skill set on board but whose customers need high availability and a cloud-like experience deployed on-premises.

For those customers that do want to cloud their business, we have a solution set for small to medium I.T. shops that want to provide multi-tenant solutions in their own data centres. We provide the solution and backend support at a very reasonable cost while they spend their time selling their cloud.

All in all, we've found ourselves a number of different great little niches for our highly available solutions (clusters) over the last few years.

Thanks for reading! :)

Philip Elder
Microsoft High Availability MVP
MPECS Inc.
Co-Author: SBS 2008 Blueprint Book
Our Web Site
Our Cloud Service
Twitter: @MPECSInc

Friday 3 November 2017

A Little Plug for Mellanox and RoCE RDMA

RoCE (RDMA over Converged Ethernet) via Mellanox NICs and switches is our primary fabric choice for Storage Spaces Direct (S2D) and Scale-Out File Server (SOFS) to Hyper-V compute cluster fabric.

With the Mellanox MSX1012X 10GbE switch we can deploy a pair of them along with a pair of ConnectX-4 Lx dual port NICs per node for about the same cost as a pair of NETGEAR XS716T 10GbE switches and a pair of Intel X540/X550-T2 10GbE RJ45 based NICs per node.

We have a great business relationship with Mellanox. They are great folks to work with and their product support is second to none.

I was honoured to be asked to use a portion of my MVPDays presentation to create the following video, which now lives on Mellanox's YouTube channel.

Hopefully the video comes out okay as embedding it was a bit of a chore.

Thanks for reading and have a great weekend!

Philip Elder
Microsoft High Availability MVP
MPECS Inc.
Co-Author: SBS 2008 Blueprint Book
Our Cloud Service
Twitter: @MPECSInc

Wednesday 1 November 2017

Error Fix: Event 7034 Service Control Manager - Server, BITS, Task Scheduler, Windows Management Instrumentation, Shell Hardware Detection Crashes

This has just recently started to pop up on networks we manage.

All of the following are Event ID 7034 Service Control Manager service terminated messages:

  • The Windows Update service terminated unexpectedly. It has done this 3 time(s).
  • The Windows Management Instrumentation service terminated unexpectedly. It has done this 3 time(s).
  • The Shell Hardware Detection service terminated unexpectedly. It has done this 3 time(s).
  • The Remote Desktop Configuration service terminated unexpectedly. It has done this 3 time(s).
  • The Task Scheduler service terminated unexpectedly. It has done this 3 time(s).
  • The User Profile Service service terminated unexpectedly. It has done this 3 time(s).
  • The Server service terminated unexpectedly. It has done this 3 time(s).
  • The IP Helper service terminated unexpectedly. It has done this 2 time(s).
  • The Device Setup Manager service terminated unexpectedly. It has done this 3 time(s).
  • The Certificate Propagation service terminated unexpectedly. It has done this 2 time(s).
  • The Background Intelligent Transfer Service service terminated unexpectedly. It has done this 3 time(s).
  • The System Event Notification Service service terminated unexpectedly. It has done this 2 time(s).

It turns out that all of the above services are hosted in SvcHost.exe, and guess what:

Log Name: Application
Source: Application Error
Date: 10/23/2017 5:09:57 PM
Event ID: 1000
Task Category: (100)
Level: Error
Keywords: Classic
Computer: ABC-Server.domain.com
Description:
Faulting application name: svchost.exe_DsmSvc, version: 6.3.9600.16384, time stamp: 0x5215dfe3
Faulting module name: DeviceDriverRetrievalClient.dll, version: 6.3.9600.16384, time stamp: 0x5215ece7
Exception code: 0xc0000005
Fault offset: 0x00000000000044d2
Faulting process id: 0x138
Faulting application start time: 0x01d34c5c3f589fe7
Faulting application path: C:\Windows\system32\svchost.exe
Faulting module path: C:\Windows\System32\DeviceDriverRetrievalClient.dll
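
To gauge how widespread the crashes are on a given server, both event logs can be queried directly. Here is a minimal sketch that shells out to the built-in wevtutil tool from Python; the event IDs and provider name come from the entries above, while the count of 20 recent events is an arbitrary choice.

# Minimal sketch: pull recent Service Control Manager 7034 crashes and the
# matching svchost.exe Application Error 1000 faults for correlation.
import subprocess

queries = {
    "System 7034 (service terminated)": [
        "wevtutil", "qe", "System",
        "/q:*[System[(EventID=7034)]]",
        "/c:20", "/rd:true", "/f:text",
    ],
    "Application 1000 (svchost faults)": [
        "wevtutil", "qe", "Application",
        "/q:*[System[Provider[@Name='Application Error'] and (EventID=1000)]]",
        "/c:20", "/rd:true", "/f:text",
    ],
}

for label, cmd in queries.items():
    print(f"===== {label} =====")
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(result.stdout)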

A contractor of ours, for whom we deployed a greenfield AD and cluster, was the one who figured it out. WSUS and the Group Policy settings were deployed this past weekend, and everything in our Cloud Stack had been running smoothly until then.

The weird thing is, we have had these settings in place for years now without any issues.

The following are the settings changed at both sites:

System/Device Installation
Specify search order for device driver source locations: Not Configured
2014-02-11: Enabled by Philip Elder.
2017-11-01: Not Configured by Philip Elder.
Specify the search server for device driver updates: Not Configured
2014-02-11: Enabled by Philip Elder.
2017-11-01: Not Configured by Philip Elder.

System/Driver Installation
Turn off Windows Update device driver search prompt: Not Configured
2017-10-28: Disabled by Philip Elder.
2017-11-01: Returned to Not Configured by Philip Elder.

System/Internet Communication Management/Internet Communication settings
Turn off Windows Update device driver searching: Not Configured
2014-02-11: Disabled by Philip Elder.
2017-11-01: Not Configured by Philip Elder.
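
For reference, once Group Policy applies, these Device Installation settings should show up as registry values under the DriverSearching policy key. Here is a minimal sketch that dumps whatever values remain on an affected server after a gpupdate; the key path is where I would expect these settings to land, so treat it as an assumption.

# Minimal sketch: list any driver-searching policy values still applied on this
# server. The key path below is an assumed location for these policies.
import winreg

KEY_PATH = r"SOFTWARE\Policies\Microsoft\Windows\DriverSearching"

try:
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
        index = 0
        while True:
            try:
                name, value, _value_type = winreg.EnumValue(key, index)
                print(f"{name} = {value}")
                index += 1
            except OSError:
                break  # no more values under this key
except FileNotFoundError:
    print("No DriverSearching policy key present - the settings are Not Configured.")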

It is important to note that when working with Group Policy settings, a comment should be added to each setting wherever possible. Then, when troubleshooting an errant behaviour that turns out to be Group Policy related, we are better able to figure out where the setting lives and when it was set. In some cases, a short description of why the setting was made helps as well.

Philip Elder
Microsoft High Availability MVP
MPECS Inc.
Co-Author: SBS 2008 Blueprint Book
Our Cloud Service
Twitter: @MPECSInc