Wednesday, 12 December 2018

Intel Technology Provider for 2019

We just received word of our renewal for the Intel Technology Provider program:


We've been system builders since the company began in 2003, and I was building systems for more than a decade before that!

One of the comments that gets made on a somewhat frequent basis is something along the lines of being a "Dinosaur". ;)

Or, this question gets asked quite a lot, "Why?"

There are many reasons for the "Why". Some that come off the top are:

  • We design solutions that meet very specific performance needs such as 150K IOPS, 500K IOPS, 1M IOPS and more
  • Our solutions get tested and thrashed before they ever get sold
    • We have a parts bin with at least five figures' worth of broken vendor promises
  • We have a solid understanding of component and firmware interactions
  • Our systems come with guaranteed longevity and performance
    • How many folks can say that when "building" a solution in a Vendor's "Solution Tool"?
  • We avoid the finger pointing that can happen when things don't pass muster

The following is one of our lab builds: a two-node Storage Spaces Direct (S2D) cluster utilizing 24 Intel SSD DC-4600 or D3-4610 SATA series SSDs in a flat configuration, meaning no cache layer. The upper graphs are built in Grafana, the bottom left is Performance Monitor watching the RoCE (RDMA over Converged Ethernet via Mellanox) traffic, and the bottom right is the VMFleet WatchCluster PowerShell script.


We just augmented the two node setup with 48 more Intel SSD D3-4610 SATA SSDs for the other two nodes and are waiting on a set of Intel SSD 750 series NVMe PCIe AiCs (Add-in-Card) to bring our 750 count up to 3 per node for NVMe cache.

Why the Intel SSD 750 Series? They have Power Loss Protection built in. Storage Spaces Direct will not allow a cache device to hold data in its local cache if that cache is volatile. What becomes readily apparent is that writing straight through to NAND is a very _slow_ process relative to having that cache power protected!

We're looking to hit 1M IOPS flat SSD and well over that when the NVMe cache setup gets introduced. There's a possibility that we'll be seeing some Intel Optane P4800X PCIe AiCs in the somewhat near future as well. We're geared-up for a 2M+ run there. :D

Here's another test series we were running to saturate the node's CPUs and storage to see what kind of numbers we would get at the guest level:


Again, the graphs in the above shot are Grafana based.

The snip below is our little two node S2D cluster (E3-1270v6, 64GB ECC, Mellanox 10GbE RoCE, 2x Intel DC-4600 SATA SSD Cache, 6x 6TB HGST SATA) pushing 250K IOPS:


We're quite proud of our various accomplishments over the years with our high availability solutions running across North America and elsewhere in the world.

We've not once had a callback asking us to go and pick up our gear and refund the payment because it did not meet the needs of the customer as promised.

Contrary to what the "All in the Cloud" crowd says, there is indeed a niche for those of us that provide highly available solution sets to on-premises clients. Those solutions allow them to have the uptime they need without the extra costs of running all-in the cloud or hybrid with peak resources in the cloud. Plus, they know where their data is.

Thanks for reading!

Philip Elder
Microsoft High Availability MVP
Co-Author: SBS 2008 Blueprint Book
Our Web Site
Our Cloud Service

Tuesday, 11 December 2018

OS Guide: Slipstream Updates Using DISM and OSCDImg *Updated

We have found that both the May Servicing Stack Update (SSU) KB4132216 _and_ the latest SSU, currently KB4465659, need to be in the Updates_WinServ folder alongside the Cumulative Update for the Windows Server 2016 slipstream run.


Note that the current version of the script points to Server 2019. Please use that as a base to tweak and create a set of folders for Windows Server 2016 and Windows 10 updates.
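For anyone adapting things by hand, here is a minimal sketch of the idea and not the actual script from the guide. It assumes the usual ordering of Servicing Stack Updates first and the Cumulative Update after, and it uses the built-in DISM PowerShell cmdlets. Every path except the Updates_WinServ folder name, plus the image index, is a made-up placeholder.

```powershell
# Hypothetical paths and index: adjust to match your own folder layout.
$wimPath    = 'C:\Slipstream\WinServ2016\sources\install.wim'
$mountDir   = 'C:\Slipstream\Mount'
$updateDir  = 'C:\Slipstream\Updates_WinServ'
$imageIndex = 1   # assumed edition index; list them with Get-WindowsImage -ImagePath $wimPath

# Both SSUs go in first (older, then newer); everything else in the folder
# is treated as the Cumulative Update and applied afterwards.
$ssuKbs     = 'kb4132216', 'kb4465659'
$allUpdates = Get-ChildItem -Path $updateDir -File |
    Where-Object { $_.Extension -in '.msu', '.cab' }
$ssus   = foreach ($kb in $ssuKbs) { $allUpdates | Where-Object { $_.Name -like "*$kb*" } }
$others = $allUpdates | Where-Object { $_.FullName -notin $ssus.FullName }

Mount-WindowsImage -ImagePath $wimPath -Index $imageIndex -Path $mountDir
try {
    foreach ($update in @($ssus) + @($others)) {
        Add-WindowsPackage -Path $mountDir -PackagePath $update.FullName
    }
    # Commit the changes back into the WIM.
    Dismount-WindowsImage -Path $mountDir -Save
}
catch {
    # Something failed: throw away the half-updated image and surface the error.
    Dismount-WindowsImage -Path $mountDir -Discard
    throw
}
```

Matching on the KB number in the file name is just a convenience for the sketch. If the downloaded files are named differently, list them explicitly instead.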

Thursday, 6 December 2018

Error Fix: Trust Relationship is Broken

Here's a quick post on fixing a broken trust situation when the local administrator username and password are known.

On Windows 7:

  1. Windows Explorer
  2. Right click My Computer/This PC --> Properties
  3. Change settings for Computer Name
  4. Change button
  5. Domain: Setting Now: DOMAIN.LOCAL
    1. Change to DOMAIN (delete .Local)
    2. Credential
  6. Reboot

That process will fix things on Windows 7. If PowerShell is up to date on the Windows 7 machine, or for any other Windows version:

  1. Log on with local admin user
  2. Reset-ComputerMachinePassword -Credential DOMAIN\DomainAdmin
  3. Log off
  4. Log on with domain user account

That's it.
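For those who like to confirm the fix took before logging off, here is a minimal sketch that wraps the reset in a before and after check. It assumes an elevated PowerShell session under the local admin account and a reachable domain controller. DOMAIN\DomainAdmin is the same placeholder used in the steps above, and the Test-ComputerSecureChannel check is our addition, not part of the original steps.

```powershell
# Run elevated while logged on with the local admin account.
# DOMAIN\DomainAdmin is a placeholder: substitute a real domain admin account.
$cred = Get-Credential -UserName 'DOMAIN\DomainAdmin' -Message 'Domain admin credentials'

# Test-ComputerSecureChannel returns $true when the machine's secure channel
# to the domain is healthy, so only reset when it reports a problem.
if (-not (Test-ComputerSecureChannel)) {
    Reset-ComputerMachinePassword -Credential $cred

    # Check again after the reset; -Verbose prints the result in plain language.
    Test-ComputerSecureChannel -Verbose
}
```

If the second check still comes back False, the machine may not be able to reach a domain controller, in which case the reset will not help until connectivity is sorted out.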

If you know any other methods, especially for situations where the local admin username and password are unknown or all local admin accounts are disabled, feel free to comment or ping!

Thanks for reading. :)

Monday, 3 December 2018

A word of caution: Verify! Everything!

We get to work with a whole host of clients and their industries but as a contractor we also get to work with a wide variety of IT Pros and IT Companies.

Many times we get involved in a situation that's an outright pickle.

Something has gone sideways and the caller is looking for some guidance, some direction, and a bit of handholding because they are at a loss.

Vendor Blame Game

Some of those times the caller is in the middle of a tongue wagging session between a set of vendors, each blaming the other for the pickle.

We were in that situation back in the day when a Symantec BackupExec (BUE) solution failed to restore. The client site had two HP DAT tape libraries with BUE firing "All Good" reports.

We found out those reports were bad when the client's main file server went blotto.

We were in between the storage vendor and their products, Symantec, and HP. It was not a pretty scene at all. In the end, it was determined _by us_ that BUE was the source of the problem because it did not do any kind of verify on the backups being written to tape despite the setting being there to do so.

We were fortunate that we had multiple redundant systems in place and managed to get most of the data back except one partner's weeks' worth of work. We had to build out a new domain though.

So, why the blog post?

Because, it's still happening today.

Verify, Verify, and Verify Again

We _highly suggest_ verifying that all backup products and backup services are doing what they say they are doing.

If the service provider charges for a test failover then do it anyway. Charge the fee back to the client because once the process has run, successfully or not, things are in a better place either way.

Never, _ever_ walk into a disaster recovery situation without ever having tested the products that are supposed to save the client's business. Period.

Yeah, there are times where something may happen before that planned failover. That's not what we're talking about here.

What we are after is testing to make sure that the vendor's claims are indeed true and that the solution set is indeed working as planned.
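One simple way to check a test restore is to compare it file by file against the live data. The following is a minimal sketch under a few assumptions: the backup product has already restored a folder to a scratch location, the source data has not changed since the backup ran, and Compare-RestoreToSource is a hypothetical helper name of ours and not part of any backup product.

```powershell
function Compare-RestoreToSource {
    # Hypothetical helper: reports files that are missing from, or differ in,
    # a test restore when compared to the original folder.
    param(
        [Parameter(Mandatory)] [string] $SourcePath,
        [Parameter(Mandatory)] [string] $RestorePath
    )

    $sourceRoot  = (Resolve-Path -LiteralPath $SourcePath).Path
    $restoreRoot = (Resolve-Path -LiteralPath $RestorePath).Path

    Get-ChildItem -LiteralPath $sourceRoot -Recurse -File | ForEach-Object {
        # Path of the file relative to the source root.
        $relative = $_.FullName.Substring($sourceRoot.Length).TrimStart('\', '/')
        $restored = Join-Path -Path $restoreRoot -ChildPath $relative

        if (-not (Test-Path -LiteralPath $restored -PathType Leaf)) {
            [pscustomobject]@{ File = $relative; Status = 'Missing' }
        }
        elseif ((Get-FileHash -LiteralPath $_.FullName -Algorithm SHA256).Hash -ne
                (Get-FileHash -LiteralPath $restored -Algorithm SHA256).Hash) {
            [pscustomobject]@{ File = $relative; Status = 'Different' }
        }
    }
}
```

An empty result means every source file was found in the restore with a matching SHA256 hash. Anything listed as Missing or Different is worth a conversation with the backup vendor before the real disaster shows up.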

The last place we need to find out that our client's backups are _not_ working is when their servers, virtual machines, or cloud services vendors have gone blotto.

Out of the Cloud Backup

We always make sure we have a way to back up any cloud vendor's services to our client's site. It just makes sense.

Our trust is a very fickle thing.

When it comes to our client's data we don't give our full trust to any vendor or solution set.

We _always_ test the backup and recovery processes so that we're not blindsided by things not going as planned or any "hidden fees" for accessing the client's data in a disaster recovery situation.
