Showing posts with label S2D.

Friday, 11 January 2019

Some Thoughts on the S2D Cache and the Upcoming Intel Optane DC Persistent Memory

Intel has a very thorough article that explains what happens when the workload data volume on a Storage Spaces Direct (S2D) Hyper-Converged Infrastructure (HCI) cluster starts to "spill over" to the capacity drives in an NVMe/SSD cache with HDD capacity setup.

Essentially, any workload data that needs to be shuffled over to the hard disk layer will suffer a performance hit and suffer it big time.

In a setup where we would have either NVMe PCIe Add-in Cards (AiCs) or U.2 2.5" drives for cache and SATA SSDs for capacity, the performance hit would not be as drastic, but it would still be felt depending on workload IOPS demands.

So, what do we do to make sure we don't shortchange ourselves on the cache?

We baseline our intended workloads using Performance Monitor (PerfMon).

Here is a previous post that has an outline of what we do along with links to quite a few other posts we've done on the topic: Hyper-V Virtualization 101: Hardware and Performance
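As a very rough illustration of what that baseline capture can look like, here's a minimal PowerShell sketch using Get-Counter. The counter list, sample interval, and output path are placeholders picked for this post, not a prescription; they get tuned per workload.

# Capture one hour of high-level disk activity (15 second samples x 240)
$Counters = "\PhysicalDisk(_Total)\Disk Transfers/sec",
            "\PhysicalDisk(_Total)\Disk Bytes/sec",
            "\PhysicalDisk(_Total)\Avg. Disk sec/Transfer"
Get-Counter -Counter $Counters -SampleInterval 15 -MaxSamples 240 |
    Export-Counter -Path "C:\Temp\Workload-Baseline.blg" -FileFormat BLG

The resulting .BLG opens straight in PerfMon for review alongside the live counters.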

We always try to have the right amount of cache in place for the workloads of today but also with the workloads of tomorrow across the solution's lifetime.

S2D Cache Tip

TIP: When looking to set up an S2D cluster, we suggest running with a higher count of smaller-capacity cache drives rather than just two larger-capacity drives.

Why?

For one, we get a lot more bandwidth/performance out of three or four cache devices versus two.

Secondly, in a 24-drive 2U chassis, if we start off with four cache devices and lose one, we still maintain a decent ratio of cache to capacity (1:6 with four versus 1:8 with three).

Here are some starting points based on a 2U S2D node setup we would look at putting into production.

  • Example 1 - NVMe Cache and HDD Capacity
    • 4x 400GB NVMe PCIe AiC
    • 12x xTB HDD (some 2U platforms can do 16 3.5" drives)
  • Example 2 - SATA SSD Cache and Capacity
    • 4x 960GB Read/Write Endurance SATA SSD (Intel SSD D3-4610 as of this writing)
    • 20x 960GB Light Endurance SATA SSD (Intel SSD D3-4510 as of this writing)
  • Example 3 - Intel Optane AiC Cache and SATA SSD Capacity
    • 4x 375GB Intel Optane P4800X AiC
    • 24x 960GB Light Endurance SATA SSD (Intel SSD D3-4510 as of this writing)
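Once a cluster along one of these lines is stood up, a quick way to sanity check which devices S2D actually claimed for cache is a couple of lines of PowerShell. This is a hedged sketch rather than a full procedure; on a healthy S2D cluster the cache devices should report a Usage of Journal.

# Run on any node of the S2D cluster
Get-PhysicalDisk | Sort-Object MediaType |
    Select-Object FriendlyName, MediaType, BusType, Usage, Size | Format-Table
# Cluster-wide cache state and mode
Get-ClusterStorageSpacesDirect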

One thing to keep in mind when it comes to a 2U server with 12 front-facing 3.5" drives along with four or more internally mounted 3.5" drives is heat and available PCIe slots. The additional drives can also constrain which processors can be installed due to thermal restrictions.

Intel Optane DC Persistent Memory

We are gearing up for a lab refresh when Intel releases the "R" code Intel Server Systems R2xxxWF series platforms hopefully sometime this year.

That's the platform Microsoft used to set an IOPS record with S2D and Intel Optane DC persistent memory:

We have yet to see any type of compatibility matrix as far as the how/what/where of Optane DC deployment goes, but one should be coming soon!

It should be noted that the modules will probably be frightfully expensive, with the value being seen in online transaction processing setups where every microsecond counts.

TIP: Excellent NVMe PCIe AiC for lab setups that are Power Loss Protected: Intel SSD 750 Series

image

Intel SSD 750 Series Power Loss Protection: YES

These SSDs can be found on most auction sites with some being new and most being used. Always ask for an Intel SSD Toolbox snip of the drive's wear indicators to make sure there is enough life left in the unit for the thrashing it would get in a S2D lab! :D

Acronym Refresher

Yeah, gotta love 'em! Being dyslexic has its challenges with them too. ;)

  • IOPS: Input/Output Operations per Second
  • AiC: Add-in Card
  • PCIe: Peripheral Component Interconnect Express
  • NVMe: Non-Volatile Memory Express
  • SSD: Solid-State Drive
  • HDD: Hard Disk Drive
  • SATA: Serial ATA
  • Intel DC: Data Centre (US: Center)

Thanks for reading!

Philip Elder
Microsoft High Availability MVP
MPECS Inc.
Co-Author: SBS 2008 Blueprint Book
www.s2d.rocks !
Our Web Site
Our Cloud Service

Thursday, 10 January 2019

Server Storage: Never Use Solid-State Drives without Power Loss Protection (PLP)

Here's an article from a little while back with a very good explanation of why one should not use consumer grade SSDs anywhere near a server:

While the article points specifically to Storage Spaces Direct (S2D) it is also applicable to any server setup.

The impetus behind this post is pretty straightforward, via a forum we participate in:

  • IT Tech: I had a power loss on my S2D cluster and now one of my virtual disks is offline
  • IT Tech: That CSV hosted my lab VMs
  • Helper 1: Okay, run the following recovery steps that help ReFS get things back together
  • Us: What is the storage setup in the cluster nodes?
  • IT Tech: A mix of NVMe, SSD, and HDD
  • Us: Any consumer grade storage?
  • IT Tech: Yeah, the SSDs where the offline Cluster Shared Volume (CSV) is
  • Us: Mentions above article
  • IT Tech: That's not my problem
  • Helper 1: What were the results of the above?
  • IT Tech: It did not work :(
  • IT Tech: It's ReFS's fault! It's not ready for production!

The reality of the situation was that there was live data sitting in the volatile cache DRAM on those consumer grade SSDs that got lost when the power went out. :(

We're sure that most of us know what happens when even one bit gets flipped. Error Correction on memory is mandatory for servers for this very reason.

To lose an entire cache's worth of data across multiple drives is pretty much certain death for whatever sat on top of them.

Time to break-out the backups and restore.

And, replace those consumer grade SSDs with Enterprise Class SSDs that have PLP!
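For anyone wondering whether the drives already in a box make the grade, Windows Server 2016 and up can report what a drive claims about its cache. A quick, hedged check from PowerShell:

# IsPowerProtected should be True on enterprise-class SSDs with PLP
Get-PhysicalDisk | Get-StorageAdvancedProperty

The output lists IsDeviceCacheEnabled and IsPowerProtected per drive. A False on IsPowerProtected for a drive slated for server duty is the conversation above waiting to happen.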

Philip Elder
Microsoft High Availability MVP
MPECS Inc.
Co-Author: SBS 2008 Blueprint Book
www.s2d.rocks !
Our Web Site
Our Cloud Service

Wednesday, 12 December 2018

Intel Technology Provider for 2019

We just received word of our renewal for the Intel Technology Provider program:

image

We've been system builders since the company began in 2003, and I was building systems for more than a decade before that!

One of the comments that gets made on a somewhat frequent basis is something along the lines of being a "Dinosaur". ;)

Or, this question gets asked quite a lot, "Why?"

There are many reasons for the "Why". Some that come off the top are:

  • We design solutions that meet very specific performance needs such as 150K IOPS, 500K IOPS, 1M IOPS and more
  • Our solutions get tested and thrashed before they ever get sold
    • We have a parts bin with at least five figures' worth of broken vendor promises
  • We have a solid understanding of component and firmware interactions
  • Our systems come with guaranteed longevity and performance
    • How many folks can say that when "building" a solution in a Vendor's "Solution Tool"?
  • We avoid the finger pointing that can happen when things don't pass muster

The following is one of our lab builds: a two-node Storage Spaces Direct (S2D) cluster utilizing 24 Intel SSD DC S4600 or D3-4610 SATA series SSDs run flat, meaning no cache layer. The upper graphs are built in Grafana, while the bottom left is Performance Monitor watching the RoCE (RDMA over Converged Ethernet via Mellanox) traffic and the bottom right is the VMFleet WatchCluster PowerShell.

image
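As an aside, the RoCE numbers in that Performance Monitor pane can also be pulled straight from PowerShell. A hedged sketch, noting that the exact counter set names can vary a bit with the installed Mellanox driver, so it's worth confirming them first:

# List the RDMA counters available on this node
Get-Counter -ListSet "RDMA Activity" | Select-Object -ExpandProperty Counter
# Watch inbound/outbound RDMA bytes across all RDMA-capable adapters
Get-Counter -Counter "\RDMA Activity(*)\RDMA Inbound Bytes/sec","\RDMA Activity(*)\RDMA Outbound Bytes/sec" -Continuous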

We just augmented the two-node setup with 48 more Intel SSD D3-4610 SATA SSDs for the other two nodes and are waiting on a set of Intel SSD 750 series NVMe PCIe AiCs (Add-in Cards) to bring our 750 count up to 3 per node for NVMe cache.

Why the Intel SSD 750 Series? They have Power Loss Protection built in. Storage Spaces Direct will not allow a device to hold any data in its local cache if that cache is volatile. What becomes readily apparent is that writing straight through to NAND is a very _slow_ process relative to having that cache power protected!

We're looking to hit 1M IOPS on the flat SSD setup and well over that when the NVMe cache setup gets introduced. There's a possibility that we'll be seeing some Intel Optane P4800X PCIe AiCs in the somewhat near future as well. We're geared up for a 2M+ run there. :D

Here's another test series we were running to saturate the node's CPUs and storage to see what kind of numbers we would get at the guest level:

image

Again, the graphs in the above shot are Grafana based.

The snip below is our little two-node S2D cluster (E3-1270v6, 64GB ECC, Mellanox 10GbE RoCE, 2x Intel DC S4600 SATA SSD Cache, 6x 6TB HGST SATA) pushing 250K IOPS:

image

We're quite proud of our various accomplishments over the years with our high availability solutions running across North America and elsewhere in the world.

We've not once had a callback asking us to go and pick-up our gear and refund the payment because it did not meet the needs of the customer as promised.

Contrary to the "All in the Cloud" crowd there is indeed a niche for those of us that provide highly available solution sets to on-premises clients. Those solutions allow them to have the uptime they need without the extra costs of running all-in the cloud or hybrid with peak resources in the cloud. Plus, they know where their data is.

Thanks for reading!

Philip Elder
Microsoft High Availability MVP
MPECS Inc.
Co-Author: SBS 2008 Blueprint Book
www.commodityclusters.com
Our Web Site
Our Cloud Service

Tuesday, 13 November 2018

New PowerShell Guides and DISM Slipstream Process Updated

We've added two new PowerShell Guides:

We've also updated the page with some tweaks to using DISM to update images in the Install.WIM in Windows Server. The process can also be used to slipstream both Servicing Stack Updates (SSUs) and Cumulative Updates (CUs) for both Windows Server and Windows Desktop.
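For a feel of what the process boils down to, here is a bare-bones sketch of the DISM sequence. The paths and image index are placeholders and the guide on the page covers the details; the key ordering point is that the SSU goes in before the CU.

Dism /Mount-Image /ImageFile:C:\Images\install.wim /Index:1 /MountDir:C:\Mount
Dism /Image:C:\Mount /Add-Package /PackagePath:C:\Updates\SSU.msu
Dism /Image:C:\Mount /Add-Package /PackagePath:C:\Updates\CU.msu
Dism /Unmount-Image /MountDir:C:\Mount /Commit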

Thanks for reading! :)

Philip Elder
Microsoft High Availability MVP
MPECS Inc.
Co-Author: SBS 2008 Blueprint Book
www.s2d.rocks !
Our Web Site
Our Cloud Service

Wednesday, 15 August 2018

PowerShell Paradise: Installing and Configuring Visual Studio Code (VS Code) and Git

It was Ben Thomas who gave me the prodding to look into Visual Studio Code (VS Code) for PowerShell coding and troubleshooting. I had an error in a PowerShell step that puzzled me. It turned out to be auto-replaced hyphens that got introduced into that PowerShell step somewhere along the line, since I keep (kept) everything in OneNote.

There are several reasons why coding in any form is difficult for me; suffice it to say it took a few days to get over the "Oh no, yet something else to learn" initial reaction to that prodding.

With a little downtime late last week, the opportunity presented itself to at least do a cursory search and skim of info on VS Code and PowerShell.

What I saw amazed me, so much so that the time needed to learn became a non-issue.

First, download VS Code but don't install it right away.

Next, download Git for Windows (there are other versions).

Now, there's a bit of a Catch-22 in this process as Git looks for VS Code and VS Code looks for Git.

Install VS Code and Git

Install order to make things simple:

  1. Install VS Code
    • image
  2. Run VS Code and ignore the Git prompt
    • image
  3. Install Git and choose VS Code
    • image
    • This is where things can get weird. If VS Code does not get started first, the Next button will not light up!
    • If not, leave this window, start VS Code, ignore the prompt, and close it.
    • Hit the Back button and then the Next button again and the Next button on this window should now be lit up.
  4. We chose Use Git from the Windows Command Prompt
    • image
  5. On the next window Use the OpenSSL Library
  6. Checkout Windows-style, commit Unix-style line endings
    • image
  7. Use MinTTY (the default terminal of MSYS2)
    • image
  8. We left the defaults for Configuring extra options
    • Enable file system caching
    • Enable Git Credential Manager

Once Git has been installed the next thing to do is to start VS Code and it should find Git:

image

Initialize the Git Repository

A few more steps and we'll be ready to code a new PowerShell script, or transfer in from whatever method we've been using prior.

  1. Create a new folder to store everything in
    • We're using OneDrive Consumer as a location for ours to make it easily accessible
  2. CTRL+SHFT+E --> Open Folder --> Folder created above
    • VS Code will reload when the folder has been chosen
  3. CTRL+SHFT+G --> Click the Initialize Repository button
    • image
  4. Click the Initialize Repository button with the folder we opened being the default location
  5. Git should be happy
    • image

Now, we're almost there!

VS Code Extension Installation

The last steps are to get the PowerShell Extension installed and tweak the setup to use it.

  1. CTRL+SHFT+X
  2. Type PowerShell in the Marketplace search
  3. Click the little green Install button then after the install the little blue Reload button
    • image
  4. Additional VS Code Extensions we install by default
    1. Better Comments
      • Allows for colour coded # ! PowerShell Comments
    2. Git History
      • Allows us to look at what's happening in Git
    3. VSCode-Icons
      • Custom icons in VS Code

VS Code Quick Navigation

Once done, the following key strokes are the first few needed to get around and then there's one more step:

  • Source Control: Git: CTRL+SHFT+G
  • Extensions: CTRL+SHFT+X
  • Folder/File Explorer: CTRL+SHFT+E
  • User/Workspace Settings: CTRL+,

Create the Workspace

And finally, the last step is to:

  1. File
  2. Save Workspace As
  3. Navigate to the above created folder
  4. Name the Workspace accordingly
  5. Click the Save button

Then, it's Ready, Set, Code! :)

Note that the PowerShell .PS1 files should be saved in the Workspace folder and/or subfolders to work with them.

To start all new files in the PowerShell language by default add the following to User Settings

  1. CTRL+,
  2.     "files.defaultLanguage": "powershell"

One of the beauties of this setup is the ability to look at various versions of the files, much like we can with SharePoint and Office files, to compare the changes made over the history of the PowerShell Code.

Another is the ability to see in glorious colour!

image

Thanks to Ben Thomas' challenge PowerShell is already so much easier to work with!

2018-08-15 EDIT: Oops, missed one important step.

  1. Open Git GUI
  2. Click Open Existing Repository
  3. Navigate to the Workspace folder with the .git hidden folder
  4. Open
  5. Set the User's name and E-mail address for both settings
    • image
  6. Click Save

Git should be happy to Commit after that! :)
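For those who prefer the command line over Git GUI, the same identity settings can be set globally from any prompt (substitute your own name and address):

git config --global user.name "Your Name"
git config --global user.email "you@example.com"
git config --global --list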

Philip Elder
Microsoft High Availability MVP
MPECS Inc.
Co-Author: SBS 2008 Blueprint Book
www.s2d.rocks !
Our Web Site
Our Cloud Service

Friday, 10 August 2018

Intel/LSI/Avago StorCli Error: syntax error, unexpected $end FIX

We're working with an Intel setup and needed to verify the configuration on an Intel RAID Controller.

After downloading the command line utilities, since we're in Server Core, we hit this:

C:\Temp\Windows>storcli /cx show

syntax error, unexpected $end

     Storage Command Line Tool  Ver 007.0415.0000.0000 Feb 13, 2018

     (c)Copyright 2018, AVAGO Technologies, All Rights Reserved.


help - lists all the commands with their usage. E.g. storcli help
<command> help - gives details about a particular command. E.g. storcli add help

List of commands:

Commands   Description
-------------------------------------------------------------------
add        Adds/creates a new element to controller like VD,Spare..etc
delete     Deletes an element like VD,Spare
show       Displays information about an element
set        Set a particular value to a property
get        Get a particular value to a property
compare    Compares particular value to a property
start      Start background operation
stop       Stop background operation
pause      Pause background operation
resume     Resume background operation
download   Downloads file to given device
expand     expands size of given drive
insert     inserts new drive for missing
transform  downgrades the controller
/cx        Controller specific commands
/ex        Enclosure specific commands
/sx        Slot/PD specific commands
/vx        Virtual drive specific commands
/dx        Disk group specific commands
/fall      Foreign configuration specific commands
/px        Phy specific commands
/[bbu|cv]  Battery Backup Unit, Cachevault commands
/jbodx      JBOD drive specific commands

Other aliases : cachecade, freespace, sysinfo

Use a combination of commands to filter the output of help further.
E.g. 'storcli cx show help' displays all the show operations on cx.
Use verbose for detailed description E.g. 'storcli add  verbose help'
Use 'page=[x]' as the last option in all the commands to set the page break.
X=lines per page. E.g. 'storcli help page=10'
Use J as the last option to print the command output in JSON format
Command options must be entered in the same order as displayed in the help of
the respective commands.

What the help output does not make clear, and what our stumbling block was, is exactly what we were missing.

It turns out that the correct command is:

C:\Temp\Windows>storcli /c0 show jbod
CLI Version = 007.0415.0000.0000 Feb 13, 2018
Operating system = Windows Server 2016
Controller = 0
Status = Success
Description = None


Controller Properties :
=====================

----------------
Ctrl_Prop Value
----------------
JBOD      ON
----------------


CFShld-Configured shielded|Cpybck-CopyBack|CBShld-Copyback Shielded

The /cx switch needed a number for the controller ID.
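A couple of other handy forms once that clicks, with /c0 assuming the first (or only) controller in the box:

C:\Temp\Windows>storcli show
C:\Temp\Windows>storcli /c0 show all

The first lists the controllers present along with their IDs, and the second dumps everything about controller 0.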

A quick search turned up the following:

Philip Elder
Microsoft High Availability MVP
MPECS Inc.
Co-Author: SBS 2008 Blueprint Book
www.commodityclusters.com
Our Web Site
Our Cloud Service

Thursday, 9 August 2018

PowerShell: Add-Computer Error when Specifying OUPath: The parameter is incorrect FIX

We're in the process of setting up a second 2-node Kepler-64 cluster and hit this error when running the Add-Computer PowerShell to domain join a node:

Add-Computer : Computer 'S2D-Node03' failed to join domain 'Corp.Domain.Com from its current
workgroup 'WORKGROUP' with following error message: The parameter is incorrect.
At line:1 char:1
+ Add-Computer -Domain Corp.Domain.Com -Credential Corp\DomainAdmin -OUPath  …
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     + CategoryInfo          : OperationStopped: (S2D-Node03:String) [Add-Computer], InvalidOperation
    Exception
     + FullyQualifiedErrorId : FailToJoinDomainFromWorkgroup,Microsoft.PowerShell.Commands.AddComp
    uterCommand

The PowerShell line it's complaining about is this one:

Add-Computer -Domain Corp.Domain.Com -Credential Corp\DomainAdmin -OUPath "OU=S2D-OpenNodes,OU=S2D-Clusters,DC=Corp,DC=Domain,DC-Com" -Restart

Do you see it? ;)

The correct PoSh for this step is actually:

Add-Computer -Domain Corp.Domain.Com -Credential Corp\DomainAdmin -OUPath "OU=S2D-OpenNodes,OU=S2D-Clusters,DC=Corp,DC=Domain,DC=Com" -Restart

When specifying the OUPath option, if there is any typo in that setting, the nondescript error returned is "The parameter is incorrect."
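One way to catch that kind of typo ahead of time is to resolve the distinguished name before running the join. A hedged sketch, assuming the ActiveDirectory RSAT module is available on a management machine (add -Credential as needed when running from a workgroup box):

Get-ADOrganizationalUnit -Identity "OU=S2D-OpenNodes,OU=S2D-Clusters,DC=Corp,DC=Domain,DC=Com" -Server Corp.Domain.Com

If the DN has a typo, this throws a much more descriptive error than Add-Computer does.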

We always prefer to drop a server or desktop right into its respective OU container, as that allows our Group Policy settings to take effect, giving us full access upon reboot and more.

Philip Elder
Microsoft High Availability MVP
MPECS Inc.
Co-Author: SBS 2008 Blueprint Book
www.s2d.rocks !
Our Web Site
Our Cloud Service

Monday, 6 August 2018

Cloud Hosting Architecture: Tenant Isolation

Cloud Vendors Compromised

Given the number of backchannels we are a part of, we get to hear horror stories where Cloud Vendors are compromised in some way or get hit by an encryption event that takes their client/customer-facing systems out.

When we architect a hosting system for a hosting company looking to deploy our solutions in their hosting setup, or to set up an entirely new hosting project, there are some very important elements to our configuration that would help to prevent the above from happening.

A lot of what we have put into our design is very much a result of our experiences on the frontlines with SMB and SME clients.

One blog post that provides some insight: Protecting a Backup Repository from Malware and Ransomware.

It is absolutely critical to isolate and off-site any and all backups. We've also seen a number of news items of late where a company is completely hosed as a result of an encryption event or other failure only to find out the backups were either wiped by the perps or no good in the first place.

Blog Post: Backups Should Be Bare Metal and/or Virtually Test Restored Right?

The full bare metal or virtual restore is virtually impossible at hyper-scale. That said, we've seen backups done in some hyper-scale cloud vendors' environments prove restorable, while in others they have been a complete failure!

However, that does not excuse the cloud customer or their cloud consultancy from making sure that any and all cloud based services are backed up _off the cloud_ and air-gapped as a just-in-case.

Now, to the specific point of this blog post.

Tenant Isolation Technique

When we set up a hosting solution we aim to provide maximum security for the tenant. That's the aim as they are the ones that are paying the bills.

To do that, the hosting company needs to provide a series of layered protections for tenant environments.

  1. Hosting Company Network
    • Hosting company AD
    • All hosting company day-to-day operations
    • All hosting company on-premises workloads specific to company operations and business
    • Dedicated hosting company edges (SonicWALL ETC)
  2. Tenant Infrastructure Network
    • Jump Point for managing via dedicated Tenant Infrastructure AD
    • High Availability (HA) throughout the solution stack
    • Dedicated Tenant Infrastructure HA edges
      • Risk versus Reward: Could use the above edges but …
    • Clusters, servers, and services providing the tenant environment
    • Dedicated infrastructure switches and edges
    • As mentioned, backups set up and isolated from all three!
  3. Tenant Environment
    • Shared Tenant AD is completely autonomous
    • Shared Tenant Resources such as Exchange, SQL, and more are appropriately isolated
    • Dedicated Tenant AD is completely autonomous
    • Dedicated Tenant Resources such as Exchange, SQL, and more are completely isolated to the tenant
    • Offer a built-in off-the-cloud backup solution

With the solution architected in this manner we protect the boundaries between the Hosting Company Network and the Tenant Environment. This makes it extremely difficult for a compromise/encryption event to make the boundary traversal without some sort of Zero Day involved.
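As one small, concrete illustration of what boundary enforcement looks like at the Hyper-V layer (only a sliver of the full design, and the VM names and VLAN ID here are made up for the example): each dedicated tenant's virtual machines get pinned to their own VLAN on the virtual switch so their traffic never mingles with the infrastructure or other tenants.

# Hypothetical example: isolate Tenant A's VMs on their own VLAN
Get-VM -Name "TenantA-*" | ForEach-Object {
    Set-VMNetworkAdapterVlan -VMName $_.Name -Access -VlanId 210
}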

Conclusion

We've seen a few encryption events in our own cloud services tenants. None of them have traversed the dedicated tenant environments they were a part of. None. Nada. Zippo.

Containment is key. It's not "if" but "when" an encryption event happens.

Thus, architecting a hosting solution with the various environment boundaries in mind is key to surviving an encryption event and looking like a hero when the tenant's data gets restored post clean-up.

Thanks for reading!

Philip Elder
Microsoft High Availability MVP
MPECS Inc.
Co-Author: SBS 2008 Blueprint Book
www.commodityclusters.com
Our Web Site
Our Cloud Service

Thursday, 26 July 2018

Hypervisor, Cluster, and Server Hardware Nomenclature (A quick what's what)

100 Level Post

When helping folks out, there seems to be a bit of confusion about what means what when it comes to discussing the software or the hardware.

So, here are some definitions to help clear the air.

  • NIC
    • Network Interface Card
    • The card can have one, two, four, or more ports
    • Get-NetAdapter
    • Get-NetLbfoTeam
  • Port
    • The ports on the NIC
  • pNIC
    • pNIC = NIC
    • A physical NIC in a hypervisor host or cluster node
  • vNIC
    • The virtual NIC in a Virtual Machine (VM)
    • In-Guest: Get-NetAdapter
    • In-Guest: Get-NetIPAddress
  • vSwitch
    • The Virtual Switch attached to a vNIC
    • Get-VMSwitch
  • Gb
    • Gigabit =/= Gigabyte (GB)
    • 1 billion bits
  • GB
    • Gigabyte =/= Gigabit (Gb)
    • 1 billion bytes
  • 10GbE
    • 10 Gigabit Ethernet
    • Throughput @ line speed ~ 1GB/Second (1 Gigabyte per Second)
  • 100GbE
    • 100 Gigabit Ethernet
    • Throughput @ line speed ~ 10GB/Second (10 Gigabytes per Second)
  • pCore
    • A physical Core on a CPU (Central Processing Unit)
  • vCPU
    • A virtual CPU assigned to a VM
    • Is _not_ a pCore or assigned to a specific pCore by the hypervisor!
    • Please read my Experts Exchange article on Hyper-V, especially the Virtual CPUs and CPU Cores section mid-way down; it's free to access
    • Set-VMProcessor VMNAME -Count 2
  • NUMA
    • Non-Uniform Memory Access
    • A Memory Controller and the physical RAM (pRAM) attached to it is a NUMA node

A simple New-VM PowerShell script is here. This is our PowerShell Guide Series, which has a number of PowerShell and CMD related scripts. Please check them out and check back every once in a while as more scripts are in the works.
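For context, a bare-bones New-VM along those lines might look like the following. The names, paths, and sizes are placeholders for illustration, not the guide's exact script:

New-VM -Name "VM-Test01" -Generation 2 -MemoryStartupBytes 4GB -SwitchName "vSwitch" -NewVHDPath "D:\Hyper-V\Virtual Hard Disks\VM-Test01_C-75GB.VHDX" -NewVHDSizeBytes 75GB
Set-VMProcessor -VMName "VM-Test01" -Count 2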

Think something should be in the above list? Please comment or feel free to ping us via email.

Philip Elder
Microsoft High Availability MVP
MPECS Inc.
Co-Author: SBS 2008 Blueprint Book
www.commodityclusters.com
Our Web Site
Our Cloud Service

Tuesday, 24 July 2018

Mellanox SwitchX-2 MLNX-OS Upgrade Stall via WebGUI and MLNX-OS to Onyx Upgrade?

Yesterday, we posted about our OS update process and the grids that indicated the proper path to the most current version.

A catch that became apparent was that there were two streams of updates available to us:

  1. image-PPC_M460EX-3.6.4112.img
  2. onyx-PPC_M460EX-3.6.4112.img
    • ETC
    • image

As can be seen in the snip above we read an extensive number of Release Notes (RN) and User Manuals (UM) trying to figure out what was what and which was which. :S

In the end, we opened a support ticket with Mellanox to figure out why our switches were stalling on the WebGUI upgrade process and why there was a dearth of documentation indicating anything about upgrade paths.

The technician mentioned that we should use the CLI to clean up any image files that may be left over. That's something we've not had to do before.

Following the process in the UM to connect via SSH using our favourite freebie tool TeraTerm, we connected to both switches and found only one file to delete:

  • WebImage.tbz

Once that file was deleted we were able to initiate the update from within the WebGUI without error on both switches.
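For reference, the CLI clean-up went roughly along the following lines. Treat the exact syntax as an approximation; the UM for the installed MLNX-OS release is the authority here:

enable
configure terminal
show images
image delete WebImage.tbz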

Since we had MLNX-OS 3.6.4112 already installed the next question for the tech was, "How do we get to the most current version of Onyx?"

The process was as follows:

  1. Up to MLNX-OS 3.6.4112
  2. Up to Onyx 3.6.6000
  3. Up to Onyx 3.6.8008

As always, check out the Release Notes (RN) to make sure that the update will not cause any problems especially with in-production NICs and their firmware!

image

Happy News! Welcome to Onyx

Philip Elder
Microsoft High Availability MVP
MPECS Inc.
Co-Author: SBS 2008 Blueprint Book
www.s2d.rocks !
Our Web Site
Our Cloud Service

Monday, 23 July 2018

Mellanox SwitchX-2 and Spectrum OS Update Grids

We're in the process of building out a new all-flash based Kepler-64 2-node cluster that will be running the Scale-Out File Server Role. This round of testing will have several different phases to it:

  1. Flat Flash Intel SSD DC S4500 Series
    • All Intel SSD DC S4500 Series SATA SSD x24
  2. Intel NVMe PCIe AIC Cache + Intel SSD DC S4500
    • Intel NVMe PCIe AIC x4
    • Intel SSD DC S4500 Series SATA SSD x24
  3. Intel Optane PCIe AIC + Intel SSD DC S4500
    1. Intel Optane PCIe AIC x4
    2. Intel SSD DC S4500 Series SATA SSD x24

Prior to running the above tests we need to update the operating system on our two Mellanox SwitchX-2 MSX1012B series switches as we've been quite busy with other things!

Their current OS level is 3.6.4006 so just a tad bit out of date.

image

The current OS level for SwitchX-2 PPC switches is 3.6.8008. And, as per the Release Notes for this OS version we need to do a bit of a Texas Two-Step to get our way up to current.

image

image

Now, here's the kicker: There is no 3.6.5000 on Mellanox's download site. The closest version to that is 3.6.5009 which provides a clarification on the above:

image

Okay, so that gets us to 3.6.5009, which in turn gets us to 3.6.6106:

image

And that finally gets us to 3.6.8008:

image

Update Texas Two Step

To sum things up we need the following images:

  1. 3.6.4122
  2. 3.6.5009
  3. 3.6.6106
  4. 3.6.8008

Then it's a matter of a bit of patience to run through each step, as the switches can take some time to update.

image

A quick way to back up the configuration is to click on the Setup button, then Configurations, then click the initial link.

image

Copy and paste the output into a TXT file, as it can be used to reconfigure the switch if need be via the Execute CLI commands window just below it.
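Another option is to pull the same text over SSH and stash it with the rest of the device backups; on these switches the enable-mode command below should return the equivalent command listing (again, verify against the UM for the release in use):

show running-config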

As always, it pays to read that manual eh! ;)

NOTE: Acronym Finder: AIC = Add-in Card so not U.2.

Oh, and be patient with the downloads as they are _slow_ as molasses in December as of this writing. :(

image

Philip Elder
Microsoft High Availability MVP
MPECS Inc.
Co-Author: SBS 2008 Blueprint Book
www.s2d.rocks !
Our Web Site
Our Cloud Service

Friday, 29 June 2018

Our Calgary Oil & Gas Show Booth & Slide Show

At the invitation of one of our suppliers, AVNET, I got to spend the day manning a spot in their booth.

image

Calgary International Oil & Gas Show 2018 AVNET Booth

Sitting on the table at the left is one of our Kepler-47 nodes and a series of storage devices, one of which is a disassembled hard drive.

There were great conversations to be had with the folks at the other booths including Intel, Kingston, and Microsoft and their Azure IoT team among others.

Thanks to AVNET and the team. They were very gracious. :)

Here's the slideshow I put together for that monitor on the wall.

image

image

image

image

image

image

image

image

image

image

image

image

Just a note on the mentioned Intel OmniPath setup. In conversation with Intel post-slide creation it seems that OPA is not a Windows focused architecture so there's no opportunity for us to utilize it in our solutions.

To our Canuck readers have a great long weekend and to everyone else have a great weekend. :)

Thanks for reading!

Philip Elder
Microsoft High Availability MVP
MPECS Inc.
Co-Author: SBS 2008 Blueprint Book
Our Web Site
Our Cloud Service

Tuesday, 1 May 2018

PowerShell Guide Series: Storage Spaces Direct PowerShell Node Published

Apologies for the double post, one of the bulleted links was broken. :(

One of the difficult things about putting our setup guides on our blog was the fact that when we changed them, which was frequent, it became a bit of a bear to manage.
So, we're going to be keeping a set of guides on our site to keep things simple.

The first of the series has been published here:

This guide is a walkthrough to set up a 2-Node Storage Spaces Direct (S2D) cluster node from scratch. There are also steps in there for configuring RoCE to allow for more than two nodes if there is a need.
We will be updating the existing guides on a regular basis but also publishing new ones as we go along.

Thanks for reading!

Philip Elder
Microsoft High Availability MVP
MPECS Inc.
Co-Author: SBS 2008 Blueprint Book
Our Web Site
Our Cloud Service


Tuesday, 23 January 2018

Storage Spaces Direct (S2D): Sizing the East-West Fabric & Thoughts on All-Flash

Lately we've been seeing some discussion around the amount of time required to resync an S2D node's storage after it has come back from a reboot for whatever reason.

Unlike a RAID controller where we can tweak rebuild priorities, S2D does not offer the ability to do so.

It is with very much a good thing that the knobs and dials are not exposed for this process.

Why?

Because, there is a lot more going on under the hood than just the resync process.

While it does not happen as often anymore, there were times where someone would reach out about a performance problem after a disk had failed. After a quick look through the setup, the Rebuild Priority setting turned out to be the culprit, as someone had tweaked it from its usual 30% of cycles to 50% or 60% or even higher, thinking that the rebuild should be the priority.

S2D Resync Bottlenecks

There are two key bottleneck areas in an S2D setup when it comes to resync performance:
  1. East-West Fabric
    • 10GbE with or without RDMA?
    • Anything faster than 10GbE?
  2. Storage Layout
    • Those 7200 RPM capacity drives can only handle ~110MB/Second to ~120MB/Second sustained
The two are not mutually exclusive; depending on the setup, they can play together to limit performance.
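To put some very rough numbers on that second point: twelve 7200 RPM capacity drives at ~115MB/Second each works out to around 1.4GB/Second of sustained throughput in one node, which is already beyond the ~1GB/Second line rate of a single 10GbE port. That back-of-the-napkin math is a big part of why the fabric recommendations below scale up with node and drive count.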

The physical CPU setup may also come into play but that's for another blog post. ;)

S2D East-West Fabric to Node Count

Let's start with the fabric setup that the nodes use to communicate with each other and pass storage traffic along.

This is a rule of thumb that was originally born out of a conversation at a MVP Summit a number of years back with a Microsoft fellow that was in on the S2D project at the beginning. We were discussing our own Proof-of-Concept that we had put together based on a Mellanox 10GbE and 40GbE RoCE (RDMA over Converged Ethernet) fabric. Essentially, at 4-nodes a 40GbE RDMA fabric was _way_ too much bandwidth.

Here's the rule of thumb we use for our baseline East-West Fabric setups. Note that we always use dual-port NICs/HBAs.
  • Kepler-47 2-Node
    • Hybrid SSD+HDD Storage Layout with 2-Way Mirror
    • 10GbE RDMA direct connect via Mellanox ConnectX-4 LX
    • This leaves us the option to add one or two SX1012X Mellanox 10GbE switches when adding more Kepler-47 nodes
  • 2-4 Node 2U 24 2.5" or 12/16 3.5" Drives with Intel Xeon Scalable Processors
    • 2-Way Mirror: 2-Node Hybrid SSD+HDD Storage Layout
    • 3-Way Mirror: 3-Node Hybrid SSD+HDD Storage Layout
    • Mirror-Accelerated Parity (MAP): 4 Nodes Hybrid SSD+HDD Storage Layout
    • 2x Mellanox SX1012X 10GbE Switches
      • 10GbE RDMA direct connect via Mellanox ConnectX-4 LX
  • 4-7 Node 2U 24 2.5" or 12/16 3.5" Drives with Intel Xeon Scalable Processors
    • 4-7 Nodes: 3-Way Mirror: 4+ Node Hybrid SSD+HDD Storage Layout
    • 4+ Nodes: Mirror-Accelerated Parity (MAP): 4 Nodes Hybrid SSD+HDD Storage Layout
    • 4+ Nodes: Mirror-Accelerated Parity (MAP): All-Flash NVMe cache + SSD
    • 2x Mellanox Spectrum Switches with break-out cables
      • 25GbE RDMA direct connect via Mellanox ConnectX-4/5
      • 50GbE RDMA direct connect via Mellanox ConnectX-4/5
  • 8+ Node 2U 24 2.5" or 12/16 3.5" Drives with Intel Xeon Scalable Processors
      • 4-7 Nodes: 3-Way Mirror: 4+ Node Hybrid SSD+HDD Storage Layout
      • 4+ Nodes: Mirror-Accelerated Parity (MAP): 4 Nodes Hybrid SSD+HDD Storage Layout
      • 4+ Nodes: Mirror-Accelerated Parity (MAP): All-Flash NVMe cache + SSD
      • 2x Mellanox Spectrum Switches with break-out cables
        • 50GbE RDMA direct connect via Mellanox ConnectX-4/5
        • 100GbE RDMA direct connect via Mellanox ConnectX-4/5
    Other than the Kepler-47 setup, we always have at least a pair of Mellanox ConnectX-4 NICs in each node for East-West traffic. It's our preference to separate the storage traffic from the rest.

    All-Flash Setups

    There's a lot of talk in the industry about all-flash.

    It's supposed to solve the biggest bottleneck of them all: Storage!

    The catch is, bottlenecks are moving targets.

    Drop in an all-flash array of some sort and all of a sudden the storage to compute fabric becomes the target. Then, it's the NICs/HBAs on the storage _and_ compute nodes, and so-on.

    If you've ever changed a single coolant hose in an older high miler car you'd see what I mean very quickly. ;)

    IMNSHO, at this point in time, unless there is a very specific business case for all-flash and the fabric in place allows for all that bandwidth with virtually zero latency, all-flash is a waste of money.

    One business case would be for a cloud services vendor that wants to provide a high IOPS and vCPU solution to their clients. So long as the fabric between storage and compute can fully utilize that storage and the market is there the revenues generated should more than make up for the huge costs involved.

    Using all-flash as a solution to a poorly written application or set of applications is questionable at best. But, sometimes, it is necessary as the software vendor has no plans to re-work their applications to run more efficiently on existing platforms.

    Caveat: The current PCIe bus just can't handle it. Period.

    A pair of 100Gb ports on one NIC/HBA can't be fully utilized due to the PCIe bus bandwidth limitation. Plus, we deploy with two NICs/HBAs for redundancy.
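    Some rough math on that: a pair of 100Gb ports is about 25GB/Second of potential traffic, while a PCIe Gen 3 x16 slot tops out around 15GB/Second to 16GB/Second and an x8 slot around 8GB/Second. A dual-port 100GbE adapter simply cannot run both ports at line rate, no matter how fast the storage behind it is.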

    Even with the addition of more PCIe Gen 3 lanes in the new Intel Xeon Scalable Processor Family we are still quite limited in the amount of data that can be moved about on the bus.

    S2D Thoughts and PoCs

    The Storage Spaces Direct (S2D) hyper-converged or SOFS only solution set can be configured and tuned for a very specific set of client needs. That's one of its beauties.

    Microsoft remains committed to S2D and its success. Microsoft Azure Stack is built on S2D so their commitment is pretty clear.

    So is ours!

    Proof-of-Concept (PoC) Lab
    S2D 4-Node for Hyper-Converged and SOFS Only
    Hyper-V 2-Node for Compute to S2D SOFS
    This is the newest addition to our S2D product PoC family:
    Kepler-47 S2D 2-Node Cluster

    The Kepler-47 picture is our first one. It's based on Dan Lovinger's concept we saw at Ignite Atlanta a few years ago. Components in this box were similar to Dan's setup.

    Our second generation Kepler-47 is on the way to being built now.
    Kepler-47 v2 PoC Ongoing Build & Testing

    This new generation will have an Intel Server Board DBS1200SPLR with an E3-1270v6, 64GB ECC, Intel JBOD HBA I/O Module, TPM v2, and Intel RMM. OS would be installed on a 32GB Transcend 2242 SATA SSD. Connectivity between the nodes will be Mellanox ConnectX-4 LX running at 10GbE with RDMA enabled.

    Storage in Kepler-47 v2 would be a combination of one Intel DC P4600 Series PCIe NVMe drive for cache, two Intel DC S4600 Series SATA SSDs for the performance tier, and six HGST 6TB 7K6000 SAS or SATA HDDs for capacity. The PCIe NVMe drive will be optional due to its cost.

    We already have one or two client/customer destinations for this small cluster setup.

    Conclusion

    Storage Spaces Direct (S2D) rocks!

    We've invested _a lot_ of time and money in our Proof-of-Concepts (PoCs). We've done so because we believe the platform is the future for both on-premises and data centre based workloads.

    Thanks for reading! :)

    Philip Elder
    Microsoft High Availability MVP
    MPECS Inc.
    Co-Author: SBS 2008 Blueprint Book
    Our Web Site
    Our Cloud Service

    Monday, 18 December 2017

    Cluster: Troubleshooting an Issue Using Failover Cluster Manager Cluster Events

    When we run into issues, the first thing we can do is poll the nodes via the Cluster Events log in Failover Cluster Manager (FCM).

    1. Open Failover Cluster Manager
    2. Click on Cluster Events in the left hand column
    3. Click on Query
      • image
    4. Make sure the nodes are ticked in the Nodes: section
    5. In the Event Logs section:
      • Microsoft-Windows-Cluster*
      • Microsoft-Windows-FailoverClustering*
      • Microsoft-Windows-Hyper-V*
      • Microsoft-Windows-Network*
      • Microsoft-Windows-SMB*
      • Microsoft-Windows-Storage*
      • Microsoft-Windows-TCPIP*
      • Leave all defaults checked
      • OPTION: Hardware Events
    6. Critical, Error, Warning
    7. Events On
      • From: Events On: 2017-12-17 @ 0800
      • To: Events On: 2017-12-18 @ 2000
    8. Click OK
    9. Click Save Query As...
    10. Save it
      • Copy the resultant .XML file for use on other clusters
      • Edit the node value section to change the node designations or add more
    11. Click on Save Events As... in FCM to save the current list of events for further digging

    Use the Open Query option to get to the query .XML and tweak the dates for the current date and time, add specific Event IDs that we are looking for, and then click OK.
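    For those who live in PowerShell, a similar sweep can be run without FCM at all. A hedged sketch, trimmed down to a couple of logs and the last day of events (Levels 1, 2, 3 = Critical, Error, Warning):

    $Filter = @{
        LogName   = 'Microsoft-Windows-FailoverClustering/Operational','System'
        Level     = 1,2,3
        StartTime = (Get-Date).AddDays(-1)
    }
    # Query each cluster node in turn
    Get-ClusterNode | ForEach-Object {
        Get-WinEvent -ComputerName $_.Name -FilterHashtable $Filter -ErrorAction SilentlyContinue
    }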

    We have FCM and Hyper-V RSAT installed on our cluster's physical DC by default.

    Philip Elder
    Microsoft High Availability MVP
    MPECS Inc.
    Co-Author: SBS 2008 Blueprint Book
    Our Web Site
    Our Cloud Service

    Saturday, 9 December 2017

    PowerShell TotD: Hyper-V Live Move a specific VHDX file

    There are times when we need to move one of two VHDX files associated with a VM.

    The following is the PowerShell to do so:

    Poll Hyper-V Host/Node for VM HDD Paths

    get-vm "*" | Select *path,@{N="HDD";E={$_.Harddrives.path}} | FL

    Move a Select VHDX

    Move-VMStorage -VMName VMName -VHDs @(@{"SourceFilePath" = "X:\Hyper-V\Virtual Hard Disks\VM-LALoB_D0-75GB.VHDX"; "DestinationFilePath" = "Y:\Hyper-V\Virtual Hard Disks\VM-LALoB_D0-75GB.VHDX"})

    Move-VMStorage Docs

    The Move-VMStorage Docs site. This site has the full syntax for the PowerShell command.

    Conclusion

    While the above process can be initiated in the GUI, PowerShell allows us to initiate a set of moves for multiple VMs. This saves a lot of time versus the mouse.
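    As an example of that batching, a quick loop over a handful of VMs might look like the following; the VM names and destination path are hypothetical:

    # Shift several VMs' storage to a new CSV in one go
    "VM-App01","VM-App02","VM-App03" | ForEach-Object {
        Move-VMStorage -VMName $_ -DestinationStoragePath "C:\ClusterStorage\Volume2\$_"
    }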

    By the way, TotD means: Tip of the Day.

    Thanks for reading! :)

    Philip Elder
    Microsoft High Availability MVP
    MPECS Inc.
    Co-Author: SBS 2008 Blueprint Book
    Our Web Site
    Our Cloud Service

    Thursday, 9 November 2017

    Intel Server System R2224WFTZS Integration & Server Building Thoughts

    We have a brand new Intel Server System R2224WFTZS that is the foundation for a mid to high performance virtualization platform.

    image

    Intel Server System R2224WFTZS 2U

    Below it sits one of our older lab servers, an Intel Server System SR2625URLX 2U. Note the difference in the drive caddy.

    That change is welcome as the caddy no longer requires a screwdriver to set the drive in place:

    image

    Intel 2.5" Tooless Drive Caddy

    What that means is the time required to get 24 drives installed in the caddies went from half an hour or more to five or ten minutes. That, in our opinion, is a great leap ahead!

    The processors for this setup are Intel Xeon Gold 6134s with 8 cores running at 3.2GHz with a peak of 3.7GHz. We chose the Gold 6134 as a starting place as most of the other CPUs have more than eight cores thus pushing up the cost of licensing Microsoft Windows Server Standard or Datacenter.

    image

    Intel Xeon Gold 6134, Socket, Heatsink, and Canadian Loonie $1 Coin

    The new processors are huge!

    The difference in scale versus the E3-1200 series and E5-2600 series processors is striking. The jump in size reminds me of the Pentium Pro's girth next to the lesser desktop/server processors of the day.

    image

    Intel Xeon Processor E3-1270 sits on the Intel Xeon Gold 6134

    The server is nearly complete.

    image

    Intel Server System R2224WFTZS Build Complete

    Bill of Materials

    In this setup the server's Bill of Materials (BoM) is as follows:

    • (2) Intel Xeon Gold 6134
    • 384GB via 12x 32GB Crucial DDR4 LRDIMM
    • Intel Integrated RAID Module RMSP3CD080F with 7 Series Flash Cache Backup
    • Intel 12Gbps RAID Expander Module RES3TV360
    • (2) 150GB Intel DC S3520 M.2 SSDs for OS
    • (5) 1.9TB Intel DC S4600 SATA SSDs for high IOPS tier
    • (19) 1.8TB Seagate 10K SAS for low to mid IOPS tier
    • Second Power Supply, TPM v2, and RMM4 Module

    It's important to note that when setting up a RAID controller, instead of a Host Bus Adapter (HBA) that does JBOD only, we require the flash cache backup module. In this particular unit one needs to order the mounting bracket: AWTAUXBBUBKT

    I'm not sure why we missed that, but we've updated our build guides to reflect the need for it going forward.

    One other point of order is the rear 2.5" hot swap drive bay kit (A2UREARHSDK2) does not come installed from the factory in the R2224WFTZS as it did in the R2224WTTYS. I'm still not sold on M.2 for the host operating system as they are not hot swap capable. That means, if one dies we have to down a node in order to change it. With the rear hot swap bay we can do just that, swap out the 2.5" SATA SSD that's being used for the host OS.

    For the second set of two 10GbE ports we used an Intel X540-T2 PCIe add-in card as the I/O modules are not in the distribution channel as of this writing.

    NOTE: One requires a T30 Torx screwdriver for the heatsinks! After installing the processor please make sure to start all four nuts prior to tightening. As a suggestion, from there snug each one up gradually, starting with the two middle nuts then the outer nuts, similar to the process for installing a head on an engine block. This provides an even amount of pressure from the middle of the heatsink outwards.

    Firmware Notes

    Finally, make sure to update the firmware on all components before installing an operating system. There are some key fixes in the motherboard firmware updates as of this writing (BIOS 00.01.0009 ReadMe). Please make sure to read through to verify any caveats associated with the update process or the updates themselves.

    Next up on our build process will be to update all firmware in the system, install the host operating system and drivers, and finally run a burn-in process. From there, we'll run some tests to get a feel for the IOPS and throughput we can expect from the two RAID arrays.
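    Tests like these often lean on Microsoft's DiskSpd. A typical small-block random run looks something like the line below; the parameters are illustrative rather than our final test plan (4K blocks, 120 seconds, 32 outstanding I/Os per thread, 8 threads, random, 30% writes, latency captured, against a 20GB test file):

    diskspd.exe -b4K -d120 -o32 -t8 -r -w30 -L -c20G D:\testfile.dat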

    Why Build Servers?

    That's got to be the burning question on some minds. Why?

    The long and the short of it is because we've been doing so for so many years it's a hard habit to kick. ;)

    Actually, the reality is much more mundane. We continue to be actively involved in building out our own server solutions for a number of reasons:

    • We can fine tune our solutions to specific customer needs
      • Need more IOPS? We can do that
      • Need more throughput? We can do that
      • Need a blend of the two, as is the case here? We can do that too
    • Direct contact with firmware issues, interoperability, and stability
      • Making the various firmware bits play nice together can be a challenge
    • Driver issues, interoperability, and stability
      • Drivers can be quite finicky about what's in the box with them
    • Hardware interoperability
      • Our parts bin is chock full of parts that refused to work with one another
      • On the other hand our solution sets are known good configurations
    • Cost
      • Our server systems are a fraction of the cost of Tier 1
    • Overall system configuration
      • As Designed Stability out of the box
    • He said She said
      • Since we test our systems extensively prior to deploying we know them well
      • Software Vendors that point the finger have no leg to stand on as we have plenty of charts and graphs
      • Performance issues are easier to pinpoint in software vendor's products
      • We remove the guesswork around an already configured Tier 1 box

    Business Case

    The business case is fairly simple: There are _a lot_ of folks out there that do not want to cloud their business. We help customers with a highly available solution set and our business cloud to give them all of the cloud goodness but keep their data on-premises.

    We also help I.T. Professional shops that may not have the skill set on board but have customers with a need for High Availability and a cloud-like experience who want the solution deployed on-premises.

    For those customers that do want to cloud their business we have a solution set for the Small to Medium I.T. Shops that want to provide multi-tenant solutions in their own data centres. We provide the solution and backend support at a very reasonable cost while they spend their time selling their cloud.

    All in all, we've found ourselves a number of different great little niches for our highly available solutions (clusters) over the last few years.

    Thanks for reading! :)

    Philip Elder
    Microsoft High Availability MVP
    MPECS Inc.
    Co-Author: SBS 2008 Blueprint Book
    Our Web Site
    Our Cloud Service
    Twitter: @MPECSInc