Tuesday, 15 January 2019

Custom Intel X299 Workstation: Intel VROC RAID 1 NVMe WinSat Disk Score

We just finished a custom build for a client of ours in the US.

The machine is extremely fast but quiet.

image

After kicking the tires a bit with Windows 10 Pro 64-bit and some software installs post burn-in we get the following performance out of the Intel NVMe RAID 1 pair:

C:\Temp>winsat disk
Windows System Assessment Tool
> Running: Feature Enumeration ''
> Run Time 00:00:00.00
> Running: Storage Assessment '-ran -read -n 0'
> Run Time 00:00:00.77
> Running: Storage Assessment '-seq -read -n 0'
> Run Time 00:00:02.38
> Running: Storage Assessment '-seq -write -drive C:'
> Run Time 00:00:01.64
> Running: Storage Assessment '-flush -drive C: -seq'
> Run Time 00:00:00.45
> Running: Storage Assessment '-flush -drive C: -ran'
> Run Time 00:00:00.38
> Dshow Video Encode Time                      0.00000 s
> Dshow Video Decode Time                      0.00000 s
> Media Foundation Decode Time                 0.00000 s
> Disk  Random 16.0 Read                       1020.73 MB/s          8.8
> Disk  Sequential 64.0 Read                   3203.52 MB/s          9.3
> Disk  Sequential 64.0 Write                  1456.24 MB/s          8.8
> Average Read Time with Sequential Writes     0.090 ms          8.8
> Latency: 95th Percentile                     0.146 ms          8.9
> Latency: Maximum                             0.316 ms          8.9
> Average Read Time with Random Writes         0.058 ms          8.9
> Total Run Time 00:00:05.91

The machine is destined for a surveying company that's getting into high end image and video work with drones.

All in all, we are very happy with the build and we're sure they will be too!

Philip Elder
Microsoft High Availability MVP
MPECS Inc.
Co-Author: SBS 2008 Blueprint Book
www.s2d.rocks !
Our Web Site
Our Cloud Service

Friday, 11 January 2019

Some Thoughts on the S2D Cache and the Upcoming Intel Optane DC Persistent Memory

Intel has a very thorough article that explains what happens when the workload data volume on a Storage Spaces Direct (S2D) Hyper-Converged Infrastructure (HCI) cluster starts to "spill over" to the capacity drives in a NVMe/SSD Cache with HDD Capacity for storage.

Essentially, any workload data that needs to be shuffled over to the hard disk layer will suffer a performance hit and suffer it big time.

In a setup where we would have either NVMe PCIe Add-in Cards (AiCs) or U.2 2.5" drives for cache and SATA SSDs for capacity the performance hit would not be as drastic but it would still be felt depending on workload IOPS demands.

So, what do we do to make sure we don't shortchange ourselves on the cache?

We baseline our intended workloads using Performance Monitor (PerfMon).

Here is a previous post that has an outline of what we do along with links to quite a few other posts we've done on the topic: Hyper-V Virtualization 101: Hardware and Performance

We always try to have the right amount of cache in place for the workloads of today but also with the workloads of tomorrow across the solution's lifetime.

S2D Cache Tip

TIP: When looking to set up a S2D cluster we suggest running with a higher count smaller volume cache drive set versus just two larger capacity drives.

Why?

For one, we get a lot more bandwidth/performance out of three or four cache devices versus two.

Secondly, in a 24 drive 2U chassis if we start off with four cache devices and lose one we still maintain a decent ratio of cache to capacity (1:6 with four versus 1:8 with three).

Here are some starting points based on a 2U S2D node setup we would look at putting into production.

  • Example 1 - NVMe Cache and HDD Capacity
    • 4x 400GB NVMe PCIe AiC
    • 12x xTB HDD (some 2U platforms can do 16 3.5" drives)
  • Example 2 - SATA SSD Cache and Capacity
    • 4x 960GB Read/Write Endurance SATA SSD (Intel SSD D3-4610 as of this writing)
    • 20x 960GB Light Endurance SATA SSD (Intel SSD D3-4510 as of this writing)
  • Example 3 - Intel Optane AiC Cache and SATA SSD Capacity
    • 4x 375GB Intel Optane P4800X AiC
    • 24x 960GB Light Endurance SATA SSD (Intel SSD D3-4510 as of this writing)

One thing to keep in mind when it comes to a 2U server with 12 front facing 3.5" drives along with four or more internally mounted 3.5" drives is their heat and available PCIe slots. Plus, the additional drives could also place a constraint on the processors that are able to be installed also due to thermal restrictions.

Intel Optane DC Persistent Memory

We are gearing up for a lab refresh when Intel releases the "R" code Intel Server Systems R2xxxWF series platforms hopefully sometime this year.

That's the platform Microsoft set an IOPS record with set up with S2D and Intel Optane DC persistent memory:

We have yet to see to see any type of compatibility matrix as far as the how/what/where Optane DC can be set up but one should be happening soon!

It should be noted that they will probably be frightfully expensive with the value seen in online transaction setups where every microsecond counts.

TIP: Excellent NVMe PCIe AiC for lab setups that are Power Loss Protected: Intel SSD 750 Series

image

Intel SSD 750 Series Power Loss Protection: YES

These SSDs can be found on most auction sites with some being new and most being used. Always ask for an Intel SSD Toolbox snip of the drive's wear indicators to make sure there is enough life left in the unit for the thrashing it would get in a S2D lab! :D

Acronym Refresher

Yeah, gotta love 'em! Being dyslexic has its challenges with them too. ;)

  • IOPS: Inputs Outputs per Second
  • AiC: Add-in Card
  • PCIe: Peripheral Component Interconnect Express
  • NVMe: Non-Volatile Memory Express
  • SSD: Solid-State Drive
  • HDD: Hard Disk Drive
  • SATA: Serial ATA
  • Intel DC: Data Centre (US: Center)

Thanks for reading!

Philip Elder
Microsoft High Availability MVP
MPECS Inc.
Co-Author: SBS 2008 Blueprint Book
www.s2d.rocks !
Our Web Site
Our Cloud Service

Thursday, 10 January 2019

Server Storage: Never Use Solid-State Drives without Power Loss Protection (PLP)

Here's an article from a little while back with a very good explanation of why one should not use consumer grade SSDs anywhere near a server:

While the article points specifically to Storage Spaces Direct (S2D) it is also applicable to any server setup.

The impetus behind this post is pretty straight forward via a forum we participate in:

  • IT Tech: I had a power loss on my S2D cluster and now one of my virtual disks is offline
  • IT Tech: That CSV hosted my lab VMs
  • Helper 1: Okay, run the following recovery steps that help ReFS get things back together
  • Us: What is the storage setup in the cluster nodes?
  • IT Tech: A mix of NVMe, SSD, and HDD
  • Us: Any consumer grade storage?
  • IT Tech: Yeah, the SSDs where the offline Cluster Storage Volume (CSV) is
  • Us: Mentions above article
  • IT Tech: That's not my problem
  • Helper 1: What were the results of the above?
  • IT Tech: It did not work :(
  • IT Tech: It's ReFS's fault! It's not ready for production!

The reality of the situation was that there was live data sitting in the volatile cache DRAM on those consumer grade SSDs that got lost when the power went out. :(

We're sure that most of us know what happens when even one bit gets flipped. Error Correction on memory is mandatory for servers for this very reason.

To lose an entire cache worth across multiple drives is pretty much certain death for whatever sat on top of them.

Time to break-out the backups and restore.

And, replace those consumer grade SSDs with Enterprise Class SSDs that have PLP!

Philip Elder
Microsoft High Availability MVP
MPECS Inc.
Co-Author: SBS 2008 Blueprint Book
www.s2d.rocks !
Our Web Site
Our Cloud Service

Monday, 7 January 2019

Security: Direct Internet Connections for KVM over IP devices such as iLO Advanced, iDRAC Enterprise, Intel RMM, and others = BAD

While discussing firmware and updating firmware this situation, experienced a number of years ago vicariously, came to light:

ISP: Excuse me sir, but we have a huge volume of SPAM coming out of [WAN RMM IP]
Admin: Huh?
ISP: We are seeing huge volumes of SMTP traffic outbound from [WAN RMM IP]
Admin: Oh?
* Checks non-existent documentation
Admin: Um, is that IP assigned to us?
ISP: Yes sir, along with [WAN SSL IP for internal services]
Admin: Hmmm …
[PAUSE]
Admin: Oh, wait, I think I know … [unplugs iLO/iDRAC/RMM from switch connected to ISP modem]
Admin: Is it better now?
ISP: Oh yeah, what did you do?
Admin: Oh, I fixed it.

One should never plug a RMM/iLO/iDRAC type device directly into the Internet right?

We probably blogged about this in the past, but it definitely bears repeating as we still encounter situations where the devices are plugged directly in to the Internet!

Happy New Year everyone! :)

Philip Elder
Microsoft High Availability MVP
MPECS Inc.
Co-Author: SBS 2008 Blueprint Book
www.s2d.rocks !
Our Web Site
Our Cloud Service