Thursday 10 January 2019

Server Storage: Never Use Solid-State Drives without Power Loss Protection (PLP)

Here's an article from a little while back with a very good explanation of why one should not use consumer grade SSDs anywhere near a server:

While the article points specifically to Storage Spaces Direct (S2D) it is also applicable to any server setup.

The impetus behind this post is pretty straightforward. It came out of a thread in a forum we participate in:

  • IT Tech: I had a power loss on my S2D cluster and now one of my virtual disks is offline
  • IT Tech: That CSV hosted my lab VMs
  • Helper 1: Okay, run the following recovery steps that help ReFS get things back together
  • Us: What is the storage setup in the cluster nodes?
  • IT Tech: A mix of NVMe, SSD, and HDD
  • Us: Any consumer grade storage?
  • IT Tech: Yeah, the SSDs where the offline Cluster Storage Volume (CSV) is
  • Us: Mentions above article
  • IT Tech: That's not my problem
  • Helper 1: What were the results of the above?
  • IT Tech: It did not work :(
  • IT Tech: It's ReFS's fault! It's not ready for production!

The reality of the situation was that live data was sitting in the volatile DRAM cache on those consumer grade SSDs, and it was lost when the power went out. :(

We're sure that most of us know what happens when even one bit gets flipped. Error-correcting (ECC) memory is mandatory for servers for this very reason.

Losing an entire cache's worth of data across multiple drives is pretty much certain death for whatever sat on top of them.
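To make the failure mode concrete, here is a minimal sketch in Python of a drive that acknowledges writes as soon as they land in its DRAM cache. It is a toy model only: the SimulatedSSD class, its methods, and the lost_writes helper are hypothetical names invented for illustration, and real SSD firmware is far more involved. The only assumption it encodes is the one described above: without PLP the cache contents vanish on power loss, while with PLP the drive has enough stored energy to flush them to flash first.

```python
class SimulatedSSD:
    """Toy model (hypothetical) of an SSD with a volatile DRAM write cache."""

    def __init__(self, has_plp):
        self.has_plp = has_plp
        self.dram_cache = {}  # acknowledged writes not yet on flash (volatile)
        self.flash = {}       # persistent storage

    def write(self, lba, data):
        # The drive acknowledges the write once it is in DRAM,
        # before it has reached flash.
        self.dram_cache[lba] = data
        return "ack"

    def flush(self):
        # Move everything in the cache to persistent flash.
        self.flash.update(self.dram_cache)
        self.dram_cache.clear()

    def power_loss(self):
        if self.has_plp:
            # Assumption: PLP capacitors hold power long enough to flush.
            self.flush()
        else:
            # No PLP: whatever was only in DRAM is gone.
            self.dram_cache.clear()

    def read(self, lba):
        # Serve from cache first, then fall back to flash.
        return self.dram_cache.get(lba, self.flash.get(lba))


def lost_writes(has_plp):
    """Return the block addresses whose acknowledged writes did not survive."""
    drive = SimulatedSSD(has_plp)
    drive.write(0, "old metadata")
    drive.flush()                   # this version is safely on flash
    drive.write(0, "new metadata")  # acknowledged, still only in DRAM
    drive.write(1, "new block")     # acknowledged, still only in DRAM
    drive.power_loss()
    expected = {0: "new metadata", 1: "new block"}
    return [lba for lba, data in expected.items() if drive.read(lba) != data]


print(lost_writes(has_plp=False))  # [0, 1]
print(lost_writes(has_plp=True))   # []
```

Note that in the no-PLP case block 0 does not come back empty. It comes back with stale metadata that the file system was told had already been replaced, which is exactly the kind of inconsistency a recovery procedure may not be able to untangle.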

Time to break out the backups and restore.

And, replace those consumer grade SSDs with Enterprise Class SSDs that have PLP!

Philip Elder
Microsoft High Availability MVP
Co-Author: SBS 2008 Blueprint Book!
Our Web Site
Our Cloud Service
