Thursday, 9 May 2013

Repeat after me: SATA does not belong in servers.

One of the very last servers we deployed with SATA drives had yet another failure in it.

A new Intel R2208GZ4GC 2U server with eight 600GB 10K SAS drives configured in a RAID 6 array is already installed on site and waiting for tax season to slow down for the client (they are an accounting firm).

image

Our client recently moved to a new location with the servers now located in a dedicated room in the basement. The little A/C unit in that room was a leftover from the previous occupant that we were not too sure about.

Well, the hot spare in this server, an Intel Server System SR1560SFHS with three 750GB Seagate ES series SATA drives, died about four months ago. Since the system was slated for replacement we left the remaining two drives, a RAID 1 pair, alone.

Well, that ended this morning when one of the drives in the pair went full stop. The likely cause: the temperature in the room when we arrived this afternoon was close to 90F.

Someone had fired up the A/C unit without realizing that the exhaust hose that vents the heat outside was not connected to the back of the unit. So all of the heat it was trying to pull out, plus the heat from the unit itself, stayed in the room and yielded a very high temperature.

Once the hose was affixed to the back of the unit the temperature started to come down.

So, here we are writing this blog post at 2216Hrs on a Wednesday evening after having logged in to check on the progress of the array rebuild and the above was what we saw.

The RAID controller is an Intel RAID Controller SRCSASRB with battery backup.

SATA does not belong in a server when it comes to spindled hard drives. This experience with the blind failure and the dismal rebuild times, during off hours no less, is definitely part of the reason why.

SAS/SCSI was designed and engineered to run in server environments. SATA was not.

The firmware tweaks that the hard drive vendors have introduced, along with the pretty much failed NCQ effort, to try and mimic a SAS setup within the SATA controller do not come close to the performance, longevity, and stability that SAS drives offer.

By the way, this goes for NearLine SAS drives as well. These drive types are SATA internals with SAS electronics slapped on to the outside of the drive. There is a very good reason why the drives are called "NearLine". :)

The cost of 2.5" 10K SAS drives in 300GB and 600GB sizes has come down quite a bit in the last year. The 900GB 10K SAS drives are still relatively expensive per gigabyte but provide an opportunity for a large aggregate of storage when needed.

Another way to look at it is this: How many RMA efforts have gone in to server setups with SATA drives in them? Compare that with the servers that have SAS setups. In our case, where we have lots of servers deployed, there is virtually no comparison. Over time the SAS drives have completely trumped the SATA drives in all aspects.

Even with 24x7x365 by 4 hour response times most vendors require time wasted on the phone prior to initiating that on-site visit to replace the failed drive. This time is expensive and to some extent a waste.

Oh, and one more thing: If going with parity in an array, go RAID 6 with at least eight 10K spindles and make sure the RAID controller has either flash-backed cache or a battery backup.

Storage is almost always the weakest point in a server both for hardware failures and I/O bottlenecks. Kill both. Use a wide array of eight spindles or more and make sure the drives are 10K SAS.
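
For anyone weighing the array layouts mentioned here, the capacity trade-off is simple arithmetic. The following is a minimal Python sketch, not anything from our deployment tooling: the names ArrayLayout, usable_gb and guaranteed_failures_survived are made up for illustration, and the numbers ignore controller and file system overhead as well as hot spares.

```python
from dataclasses import dataclass

# Minimum drive counts per RAID level (standard definitions).
MIN_DRIVES = {"RAID 1": 2, "RAID 5": 3, "RAID 6": 4, "RAID 10": 4}


@dataclass(frozen=True)
class ArrayLayout:
    """Hypothetical helper type: one array, hot spares not counted."""
    level: str      # "RAID 1", "RAID 5", "RAID 6" or "RAID 10"
    drives: int     # number of drives in the array
    drive_gb: int   # capacity of each drive in GB


def _check(layout: ArrayLayout) -> None:
    """Reject layouts that the RAID level cannot be built from."""
    if layout.level not in MIN_DRIVES:
        raise ValueError(f"unsupported level: {layout.level}")
    if layout.drives < MIN_DRIVES[layout.level]:
        raise ValueError(f"{layout.level} needs at least "
                         f"{MIN_DRIVES[layout.level]} drives")
    if layout.level == "RAID 10" and layout.drives % 2:
        raise ValueError("RAID 10 needs an even number of drives")


def usable_gb(layout: ArrayLayout) -> int:
    """Usable capacity in GB, before any formatting overhead."""
    _check(layout)
    n, size = layout.drives, layout.drive_gb
    if layout.level == "RAID 1":
        return size              # every drive holds the same data
    if layout.level == "RAID 5":
        return (n - 1) * size    # one drive's worth of parity
    if layout.level == "RAID 6":
        return (n - 2) * size    # two drives' worth of parity
    return (n // 2) * size       # RAID 10: half the drives are mirrors


def guaranteed_failures_survived(layout: ArrayLayout) -> int:
    """Drive failures the array survives no matter which drives fail."""
    _check(layout)
    if layout.level == "RAID 1":
        return layout.drives - 1
    if layout.level == "RAID 6":
        return 2
    # RAID 5 and RAID 10: a second failure can be fatal (in RAID 10,
    # when it hits the partner of the drive that already failed).
    return 1


if __name__ == "__main__":
    for level in ("RAID 5", "RAID 6", "RAID 10"):
        layout = ArrayLayout(level, drives=8, drive_gb=600)
        print(f"{level}: {usable_gb(layout)} GB usable, survives any "
              f"{guaranteed_failures_survived(layout)} drive failure(s)")
```

With eight 600GB drives, RAID 6 works out to 3,600GB usable while tolerating any two drive failures, and RAID 10 works out to 2,400GB while only guaranteeing survival of one. RAID 10 can survive more if the failures land in different mirror pairs, but that is luck rather than design.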

The risk when using SATA is just not worth the "savings" IMNSHO (in my not so humble opinion).

Philip Elder
MPECS Inc.
Microsoft Small Business Specialists
Co-Author: SBS 2008 Blueprint Book

Chef de partie in the SMBKitchen
Find out more at
www.thirdtier.net/enterprise-solutions-for-small-business/

3 comments:

Paul said...

Can't really blame the failed hard disk if the AC system has been incorrectly installed/operated.

I can genuinely say that I have had more SAS drives fail in servers than SATA. I don't know why, just the way it is.

Most of our SATA based servers have WD RAID Edition disks, perhaps it's the Seagates that are the culprit? I only buy WD drives now due to high failure rates in Seagates.

SAS v SATA will always be a big debate, price conscious clients will always go for SATA. I have one client with about 6TB of data (architects) and they couldn't afford SAS based servers.

As always, YMMV.
:)

Shayne Kawalilak said...

Agree with you 100%, but to expand, I don't believe RAID 6 should be used unless absolutely necessary. RAID 10 in servers if possible and never RAID 5 (why IBM is still promoting something that Dell has stopped supporting is bewildering to me).

Philip Elder Cluster MVP said...

Paul,

Our experience has been the opposite. I can count on one hand the number of Seagate SAS drives we've deployed over the last three or four years now that have DOAd or died in production.

Yes, the 500GB ES series had that firmware bug. That really killed us.

We use WD Black for backup rotations and are not really happy with the death rates on them so far, but they are reasonable given our expectations of SATA.

On the whitebox side of things 600GB 10K Seagate SAS drives are very inexpensive. Tier 1 makes some of their bread and butter on storage so their prices are vastly more expensive.

We run with RAID 6 with a few exceptions in servers (small spindle sets would be RAID 5). We do RAID 10 on SAN/DAS setups but are leaning towards RAID 6 in these circumstances as well.

A number of storage vendors we work with and Cloud partners run with RAID 10 on their storage arrays as the hardware cost versus storage loss is a better risk at that level.

Philip