Wednesday, 2 July 2008

750GB Seagate ES.2 failure on SR1530AH plus SRCSASRB

Our Server Core system running Hyper-V has had a drive failure.

The culprit:
  • 750GB Seagate Barracuda ES.2
    • ST3750330NS
    • Firmware: SN04
    • Date Code: 08277
    • Site Code: KRATSG
The failed drive is from the same batch as the last one (see: Seagate 750GB ES.2 Failure on SR1530AHLX swap experience), and so is the replacement drive. This is the third 750GB ES.2 drive failure we have had in recent weeks. Given the volume of drives we deploy, three failures may not indicate a general batch problem, but if drives from this batch keep failing we will have to assume that a batch problem is what we are dealing with.
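To put rough numbers on the "bad luck or bad batch" question, here is a minimal sketch using a simple binomial model. The fleet size (60 drives) and the baseline chance of any one drive failing in the period (1%) are assumptions picked for illustration only, not figures from our records, and the model assumes drives fail independently of one another.

```python
from math import comb

def prob_at_least(k, n, p):
    """Probability of k or more failures among n independent drives,
    each failing with probability p over the period (binomial model)."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))

# Hypothetical inputs (assumptions, not measured values):
# 60 drives from the batch in service, 1% baseline chance that any
# one healthy drive fails within the window we are looking at.
drives_in_batch = 60
baseline_rate = 0.01

chance = prob_at_least(3, drives_in_batch, baseline_rate)
print(f"P(3 or more failures by chance alone) = {chance:.4f}")
```

The smaller that probability comes out for realistic inputs, the harder it is to write the failures off as coincidence. Swap in your own drive count and an expected failure rate you trust before reading anything into the result.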

This particular server is an Intel SR1530AH 1U with an Intel SRCSASRB add-in RAID controller installed. The drives were set up as a RAID 1 mirror in a fixed (non-hot-swap) hard drive install. The 1U was installed in one of our rack enclosures in our shop data centre setup. The air is properly conditioned, so heat would not be a factor in this failure.

Because Intel's RAID Web Console Utility for Windows cannot be installed on Server Core, we had to reboot the server into the RAID controller's BIOS to remove the now dead drive and configure its replacement.

Since we cannot run the RAID Web Console on Server Core, the server will remain down until the RAID array rebuild completes. The rebuild does not appear to require the server to be down, but given the disk-intensive nature of Hyper-V with multiple VMs running, we will leave it alone for now.

The RAID Web Console does install on our full Windows Server 2008 installations, so, for those clients that require the least amount of downtime, a hot swap setup would require the full version of Server 2008 versus Server Core for their Hyper-V needs. This may involve a slight adjustment in server configurations to accommodate the additional resource requirements of the full Windows Server 2008 OS.

Philip Elder
MPECS Inc.
Microsoft Small Business Specialists

*All Mac on SBS posts are posted on our in-house iMac via the Safari Web browser.

7 comments:

Anonymous said...

I hope the two drives that failed were due to unusual Canadian weather patterns, and not a batch problem cause...

Two months ago, I installed an SBS 2k3 R2 RAID 1 config using the same 750GB drives in conjunction with a 2nd RAID 1 setup in the same server using their 500GB little brothers.

I don't have the drive label codes to match against yours, but wondering if you have since performed a drive analysis or read the SMART data from the drive, which is what I would do. Just wondering what you might find.

Now I'm gonna have to keep a hawkeye on those drives.

Thanks, Philip, for the heads up!

ZC1
Beta tester of "0"s and "1"s

Anonymous said...

Well, the follow-up news is NOT good. After being alerted to Philip's problem with these drives, I checked the installation where I had used these drives.

Seems the SBS system event log is throwing out "ftdisk" and "disk" errors (Event ID 51) on one of the drives in the RAID 1 configuration.

The application log confirms this with various errors reporting corrupt data and data loss.

Although it's not affecting performance, I've ordered a same type replacement drive.
I've read the newest version of the drive is 7200.11.

I was pretty sure of these drives after an intense 72hr burn-in. Evidently something is amiss.

ZC1
Beta tester of "0"s and "1"s

Philip Elder Cluster MVP said...

ZC1,

The drives we have been working with are ES.2 which are the Seagate Enterprise Storage series.

7200.11 is the Barracuda desktop series.

Have a look at my NTFS 51 posts ... they are ugly.

Make sure you have a grandfather backup that is good prior to the errors happening.

If you are using a snapshot type backup like ShadowProtect or Acronis then your backups may very likely be toast after the errors appeared ... as was the case for us.

Also, to eliminate any possible hiccups in the future, you may want to consider Swinging off and back on the box with a fresh install to eliminate any possibility of

Keep me up to date!

Philip

Anonymous said...

Philip,

I stand corrected.

You are correct, they are ES.2 drives, not the 7200.11. My bad.

I will look into the Swing method of repair into a new box, but..

It is simply services that are squawking "I'm hurt...I've fallen down and can't get up".

The customer data is not affected at this time and the RAID 1 continues to operate without error, strangely.

Thus far, it looks like only a simple drive replacement and the RAID 1 will simply rebuild.

ZC1
Beta tester of "0"s and "1"s

Philip Elder Cluster MVP said...

Suggestion:

ShadowProtect image and do a restore to a test box using HIR if necessary.

Take that Exchange Store offline and run Eseutil to attempt a defragment on the database.

You may be surprised with what you find there.

The major crash under similar circumstances we had a while back essentially killed the backup images for complete recovery of the SBS box.

But they at least gave us our Exchange Store back, although not in optimal condition.

We have the most recent backup from last night and will be recovering it to an identical box we happen to have ready here for another client.

We will be experimenting on whether we can defragment and eliminate the errors in the database, or looking at other options to preserve their mailboxes.

Philip

Anonymous said...

I have had five new ES.2 SATA-300 Seagates fail in Intel motherboard RAID 1, Windows 2003 SBS and Windows 2008 servers in the last year. All failed within a month or two of service. All were purchased from Tech Data as non-retail drives. None worked on testing, including a scan with R-Tools, although they were spinning without noise. I was assuming these were drive daughterboard failures. And one of the replacements failed within a month. I wish Maxtor was still in business... those IDE and SATA drives ran in my servers and workstations for years with minimal problems. I had left Western Digital in the mid 1990s due to an increasing number of failures in servers... now I may have to go back and give them a try. Seagate SCSI and SAS drives all work well in my servers and storage arrays. But the Seagate SATA line, at least the ES.2, SUCKS, as I approach a fail rate of over 50% in the first 6 months of use!!! I notice ES.11 are available in retail... might try some of those.

Philip Elder Cluster MVP said...

A,

Perhaps this will help: Firmware Recommendations for Seagate Drives.

It seems that there have been some firmware issues with specific Seagate Barracuda drives.

If you are affected, you may be able to swap the daughter board with one from a known good working drive of the same model to get to the data. It is a practice we have used in the past.

Philip