Saturday, 11 April 2009

EVGA 790i Ultra RAID Error – Drive Failure

We built this system for one of our clients a while back using an EVGA 790i Ultra motherboard and Kingston RAM.

The system came back to us a couple of days ago due to a RAID array error message.

Nothing out of the ordinary was going on when the system spontaneously locked up on the user.

The RAID array consists of four (4) 150GB Western Digital Raptor drives configured in RAID 1+0 (the BIOS labels it 0+1, which is incorrect).

One drive in each of the two mirrored pairs survived the array malfunction, so the array, and with it the OS, survived the failure. We were able to boot into Windows Vista with no issues.
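To illustrate why losing one drive from each pair was survivable, and why the 1+0 versus 0+1 label matters, here is a minimal Python sketch that counts which two-drive failures each layout tolerates on a four-drive array. The drive labels D1 to D4 and the pair/stripe groupings are hypothetical and not taken from the nVidia RAID BIOS. The 0+1 model is simplified: it assumes the controller takes a whole stripe set offline as soon as any member fails.

```python
from itertools import combinations

# Hypothetical labels for the four drives (not from the RAID BIOS).
DRIVES = ["D1", "D2", "D3", "D4"]

# RAID 1+0: data is striped across two mirrored pairs.
MIRROR_PAIRS = [{"D1", "D2"}, {"D3", "D4"}]
# RAID 0+1: two stripe sets that mirror each other.
STRIPE_SETS = [{"D1", "D3"}, {"D2", "D4"}]


def raid10_survives(failed):
    # Data survives if every mirrored pair still has at least one working drive.
    return all(pair - failed for pair in MIRROR_PAIRS)


def raid01_survives(failed):
    # Simplified model: a stripe set is lost if any member fails, and data
    # survives only while at least one stripe set is completely intact.
    return any(not (stripe & failed) for stripe in STRIPE_SETS)


def count_survivable(survives, n_failed):
    # Try every combination of n_failed drives; return (survivable, total).
    combos = [set(c) for c in combinations(DRIVES, n_failed)]
    return sum(survives(c) for c in combos), len(combos)


if __name__ == "__main__":
    for name, fn in [("RAID 1+0", raid10_survives), ("RAID 0+1", raid01_survives)]:
        ok, total = count_survivable(fn, 2)
        print(f"{name}: survives {ok} of {total} two-drive failures")
    # One drive lost from each mirrored pair, as in this post.
    print(raid10_survives({"D2", "D4"}))
```

Under this model, RAID 1+0 survives 4 of the 6 possible two-drive failures (only losing both drives of the same pair is fatal), while RAID 0+1 survives just 2 of 6.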

The puzzling aspect of getting things back together was how to reset the two failed drives.

We took the drives out of the system and ran the Western Digital utility that does an extensive disk scan, and both drives came up error-free.

The RAID BIOS does not make it clear how to reset failed array members, either.

We ended up deleting the entries in the RAID BIOS array management for the two drives that were in an error state. Once we did that, they appeared as available, and we were able to add them back to the array by highlighting them and setting the array to rebuild mode. Remember, all we are doing is adding back a second drive to mirror each surviving original drive in the RAID 1+0 array (nVidia calls it RAID 0+1 in the BIOS).

The catch with an array rebuild on this setup is that the system must boot into an OS before the rebuild starts.

When we rebooted, Windows Vista complained about missing NLS data, so something had changed when we reset the “failed” drives.

We booted into the ShadowProtect Recovery Environment and verified that everything on the C: drive was intact, but we could not get any further. We tried copying the NLS files over, but that did not work.

So, we deleted the array in the RAID BIOS, recreated it, and attempted to restore the ShadowProtect image we had taken just before working on the system. That restore attempt failed.

We eventually learned the proper method for using ShadowProtect with Windows Vista for any backup or restore operation: First Successful Windows Vista ShadowProtect Restore! No Winload.exe Error! (previous blog post)

We had not run the preparation step described in that post on this Windows Vista installation, so we ended up rebuilding the box from scratch.

After the box had been fully reinstalled, one of the array members choked and locked up the system. Perhaps we had finally found the culprit behind the original array failure! We swapped that drive for a new one using the array delete-and-create process and are now well into a burn-in with no hiccups.

This time, we ran the prep steps from that post and imaged the system with ShadowProtect before breaking the RAID array. We were then able to restore the Windows Vista OS successfully.

The failed drive in this case was not one of the original pair of “failed” drives. Hopefully we have now found the source of the problem. The burn-in will continue to run until our client picks up the box late Monday afternoon.

The burn-in produced no array failures, and our client was happy with the system when we followed up at the end of last week.

Philip Elder
Microsoft Small Business Specialists
Co-Author: SBS 2008 Blueprint Book

*All Mac on SBS posts will not be written on a Mac until we replace our now missing iMac! (previous blog post)

