Okay, so this is a new one, and it led to a near stoppage of the heart last evening:
The errors in order:
Controller ID: 0 Unrecoverable medium error during rebuild: PD –|—:0 Location 0x26f1640
Controller ID: 0 Puncturing bad block: PD –|—:0 Location 0x26f1640
Controller ID: 0 Puncturing bad block: PD –|—:1 Location 0x26f1640
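To make it easier to see that all three events point at the same location, here is a minimal sketch that parses the lines above and groups them by location. The regular expression and the `group_by_location` helper are my own assumptions based only on these three lines; they are not an official MegaRAID log format definition.

```python
import re
from collections import defaultdict

# The three event lines exactly as they appear in the log above.
EVENTS = [
    "Controller ID: 0 Unrecoverable medium error during rebuild: PD –|—:0 Location 0x26f1640",
    "Controller ID: 0 Puncturing bad block: PD –|—:0 Location 0x26f1640",
    "Controller ID: 0 Puncturing bad block: PD –|—:1 Location 0x26f1640",
]

# Assumed layout: "Controller ID: <n> <event text>: PD <prefix>:<pd> Location <hex>"
EVENT_RE = re.compile(
    r"Controller ID:\s*(?P<ctrl>\d+)\s+(?P<event>.+?):\s+PD\s+\S*:(?P<pd>\d+)"
    r"\s+Location\s+(?P<loc>0x[0-9a-fA-F]+)"
)


def group_by_location(lines):
    """Return {location: [(pd_number, event_text), ...]} for lines that match."""
    grouped = defaultdict(list)
    for line in lines:
        match = EVENT_RE.search(line)
        if match is None:
            continue  # skip anything that does not look like the lines above
        location = int(match["loc"], 16)
        grouped[location].append((int(match["pd"]), match["event"]))
    return dict(grouped)


for location, hits in group_by_location(EVENTS).items():
    print(hex(location), hits)
```

Grouped this way, the one medium error on PD 0 and the punctures on both PD 0 and PD 1 all land on the same location.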
PD 0 is the last original disk in this server that is giving us headaches. Both PD 1 and PD 2 (there are three hot swap bays in the SR1560SFH Intel Server System) were replaced with new drives.
The original PD 1 had failed during a server firmware update that included the BMC (previous blog post). The original PD 2, then the global hot spare, was rebuilt into the array with no errors . . . until a consistency check that ran later that afternoon produced some unrecoverable fatal errors.
Last night we dropped the original PD 1 out of the configuration, replaced it with a new drive, and had the new drive picked up as a hot spare. We then failed out PD 2, the original hot spare that had since become a RAID 1 array member, assuming that it was the source of the errors we saw in the consistency check yesterday afternoon.
So, the above screenshot was taken while the new PD 1 was being rebuilt into the array with PD 0 as the source. Needless to say, the heart definitely skipped a few beats with visions of index $0 running through my head (previous blog post)!
The rebuild did eventually finish successfully though?!?
We will be going back this evening to fail out the bad PD 0 and replace it with a new drive, which will then be designated the new hot spare.
Once PD 2, currently a hot spare, has finished rebuilding into the RAID 1 array, we need to run a consistency check, as Intel indicated to us. From there, hopefully ShadowProtect will finally give us a backup!
And one more thing, just what does “Puncturing bad block” really mean?
- Bing Search: Puncturing bad block
- Yields nada.
- Google Search: Puncturing bad block
- Yields: NEC MegaRAID Storage Manager Manual (PDF)
The suggestion in the NEC document linked above is to take the preventative measure of swapping out the indicated drive(s) promptly. :)
It looks as though the RAID controller has found some bad sectors on PD 0, and puncturing means marking those sectors as off limits on both array members.
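To illustrate that reading of the message, here is a toy model of a RAID 1 rebuild that punctures a block when the source disk cannot be read. This is only a sketch of my interpretation; the `Disk` class and `rebuild` function are hypothetical and are not how the controller firmware is actually implemented.

```python
class Disk:
    """Toy disk: a list of blocks plus sets of unreadable and punctured blocks."""

    def __init__(self, name, blocks, unreadable=()):
        self.name = name
        self.blocks = list(blocks)
        self.unreadable = set(unreadable)  # blocks that return a medium error
        self.punctured = set()             # blocks marked off limits

    def read(self, lba):
        if lba in self.unreadable or lba in self.punctured:
            raise IOError(f"{self.name}: medium error at {hex(lba)}")
        return self.blocks[lba]


def rebuild(source, target):
    """Copy source onto target; puncture any block the source cannot deliver."""
    for lba in range(len(source.blocks)):
        try:
            target.blocks[lba] = source.read(lba)
        except IOError:
            # No good copy exists anywhere, so the block is marked off limits
            # on BOTH members and the rebuild carries on.
            source.punctured.add(lba)
            target.punctured.add(lba)
    return sorted(source.punctured)


pd0 = Disk("PD0", ["a", "b", "c", "d"], unreadable={2})
pd1 = Disk("PD1", [None] * 4)
print(rebuild(pd0, pd1))  # lists the punctured block numbers
```

In this model the rebuild still completes, which would match what we saw, but the data at the punctured block is gone on both members.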
But part of this whole puzzle is the fact that the RAID controller (Intel SRCSASRB with firmware 470) shows a media error level of 0 for both array members and a predictive failure count of 0 for both members!
Hopefully tomorrow we can rest easy with a backup in hand!
Microsoft Small Business Specialists
Co-Author: SBS 2008 Blueprint Book