Tuesday, 21 December 2010

Primary A/C Breaker Fail – APC BR1500LCD and Intel SC5299BRP Dual PSU Fail

One of our clients had a primary breaker pop on Friday last week when an appliance was turned on while an oven was running its self-clean cycle.

When the power failed, the event somehow killed the APC BR1500LCD UPS protecting the server. Losing the UPS in turn took out both power supplies in the server's redundant PSU setup.

The server was configured as follows:

  • Intel Xeon X3220 Series CPU
  • Intel S3210SHLX Server Board
  • 4GB RAM
  • Intel RAID Controller SRCSASRB + BBU
  • 500GB Seagate ES series drives in dual RAID 1 arrays.
  • Intel SC5299BRP chassis with both redundant PSUs installed.

Intel sent out a pair of replacement PSUs, and the hardware fired up with them with no issues.

The BR1500LCD’s onboard display normally shows about 20 to 30 minutes of runtime on any given day. It now reads 1 minute, so it will also be replaced.

We were fortunate that the sudden, unclean shutdown of SBS 2003 R2 Premium with ISA installed did not corrupt Exchange or Active Directory.

We ran a few reboot cycles and an update cycle, checking between each cycle that the Exchange database was healthy, which it was. We had their ShadowProtect backup on standby in case we needed to recover the server.

The final reboot took about a third of the time of the first boot after the new PSUs were installed. We will be keeping a close eye on this box to make sure everything stays happy.

It just goes to show that even with a good UPS with line levelling in front of a system, there are no guarantees.

Nothing replaces having a good backup that is known to restore without issues. Stress levels are so much lower when we know we have a fallback in the middle of a server-down situation.

Philip Elder
MPECS Inc.
Microsoft Small Business Specialists
Co-Author: SBS 2008 Blueprint Book

*Our original iMac was stolen (previous blog post). We now have a new MacBook Pro courtesy of Vlad Mazek, owner of OWN.


3 comments:

stryqx said...

Even though the Back-UPS has a zero clamping response time, it does have about 5% surge let-through. Depending on the size of the surge, this can be enough to kill anything on the output circuit.

Something like a Smart-UPS has a much smaller surge let-through and also a higher surge energy rating for the same-sized UPS, which is also why you pay more for them. As you move up in VA size, it's best to look at a decent UPS rather than the toys.

Online UPSes tend to handle large surges better due to the AC-DC-AC conversion process and more circuitry in between the input power and the output power. Large surges tend to blow the input filters and rectifier and occasionally the inverter and battery charger.

Also, redundant power supplies should always go to separate power sources. There's no point having your redundant power supplies hooked up to the same single point of failure.

And then there's always the old cliche - a fuse only exists to provide a circuit long enough to blow up whatever it's supposed to be protecting :-)

Philip Elder Cluster MVP said...

Chris,

The APC Smart-UPS SC 1500VA shows a 5% IEEE surge let-through and costs $549.00.

The APC Smart-UPS 1500VA LCD SMT1500 shows a 0.3% IEEE surge let-through and costs $800.00.

Now, the BR1500LCD was about $329.00, so there is definitely a step up in price if we want the better surge filtering. In this case, our client is a non-profit, and the extra $500 or so would not be there.

But, this is definitely something to look at going forward.

Thanks,

Philip

Paul said...

Ah, UPSes! I have more trouble with UPSes than with power failures. And I only use APC, too.