Tuesday, 29 January 2013

Server Uptime at 16 Months

So, is this a good thing or a bad thing?


Yeah, this particular box has been up for 16 months.

In August of 2011 one of our two switches failed. That switch just happened to be the one this particular box was plugged into. Before that, the box had been up and running since Server 2008 R2 was installed (just prior to R2’s RTM).

Its sole purpose in life is to serve terabytes of data via a teamed Gigabit NIC pair.

Note the CPU setup. It has a pair of Intel Xeon 5130 CPUs and 16GB of RAM. Yeah, it’s a bit long in the tooth. ;)

Given the server’s file-serving role, we tend to leave it alone. No web browsing, no desktop access, and no other access is needed unless something goes wrong with it.

Well, today the backplane decided to hiccup, and a 500GB Seagate ES series drive in it died (we’ve had a _huge_ failure rate on these drives over the last two or more years).

It is time for this old box to be retired.

Its replacement will be a Hyper-V Failover Cluster based on two or three Intel Server Systems SR1695GPRX2AC 1U servers with an Intel Xeon X3470 and 32GB of ECC RAM.

Is it a good idea to leave a box up and running for months or even years at a time?

Is the risk worth it?

In some cases we have no choice: where three shifts run 24/7/365, coordinated downtime is about the only way into these boxes. That said, both SBS 2008 Standard and SBS 2011 Standard tend to start choking around the 90-day mark, so reboots paired with patch cycles tend to happen every quarter.

To mitigate the risk, we need to make sure we have good monitoring in place for edge access and AD authentication attempts (especially failures), a proper edge configuration that blocks both inbound and outbound packets by default, and other strategies such as not touching the box/VM at all.

In the end, the risks need to be weighed against the benefits of going without reboot cycles and/or patch cycles for a lengthy period of time.

Philip Elder
MPECS Inc.
Microsoft Small Business Specialists
Co-Author: SBS 2008 Blueprint Book


7 comments:

Jules Wilko said...

Those 500GB ES drives: are they the SAS ones?
I am sick to the back teeth of those failing on me...

I've lost 4 in the last 6 months, and one just started reporting a predicted failure yesterday.

Grr!!

Anonymous said...

Patching's an odd thing. I've had updates cause more problems than they were meant to solve! I now only patch (service packs excluded) if there is a specific threat or we have a little downtime coming up anyway. As always, YMMV.

Paul

Philip Elder Cluster MVP said...

Jules,

These are Seagate Enterprise SATA series drives. They were pretty much the nail in the coffin for SATA in servers for us.

We stopped putting SATA in servers, apart from a few 750GB ES SATA drives that made their way into certain configurations, and moved to SAS only.

Philip

Philip Elder Cluster MVP said...

Paul,

We tend to run 30 to 60 days behind on patching, as we test updates in-house and keep our ears open for trouble.

Yeah, the Exchange and .NET folks have been particularly nasty with updates in recent months/years. :(

Philip

Jules Wilko said...

Oh well... all I can say is that SAS makes no odds with the 500GB drives; those are the ones that are failing.
(Well, they are nearline SAS.)

Regards

Jules

Philip Elder Cluster MVP said...

Jules,

When SAS boards started showing up on SATA-based internals, we stepped back and waited.

Remember the IDE drives that had hybrid SATA setups on them? What a mess that was.

So, we stood by and waited for the SAS-on-SATA internal setups to mature.

Yeah, I don't doubt that the early nearline stuff was a bit buggy.

Philip