Monday 3 December 2018

A word of caution: Verify! Everything!

We get to work with a whole host of clients and their industries but, as a contractor, we also get to work with a wide variety of IT Pros and IT Companies.

Many times we get involved in a situation that's an outright pickle.

Something has gone sideways and the caller is looking for some guidance, some direction, and a bit of handholding because they are at a loss.

Vendor Blame Game

Some of those times the caller is in the middle of a tongue-wagging session between a set of vendors, each blaming the other for the pickle.

We were in that situation back in the day when a Symantec BackupExec (BUE) solution failed to restore. The client site had two HP DAT tape libraries with BUE firing "All Good" reports.

We found out those reports were bad when the client's main file server went blotto.

We were in between the storage vendor and their products, Symantec, and HP. It was not a pretty scene at all. In the end, it was determined _by us_ that BUE was the source of the problem because it never verified the backups being written to tape, despite the setting to do so being there.

We were fortunate that we had multiple redundant systems in place and managed to get most of the data back except one partner's weeks' worth of work. We had to build out a new domain though.

So, why the blog post?

Because it's still happening today.

Verify, Verify, and Verify Again

We _highly suggest_ verifying that all backup products and backup services are doing what they say they are doing.

If the service provider charges for a test failover, do it anyway. Charge the fee back to the client because once the process has run, successful or not, things are in a better place either way.

Never, _ever_ walk into a disaster recovery situation without having tested the products that are supposed to save the client's business. Period.

Yeah, there are times when something may happen before that planned failover. That's not what we're talking about here.

What we are after is testing to make sure that the vendor's claims are indeed true and that the solution set is indeed working as planned.

The last place we need to find out that our client's backups are _not_ working is when their servers, virtual machines, or cloud services vendors have gone blotto.
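One way to put a number on a test restore is to compare checksums of the original files against the restored copies. What follows is only a minimal sketch in Python, not a feature of any backup product: the function names `file_digest` and `compare_restore` are made up for illustration, and it assumes the test restore has been written to a separate folder that mirrors the source folder's layout.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_restore(source_root: Path, restored_root: Path) -> dict:
    """Compare every file under source_root with its restored copy.

    Hypothetical helper: returns the relative paths that are missing
    from the restore and those whose contents differ.
    """
    missing, mismatched = [], []
    source_files = sorted(p for p in source_root.rglob("*") if p.is_file())
    for source_file in source_files:
        relative = source_file.relative_to(source_root)
        restored_file = restored_root / relative
        if not restored_file.is_file():
            missing.append(relative.as_posix())
        elif file_digest(source_file) != file_digest(restored_file):
            mismatched.append(relative.as_posix())
    return {"missing": missing, "mismatched": mismatched}
```

Calling `compare_restore(Path(source), Path(restored))` with two folder paths returns a small report. Two empty lists mean every source file came back intact. Keep in mind that files changed after the backup ran will show up as mismatched, so a check like this works best against a snapshot or a data set that has been left alone since the backup.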

Out of the Cloud Backup

We always make sure we have a way to back up any cloud vendor's services to our client's site. It just makes sense.

Our trust is a very fickle thing.

When it comes to our client's data we don't give our full trust to any vendor or solution set.

We _always_ test the backup and recovery processes so that we're not blindsided by things not going as planned or any "hidden fees" for accessing the client's data in a disaster recovery situation.

Philip Elder
Microsoft High Availability MVP
Co-Author: SBS 2008 Blueprint Book!
Our Web Site
Our Cloud Service


Shayne Kawalilak said...

I got a piece of advice many years ago... you haven't got a successful backup until you have restored it. I haven't considered a client's data from cloud providers though. I will spend some time thinking about this, but I have clients who basically run their entire business in the cloud. If something were to happen to that data it would be very bad, and blaming some cloud provider would not make them look at me any differently. I still won't be able to use the data until the vendor is back up, but if they had a critical failure at least I could restore my data when they came back online.

Thanks for the wake up call.

Philip Elder Cluster MVP said...

There are plenty of cloud vendors, known both publicly and via back channels here, that have had catastrophic data loss and even full service loss situations.

Unfortunately, there's no law in place to force cloud vendors to reveal when they've had it happen.

This Cloud Hosting Architecture article explains a lot.