Thursday, 14 July 2016

Hyper-V and Cluster Important: A Proper Time Setup

We’ve been deploying DCs into virtual settings since 2008 RTM. There was not a lot of information on virtualizing anything back then. :S

In a domain setting having time properly synchronized is critical. In the physical world the OS has the on board CMOS timer to keep itself in check along with the occasional poll of servers (we use specific sets for Canada, US, EU/UK, and other areas).

In a virtual setting one needs to make sure that time sync between host and guest PDCe/DC time authority is disabled. The PDCe needs to get its time from an outside, and accurate, source. The caveat with a virtual DC is that it no longer has a physical connection with the local CMOS clock.

What does this mean? We’ve seen high load standalone and clustered guests have their time skew before our eyes. It’s not as much of a problem in 2012 R2 as it was in 2008 RTM/R2 but time related problems still happen.

This is an older post but outlines our dilemma: MPECS Inc. Blog: Hyper-V: Preparing A High Load VM For Time Skew.

This is the method we use to set up our PDCe as authority and all other DCs as slaves: Hyper-V VM: Set Up PDCe NTP Time Server plus other DC’s time service.

In a cluster setting we _always_ deploy a physical DC as PDCe: MPECS Inc. Blog: Cluster: Why We Always Deploy a Physical DC in a Cluster Setting. The extra cost is 1U and a very minimal server to keep the time and have a starting place if something does go awry.

In higher load settings where time gets skewed scripting the time sync with a time server within the guest DC to happen more frequently means the time server will probably send a Kiss-O-Death packet (blog post). When that happens the PDCe will move on through its list of time servers until there are no more. Then things start breaking and clusters in the Windows world start stalling or failing.

As an FYI: A number of years ago we had a client call us to ask why things were wonky with the time and some services seemed to be offline. To the VMs everything seemed to be in order but their time was whacked as was the cluster node’s time.

After digging in and bringing things back online by correcting the time on the physical DC, the cluster nodes, and the VMs everything was okay.

When the owner asked why things went wonky the only explanation I had was that something in the time source system must have gone bad which subsequently threw everything out on the domain.

They indicated that a number of users had complained about their phone’s time being whacked that morning too. Putting two and two together there must have been a glitch in the time system providing time to our client site and the phone provider’s systems. At least, that’s the closest we could come to a reason for the time mysteriously going out on two disparate systems.

Philip Elder
Microsoft High Availability MVP
Co-Author: SBS 2008 Blueprint Book
Our Cloud Service

No comments: