Wednesday 2 December 2009

Hyper-V On Nehalem CPUs Error – CLOCK_WATCHDOG_TIMEOUT Critical Hotfix

Via Slashdot:

The Microsoft Knowledgebase article with the hotfix:

We are just in the process of implementing Hyper-V Server 2008 R2 on the new Intel Nehalem architecture in both the Intel Xeon Processor 3400 and 5500 series.

So, finding out about this particular problem before the boxes went into production is a good thing.

The update begs the question though, does the hotfix really resolve the conflict with the CPU architecture?

It looks as though we will be doing a bit more testing than we normally do on new platforms to make sure that nothing blows up on us once the soon-to-be-deployed production Hyper-V boxes go live.

One of the linked articles from Slashdot is a good read:

UPDATE: Here it is:

image

As a test, we did not install the update on a freshly installed Hyper-V Server 2008 R2 box and the above happened about midway through our second VM setup.
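For anyone wanting to confirm whether a particular hotfix is actually present on a box before running a test like this, here is a minimal Python sketch. It is not what we used. It assumes Python is available somewhere that can see the output of the built-in `wmic qfe get HotFixID` command, and the `HOTFIX_ID` value is a placeholder rather than the real KB number, which is in the Microsoft article.

```python
import subprocess

# Placeholder only: substitute the KB number from the Microsoft article.
HOTFIX_ID = "KB0000000"


def installed_hotfixes(qfe_output):
    """Return the set of KB identifiers found in `wmic qfe get HotFixID` text."""
    ids = set()
    for line in qfe_output.splitlines():
        token = line.strip().upper()
        # Skip the "HotFixID" header, blank lines, and anything not shaped like KB<digits>.
        if token.startswith("KB") and token[2:].isdigit():
            ids.add(token)
    return ids


def hotfix_present(qfe_output, hotfix_id):
    """True if hotfix_id (for example "KB0000000") appears in the wmic output."""
    return hotfix_id.strip().upper() in installed_hotfixes(qfe_output)


def query_local_hotfixes():
    """Run wmic on the local Windows machine and return its raw text output.

    Assumption: wmic is on the PATH and its piped output decodes with the
    default console encoding.
    """
    return subprocess.check_output(
        ["wmic", "qfe", "get", "HotFixID"], universal_newlines=True
    )
```

On the box itself, `hotfix_present(query_local_hotfixes(), HOTFIX_ID)` would give a quick yes or no. The parsing is kept separate from the `wmic` call so that saved output from a remote server can be checked the same way.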

Philip Elder
MPECS Inc.
Microsoft Small Business Specialists
Co-Author: SBS 2008 Blueprint Book

*Our original iMac was stolen (previous blog post). We now have a new MacBook Pro courtesy of Vlad Mazek, owner of OWN.


2 comments:

Anonymous said...

Hi Philip,

"The update begs the question though, does the hotfix really resolve the conflict with the CPU architecture?"

I have exactly the same question, considering I'm in the process of buying two servers (with Nehalems) and a SAN to set up a Hyper-V failover cluster on Windows Server 2008 R2. The fact that this issue seems to be intermittent, and that I've read conflicting reports about it, worries me too. So I'm curious about your evaluation of whether or not this hotfix does what it promises.

Sven Willemen

Philip Elder Cluster MVP said...

Sven,

After the hotfix was applied we had a couple of weird VM service lockups but the OS kept running.

Once through that though, we have stood up a number of recovery-based OSs as well as fresh OS installs and had them running constantly with no more issues.

The C3 state and other BIOS settings related to the Nehalem Turbo and Sleep states are fully enabled.

We are standing up an Intel Flex Server (Modular) this week to run a Hyper-V cluster on. So, you can bet that there will be a lot of testing going on before we go live!

Philip