Wednesday, 2 December 2009

Hyper-V CLOCK_WATCHDOG_TIMEOUT Error Within 24 Hours On Fresh H-V 2K8 R2

We ran a test on one of our freshly installed Hyper-V Server 2008 R2 servers to see whether we would run into this issue.

  • Intel Server System SR1630HGP 1U
  • Intel Server Board S3420GPLC integrated
  • Intel Xeon Processor X3450
  • 16GB Kingston ECC
  • Intel RS2BL080 6Gbps SAS 2nd Generation PCI-E
  • 3x 450GB Seagate 15K RPM SAS in RAID 5
  • Intel RAID Smart Battery for the RAID controller

The server has only been up and running for about 24 hours. We have been running some stress tests along with some test VM installs and backup restores as Hyper-V guests.

These are the Crash Analyzer Wizard's results (from DaRT, part of the MDOP product):

[Screenshot: Crash Analyzer Wizard results]

So, after a relatively short period of use, the problem has reared its head.

It looks as though the hotfix is an absolute must for any technician's thumb drive and a required install on any server that will run Hyper-V on Intel Xeon Processor 3400 and 5500 series CPUs!

Philip Elder
MPECS Inc.
Microsoft Small Business Specialists
Co-Author: SBS 2008 Blueprint Book

*Our original iMac was stolen (previous blog post). We now have a new MacBook Pro courtesy of Vlad Mazek, owner of OWN.


7 comments:

stryqx said...

Weird. Got an HP ML350 G6 with 2 x E5520 and Win2008 R2 Std Core on it - no hotfix. It's been doing burn-in for a week without problems.
Need to check the C1E status in the BIOS, as this is mentioned in Virtual PC Guy's response to this comment.

Philip Elder Cluster MVP said...

Chris,

Both C3 and C6 are enabled in the BIOS on this box. Those settings are the ones that make or break the setup.

What's the point of disabling them if we then lose the advanced Turbo Boost and SpeedStep abilities of the CPUs? For now, I hope the hotfix does the trick.

Philip

stryqx said...

I've checked the BIOS on the HP ML350 G6 systems. Both have C6 set as the minimum processor idle power state. So I should be seeing BSODs but I don't.

WRT the 5500 series processor errata, disabling the processor C6 state should suffice. The problem at the OS layer is working out if it's ACPI C2 or ACPI C3 that needs disabling if the BIOS doesn't allow for individual processor C-state masking. C3 is still classified as an inactive core for the purposes of Turbo Boost, but the amount of boost is still controlled by power and thermal loads.

Going to have to use TMonitor on these boxes with and without C6 enabled methinks.

Still puzzled as to why I've got functional systems though.

Anonymous said...

Hello,

Dell R210 with an X3450, 16 GB of memory, and 2TB SASI HDDs on an H200 in RAID 10 (presented as a 1TB Dell virtual disk), running Win 2008 R2 Std Core. We also got a 0x00000101 STOP every time Hyper-V 2.0 networking ran for a VM whose VHD file was copied from another Win 2008 Hyper-V 1.0 host.
I'm trying to find a working combination of the BIOS CPU power management settings and one or two NIC connections on the Broadcom NetXtreme IIs.
It will take a few more days.

Philip Elder Cluster MVP said...

A,

Make sure you have v4 of the Watchdog Timer update via the Hotfix link in the right-hand column.

With v3, which is the one we had for this post, we still experienced the problems.

Since applying v4 of the update, we have not seen any of our H-V servers spontaneously combust!

Philip

Philip Elder Cluster MVP said...

Quote: Yusuke Naito said...
Philip,

I'm the former Anonymous. I downloaded the v4 hotfix and tried to apply it. However, it said "Can't apply your processor". Maybe this hotfix only targets X5500 CPUs; I'm using an X3450.

April 16, 2010 8:54 AM

The hotfix applies to both Xeon Processor 5500/5600 and 3400 Series CPUs.

Philip

yusuke naito said...

Philip,

I was able to install the v4 hotfix, and Hyper-V networking now works normally. I had misunderstood the registry setting, thinking it was unnecessary given the BIOS power options.
The information on this blog was much appreciated.