Tuesday, 23 March 2010

Server’s Having A Bad Day When … CLOCK_WATCHDOG_TIMEOUT (101)

We lose it almost continually throughout the day.

This particular box is our first Intel SR1630HGP Server System with an Intel Xeon Processor 3460 and 16GB of RAM.

This is what the %windir%\MiniDump directory looks like right now:

image
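For anyone who wants the same view without a screenshot, here is a minimal sketch that lists the dump files in that directory, oldest first. The `list_minidumps` helper name and the `C:\Windows` fallback are our own assumptions for illustration, and reading the folder may need an elevated prompt.

```python
import os
from datetime import datetime
from pathlib import Path


def list_minidumps(directory):
    """Return (name, modified_time, size_bytes) for each .dmp file, oldest first."""
    dumps = []
    for path in Path(directory).glob("*.dmp"):
        info = path.stat()
        dumps.append((path.name, datetime.fromtimestamp(info.st_mtime), info.st_size))
    # Sort on the modification time so the crash history reads top to bottom.
    return sorted(dumps, key=lambda item: item[1])


if __name__ == "__main__":
    # Assumption: fall back to C:\Windows if %windir% is not set.
    windir = os.environ.get("windir", r"C:\Windows")
    minidump_dir = Path(windir) / "MiniDump"
    if minidump_dir.is_dir():
        for name, modified, size in list_minidumps(minidump_dir):
            print(f"{modified:%Y-%m-%d %H:%M}  {size:>10,}  {name}")
    else:
        print(f"No MiniDump directory found at {minidump_dir}")
```

A quick count of the returned list gives a rough idea of how many crashes actually produced a dump.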

Our handy Crash Analyzer Wizard that comes with our Software Assurance and MDOP benefits tells us:

image

Click on that Details button and we see:

image

CLOCK_WATCHDOG_TIMEOUT (101)

An expected clock interrupt was not received on a secondary processor in an MP system within the allocated interval. This indicates that the specified processor is hung and not processing interrupts.
Arguments:
Arg1: 0000000000000019, Clock interrupt time out interval in nominal clock ticks.
Arg2: 0000000000000000, 0.
Arg3: fffff88001e5d180, The PRCB address of the hung processor.
Arg4: 0000000000000002, 0.
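The argument values are in hexadecimal, so a small script can make them easier to read. This is only a sketch: the `parse_bugcheck_args` name and the regular expression are our own, written against the four argument lines pasted above rather than any official dump format.

```python
import re

# The four argument lines exactly as they appear in the analysis above.
BUGCHECK_TEXT = """\
Arg1: 0000000000000019, Clock interrupt time out interval in nominal clock ticks.
Arg2: 0000000000000000, 0.
Arg3: fffff88001e5d180, The PRCB address of the hung processor.
Arg4: 0000000000000002, 0.
"""

# Matches "ArgN: <hex value>, <description>".
ARG_LINE = re.compile(r"^Arg(\d):\s*([0-9a-fA-F]+),\s*(.*)$")


def parse_bugcheck_args(text):
    """Return {arg_number: (int_value, description)} for each 'ArgN:' line."""
    args = {}
    for line in text.splitlines():
        match = ARG_LINE.match(line.strip())
        if match:
            number, hex_value, description = match.groups()
            args[int(number)] = (int(hex_value, 16), description)
    return args


if __name__ == "__main__":
    parsed = parse_bugcheck_args(BUGCHECK_TEXT)
    for number, (value, description) in sorted(parsed.items()):
        print(f"Arg{number}: {value:#x} ({value}) {description}")
```

Read this way, Arg1 (0x19) works out to a timeout interval of 25 nominal clock ticks.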

Now, it’s not like we have not seen this one before:

We did run the hotfix on the server a while back, but apparently that version was not good enough, as the problem persists even though we brought all of the server board's firmware and the RAID controller's firmware up to date just before bringing the server back online.

The above screenshot of the MiniDump directory does not do the situation justice as the server has frozen a lot more times than that . . . leading us to look at other options for the setup we need.

But then, while searching for further info on the problem, we found that the KB975530 hotfix has apparently received a newer version.

Here is a screenshot of the hotfix directory with our original update and the one that we just received:

image

There was not a lot of time between v3 and v4, given that this post is being written in late March.

It now remains to be seen if this fourth iteration of the patch will actually straighten out the timing problem happening with the Nehalem CPUs.

The server:

***

This post has been sitting open for the last couple of hours while we were busy with other things. After consistently freezing earlier today, the server has now been up and running with the two server VMs and five desktop VMs without a hiccup.

Hopefully that will be the situation from now on!

Philip Elder
MPECS Inc.
Microsoft Small Business Specialists
Co-Author: SBS 2008 Blueprint Book

*Our original iMac was stolen (previous blog post). We now have a new MacBook Pro courtesy of Vlad Mazek, owner of OWN.
