Wednesday, 16 November 2011

Hyper-V Hanging at Shutting Down Cluster Service … Troubleshooting Steps

We are in the process of troubleshooting a problematic shutdown of a Hyper-V Server 2008 R2 SP1 cluster node.

For now, we have found a way to kill the cluster service from another node via the command line, so that the hung node can at least finish rebooting:

  • taskkill /S HyperV-Node01 /IM clussvc.exe
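If this has to be done repeatedly, the same command could be wrapped in a small script. The following is a minimal Python sketch. It assumes the script runs on another cluster node, on Windows, under an account with administrative rights on the hung node. It only uses the `/S` and `/IM` switches shown above. The helper names `build_taskkill_args` and `kill_remote_cluster_service` are hypothetical.

```python
import subprocess


def build_taskkill_args(host: str, image: str = "clussvc.exe") -> list[str]:
    """Return the argument list for: taskkill /S <host> /IM <image>."""
    return ["taskkill", "/S", host, "/IM", image]


def kill_remote_cluster_service(host: str) -> int:
    """Run taskkill against the remote node and return its exit code.

    Assumes Windows, with admin rights on the target node.
    """
    result = subprocess.run(build_taskkill_args(host), capture_output=True, text=True)
    # Show whatever taskkill reported (success message or error text).
    print(result.stdout or result.stderr)
    return result.returncode
```

For example, `kill_remote_cluster_service("HyperV-Node01")` would issue the same command as the bullet above against that node. A non-zero return code means taskkill reported a failure.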

Before doing that, we ran the following command on the good node to generate the cluster log:

  • cluster.exe log /g

The resulting log, Cluster.log, is written to C:\Windows\Cluster\Reports.
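The generated log can be long, so a quick filter for warning and error entries may help narrow things down. The following is a minimal Python sketch. It assumes that each log entry carries a whitespace-separated severity token such as `INFO`, `WARN` or `ERR`, so check that against your own Cluster.log before relying on it. The helpers `interesting_lines` and `scan_log` are hypothetical names. The default path is the Reports folder mentioned above.

```python
from pathlib import Path
from typing import Iterable, Iterator

# Default location of the generated log; adjust if it was copied elsewhere.
DEFAULT_LOG = Path(r"C:\Windows\Cluster\Reports\Cluster.log")

# Assumption: entries carry a severity token like INFO, WARN or ERR.
LEVELS = {"ERR", "WARN"}


def interesting_lines(lines: Iterable[str], levels=LEVELS) -> Iterator[str]:
    """Yield lines whose whitespace-separated tokens include one of `levels`."""
    for line in lines:
        if levels.intersection(line.split()):
            yield line.rstrip("\n")


def scan_log(path: Path = DEFAULT_LOG) -> list[str]:
    """Return only the flagged lines from the cluster log at `path`."""
    # Replace undecodable bytes rather than failing partway through the file.
    with path.open(encoding="utf-8", errors="replace") as f:
        return list(interesting_lines(f))
```

Running `for line in scan_log(): print(line)` on the node that generated the log would list only the flagged entries. This does not replace reading the surrounding context.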

So far, nothing in the logs gives a clear indication of the cause.

However, after the taskkill command allowed the server to shut down and reboot, we logged in and initiated another restart on the node.

This time the cluster service shut down cleanly and the node went through a normal reboot cycle.

We then moved all of the VMs, CSVs, and cluster ownership over to that node and ran a shutdown cycle on the other node.

Philip Elder
MPECS Inc.
Microsoft Small Business Specialists
Co-Author: SBS 2008 Blueprint Book

*Our original iMac was stolen (previous blog post). We now have a new MacBook Pro courtesy of Vlad Mazek, owner of OWN.


1 comment:

MB said...

Hi Philip,

Hope you get somewhere with this one; we're seeing the same issue with two pre-SP1 clusters.

The clusters have 6 and 9 servers respectively, all running Hyper-V with assorted guests, and rebooting is a lottery. We have tried shutting down all the guests and rebooting each host individually, allowing each one to come all the way back up. We have also tried taking all the cluster disks offline and then rebooting each host individually. Either way, the third server generally hangs on the cluster service.

Is there a correct way to reboot all the hosts, i.e. when applying service packs and updates? I get cold sweats when I have to reboot these servers!

As you've found, you can reboot the same server again with no issue, but I think that's because quorum breaks. Once quorum is broken, it no longer seems to care and the cluster service shuts down OK.

We've even had cases where a cluster server can see all the other servers fine but can't participate in the cluster. To get it back, we've had success rebooting all the servers again and, on one occasion, removing and re-adding the cluster service. This may be down to patch levels, though!