Monday, 24 July 2017

Intel JBOD2224S2DP - Troubleshooting Redundant Path Fail

We have an Intel JBOD2224S2DP that has seemingly dropped one of its expanders, as we are seeing an MPIO path error on both nodes in a Hyper-V/Storage Spaces cluster (2x nodes + 2x JBODs).

First step is to get the SAS IDs for the expanders by pulling the cover:



With IDs in-hand the next step is to figure out which one has failed.

We do this by downloading the latest firmware for the JBOD and copying the contents to a \TMP folder on the server or server node.

Open an elevated CMD on the server/node and:

C: [ENTER]
CD \TMP\Windows [ENTER]
cmdtool2_64 -adpsetprop ExposeEnclDevicesEnbl 1 -aall [ENTER]
xflash -I get avail [ENTER]

And, voila! We have our culprit:


The problematic expander is the one on the right.

The final step, run on the server/node, is to turn the enclosure device exposure back off:
cmdtool2_64 -adpsetprop ExposeEnclDevicesEnbl 0 -aall [ENTER]
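Since it is easy to forget that last step, here is a minimal Python sketch that wraps the three commands so the ExposeEnclDevicesEnbl setting is always put back to 0, even if the xflash query fails. The commands themselves are the ones used above. The function names, the injectable `run` parameter, and the assumption that the tools sit in C:\TMP\Windows as .exe files are ours for illustration only.

```python
import subprocess
from pathlib import PureWindowsPath
from typing import Callable, Sequence

# Assumption: the firmware package was extracted to \TMP on drive C:.
TOOLS_DIR = PureWindowsPath(r"C:\TMP\Windows")

# Commands copied from the steps in this post.
EXPOSE_ON = ["cmdtool2_64", "-adpsetprop", "ExposeEnclDevicesEnbl", "1", "-aall"]
QUERY = ["xflash", "-I", "get", "avail"]
EXPOSE_OFF = ["cmdtool2_64", "-adpsetprop", "ExposeEnclDevicesEnbl", "0", "-aall"]


def run_in_tools_dir(cmd: Sequence[str]) -> str:
    """Run one tool from TOOLS_DIR (elevated prompt required) and return its output."""
    full_cmd = [str(TOOLS_DIR / cmd[0]), *cmd[1:]]
    result = subprocess.run(
        full_cmd, cwd=str(TOOLS_DIR), capture_output=True, text=True, check=True
    )
    return result.stdout


def query_expanders(run: Callable[[Sequence[str]], str] = run_in_tools_dir) -> str:
    """Expose the enclosure devices, query them, then always hide them again."""
    run(EXPOSE_ON)
    try:
        return run(QUERY)
    finally:
        # Runs whether or not the query succeeded.
        run(EXPOSE_OFF)
```

Calling `print(query_expanders())` from an elevated Python session on the node would print whatever xflash reports. Passing a different `run` function makes it possible to dry-run the sequence without touching the hardware.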

Now, off to call Intel to see about a warranty replacement or to find one out there somewhere. ;)

UPDATE 2017-08-16: As it turns out, we replaced the seemingly problematic expander and still had the error. After swapping the RS25GB008 HBA pair between nodes the problem followed the HBAs. After a bit more testing we found that one of the RS25GB008 HBAs had a bad port.

Since Intel no longer supports them and distribution didn't have any in the channel we had to go out and find some via the regular channels. They just arrived the other day and we now have MPIO on both systems without an error.

Philip Elder
Microsoft High Availability MVP
MPECS Inc.
Co-Author: SBS 2008 Blueprint Book
Our Cloud Service
Twitter: @MPECSInc

Thursday, 20 July 2017

Windows Server 2016 July 18, 2017 CU is Important!

The July 18, 2017-KB4025334 (OS Build 14393.1532) Update is _important_!

There are fixes in there for a lot of cluster-specific features:
  • iSCSI
  • S2D
  • ReFS
  • DeDup
  • MPIO
  • NTFS
The specifics are on the Microsoft page linked above, as is a download link.
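To confirm a node is at or past this update, the OS build shown by winver can be compared against 14393.1532. Here is a minimal Python sketch of that comparison. The function names are hypothetical, and it assumes that any later revision on the same 14393 branch includes these fixes because the updates are cumulative.

```python
REQUIRED_BUILD = "14393.1532"  # OS build given for KB4025334 in this post


def parse_build(build: str) -> tuple:
    """Split a 'build.revision' string such as '14393.1532' into two integers."""
    major, revision = build.strip().split(".")
    return int(major), int(revision)


def has_update(installed_build: str, required_build: str = REQUIRED_BUILD) -> bool:
    """True when the installed build is on the same branch at the required revision or later."""
    installed = parse_build(installed_build)
    required = parse_build(required_build)
    return installed[0] == required[0] and installed[1] >= required[1]
```

For example, `has_update("14393.1532")` returns True, while a node reporting a lower revision on the same branch would return False.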

We are in the process of updating our base Install.WIM image (blog post) with this update as I write this!



Monday, 17 July 2017

Windows 10: Installing on Intel Desktop Board DX79SR

Boy, did we get a lot of grief trying to get Windows 10 to install on an Intel Desktop Board DX79SR based system!

Screenshot: Windows 10 disk install error

Windows Setup

Windows cannot be installed to this disk. This computer's hardware may not support booting to this disk. Ensure that the disk's controller is enabled in the computer's BIOS menu.

Some pointers:

Neither post is open for comments, hence this blog post plus a new discussion on the Intel Communities site: Windows 10 on Intel Desktop Board DX79SR.

What finally worked:

  1. Set up RAID in RSTe (CTRL+I)
  2. Set BIOS Boot Mode to UEFI
  3. Plug in ISO mount type enclosure with Win10 ISO mounted (we use StarTech S2510BU3ISO)
    1. NOTE: I had to use the USB2 ports as the USB3 ports did not power the enclosure during boot
  4. Press F10 during POST
  5. Choose DVDROM - UEFI (name may vary)
  6. Click through and choose ADVANCED Setup
  7. Click on the RAID array logical disk for the OS
    1. NOTE: If any MBR partitions exist they need to be cleaned prior to this step
    2. Use DiskPart in Repair --> CMD
  8. Click NEXT

That should do it!
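For the DiskPart clean mentioned in the notes, here is a minimal Python sketch that builds the text of a DiskPart script for one disk. The commands (select disk, clean, convert gpt, exit) are standard DiskPart commands, but the function name and the disk number are assumptions. Check the number with "list disk" first, because clean wipes the partition table on whatever disk is selected. The optional convert gpt line is our addition and is not part of the steps above.

```python
def diskpart_clean_script(disk_number: int, convert_gpt: bool = False) -> str:
    """Return DiskPart script text that wipes the partition table on one disk.

    WARNING: 'clean' destroys the partition layout of the selected disk.
    """
    if disk_number < 0:
        raise ValueError("disk_number must be zero or greater")
    lines = [f"select disk {disk_number}", "clean"]
    if convert_gpt:
        # Optional extra step (an assumption, not from the post): prepare the disk as GPT.
        lines.append("convert gpt")
    lines.append("exit")
    return "\n".join(lines) + "\n"
```

The same lines can simply be typed at the DISKPART> prompt in Repair --> CMD. Saved to a text file, they can also be run with "diskpart /s" followed by the file name.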



Thursday, 13 July 2017

Mellanox PPC SwitchX Update v3.6.4006

Mellanox has released a firmware update for their SwitchX switches: v3.6.4006.

We've already updated our two SX1012 switches to v3.6.3508 as per our blog post Mellanox Prep for RoCE RDMA. That means that we'll be able to upgrade without any intermediary steps as per the section Upgrade From Previous Versions.

When looking into the Release Notes for the new firmware version we see:


Note that in our case we are running ConnectX-3 Pro (MCX354A) adapters. So, we'll be keeping firmware 2.4.5030 on those NICs until such time as Mellanox lets us know that we are able to bump them up to 2.4.7000.


Looking in the Changes and New Features section, there doesn't seem to be anything specific to us; however, there are quite a few items listed for versions between v3.6.3508 and v3.6.4006!

There are a few items in the General Known Issues section that we need to be aware of.
  • Point 32: Statistics files are reset which means graphs get reset.
  • Point 49 indicates that a faulty cable may cause other ports to delay their "rise". 
  • Point 50 is important. 40GbE passive copper cables 5m in length may experience "rise" issues if connected to a third party 40GbE NIC.
  • Point 93: Break-out Cables
    • Odd ports might suffer from Tx drops even when global flow control is enabled.
      Set the egress pool to 8M using the following command:
      “pool ePool0 direction egress-mc size 8M type dynamic”.
  • Point 128: QoS: ETS does not work on SN2100 switch system.
I suggest checking out the Bug Fixes section near the end of the document. ;)
