Showing posts with label RAID.

Wednesday, 18 December 2013

Repeat After Me: SATA Does Not Belong In Servers Part Deux

For the last several years, we have not deployed servers with SATA drives installed.

There are many reasons why we stopped, but here are a few comparisons with SCSI/SAS:

  • SATA does not have the ability to manage a high I/O workload
  • SATA only offers a single inbound and outbound data port while SAS offers dual ports for redundant paths
  • SATA lacks the health monitoring capabilities of SAS, and SMART certainly does not cut it
  • SATA does not offer anywhere near the capabilities and command set that SAS does for server related tasks, disk redundancy, disk sharing, and so much more

There is a reason why disk manufacturers have tacked SAS controllers onto SATA platter sets. These so-called NearLine drives offer all of the SAS goodness but with SATA capacities.

Here is the first public presentation from Microsoft, that I know of, on _why_ SATA does not belong in servers.

To quote specifically:

1. Use the per I/O control mechanism that is known as Force Unit Access (FUA). This flag specifies that the drive should write the data to stable media storage before signaling (sic) is finished. Applications that have to do this make sure that data is stable on the disk issue FUA to make sure that data is not lost if a power failure occurs.

Server-class disk drives (SCSI and Fibre Channel) generally support the FUA flag. On commodity drives (ATA, SATA, and USB), FUA might not be honored. (emphasis added) This can potentially leave data in an inconsistent state unless the drive's write cache is disabled. Make sure that the disk subsystem handles FUA correctly if you depend on this mechanism.
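For applications, Python's standard library does not expose the FUA flag directly; the closest portable tool is `os.fsync()`, which asks the operating system to push a file's data to the storage device before returning. The sketch below is a minimal illustration under that assumption (the function name `durable_write` is ours, not Microsoft's). It only makes the *request*: if the drive acknowledges writes from its volatile cache without honouring the flush, which is exactly the commodity-drive behaviour quoted above, the data can still be lost on a power failure.

```python
import os


def durable_write(path: str, data: bytes) -> None:
    """Write `data` to `path` and ask the OS to flush it to stable storage.

    os.fsync() requests a flush of the file's data to the device. Whether the
    drive actually commits its write cache to media is up to the drive itself.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    flags |= getattr(os, "O_BINARY", 0)  # Windows only: avoid text-mode translation
    fd = os.open(path, flags, 0o644)
    try:
        view = memoryview(data)
        while view:  # os.write may perform a short write; loop until all bytes land
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # block until the OS reports the data as flushed
    finally:
        os.close(fd)
```

On Unix-like systems, opening with `os.O_DSYNC` is an alternative that makes each individual write synchronous instead of flushing once at the end.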

According to a discussion on this topic that we listened to, the above applies even when SATA disks are used in a properly configured RAID setup, whether software (host-based) RAID or hardware RAID on Chip.

In addition, anyone setting up a Storage Spaces cluster with multiple paths to the JBOD unit would be required to use SAS-based SSDs for the high performance storage tier. SATA will work in a single-server, single-enclosure, lab-like setting but _not_ in production.

We have had other posts on this topic that outline many other reasons for our decision to drop SATA in servers. The SATA category and the SAS category would be one place to start. :)

Philip Elder
Microsoft MVP
MPECS Inc.
Co-Author: SBS 2008 Blueprint Book

Chef de partie in the SMBKitchen
Find out more at
Third Tier: Enterprise Solutions for Small Business

Thursday, 9 May 2013

Repeat after me: SATA does not belong in servers.

One of the very last servers we deployed with SATA drives had yet another failure in it.

A new Intel R2208GZ4GC 2U server with eight 600GB 10K SAS drives configured in a RAID 6 array is already installed, waiting for tax season to slow down for the client (they are an accounting firm).

image

Our client recently moved to a new location, with the servers now located in a dedicated room in the basement. The little A/C unit in that room was a leftover from the previous occupant, and we were not too sure about it.

The hot spare in the old server, an Intel Server System SR1560SFHS with three 750GB Seagate ES series SATA drives, died about four months ago. Since the system was slated for replacement, we left the remaining two drives in their RAID 1 array alone.

That ended this morning, when one of the drives in the pair stopped completely. This was probably due to the fact that the temperature in the room was close to 90°F when we arrived this afternoon.

Someone had fired up the A/C unit without realizing that the hose that exhausts the heat outside was not connected to the back of the unit. So all of the heat it was pulling out of the room, plus its own heat, went right back into the room, yielding a very high temperature.

Once the hose was affixed to the back of the unit the temperature started to come down.

So, here we are writing this blog post at 2216Hrs on a Wednesday evening, after logging in to check on the progress of the array rebuild; the screenshot above is what we saw.

The RAID controller is an Intel RAID Controller SRCSASRB with battery backup.

SATA does not belong in a server when it comes to spindled hard drives. This experience, with the blind failure and the dismal rebuild times (during off hours, no less), is definitely part of the reason.

SAS/SCSI was designed and engineered to run in server environments. SATA was not.

The firmware tweaks that hard drive vendors have introduced to try to mimic a SAS setup within the SATA controller, along with the largely failed NCQ effort, do not come close to the performance, longevity, and stability that SAS drives offer.

By the way, this goes for NearLine SAS drives as well. These drives are SATA internals with SAS electronics slapped onto the outside of the drive. There is a very good reason why they are called "NearLine". :)

The cost of 2.5" 10K SAS drives in 300GB and 600GB sizes has come down quite a bit in the last year. The 900GB 10K SAS drives are still relatively expensive per gigabyte but provide an opportunity for a large aggregate of storage when needed.

Another way to look at it is this: How many RMA efforts have gone into server setups with SATA drives in them? Compare that with the servers that have SAS setups. In our case, with lots of servers deployed, there is virtually no comparison. Over time the SAS drives have completely trumped the SATA drives in all aspects.

Even with 24x7x365 four-hour response contracts, most vendors require time on the phone before initiating the on-site visit to replace the failed drive. That time is expensive and, to some extent, wasted.

Oh, and one more thing: If going with parity in an array, go RAID 6 with at least eight 10K spindles and make sure the RAID controller has either flash-backed cache or a battery backup.
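To put numbers on the eight-spindle recommendation, here is a minimal sketch of the usual rule-of-thumb capacity arithmetic for common RAID levels. It ignores controller metadata and formatting overhead, and the function name is hypothetical. The point it illustrates: RAID 6 always gives up two drives' worth of capacity to parity, so the wider the array, the smaller the share of raw capacity lost (half of a four-drive array, a quarter of an eight-drive one).

```python
def usable_capacity_gb(level: str, drives: int, drive_gb: int) -> int:
    """Rule-of-thumb usable capacity for common RAID levels.

    Ignores controller metadata and file system formatting overhead.
    """
    if level == "1":
        if drives != 2:
            raise ValueError("RAID 1 is a two-drive mirror")
        return drive_gb
    if level == "10":
        if drives < 4 or drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, at least 4")
        return drives // 2 * drive_gb  # half the drives hold mirror copies
    if level == "5":
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (drives - 1) * drive_gb  # one drive's worth of parity
    if level == "6":
        if drives < 4:
            raise ValueError("RAID 6 needs at least 4 drives")
        return (drives - 2) * drive_gb  # two drives' worth of parity
    raise ValueError(f"unsupported RAID level: {level}")


if __name__ == "__main__":
    # Eight 600GB 10K SAS drives, as in the accounting firm's new server.
    for level in ("10", "6"):
        print(f"RAID {level}: {usable_capacity_gb(level, 8, 600)}GB usable")
```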

Storage is almost always the weakest point in a server, both for hardware failures and for I/O bottlenecks. Kill both. Use a wide array of eight or more spindles and make sure the drives are 10K SAS.

The risk when using SATA is just not worth the "savings" IMNSHO (in my not so humble opinion).

Philip Elder
MPECS Inc.
Microsoft Small Business Specialists
Co-Author: SBS 2008 Blueprint Book

Chef de partie in the SMBKitchen
Find out more at
www.thirdtier.net/enterprise-solutions-for-small-business/

Windows Live Writer

Wednesday, 27 February 2013

HP Smart Array P420i: Post Disk Add To Array Logical Drive Changes

Now that the P420i has finished incorporating four new disks into our HP ML350p G8’s RAID 10 setup, we see the option to extend the existing logical drive on the RAID 10 array:

image

We click on the Extend Logical Drive button and we see:

image

Since we already have fixed VHDX files on the existing logical drive that take up almost 500GB, we are not going to shuffle the setup. We will extend the existing logical drive by the full amount available to us.

Once that process has completed, we will jump into Windows Server Disk Management and extend the existing OS partition.

When we click the Save button we get this interesting warning:

image

About ten to thirteen years out of date, but possibly applicable somewhere. :)

The process was surprisingly quick.

After hitting Rescan Disks in Disk Management we saw:

image

A quick couple of clicks and we had:

image
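The Rescan Disks and Extend Volume clicks can also be scripted with diskpart's `rescan`, `select volume`, and `extend` commands; `extend` with no `size=` grows the selected volume into all of the contiguous unallocated space that follows it. The sketch below only generates the script text. The volume letter `C` and the file name `extend_os.txt` are assumptions for illustration, and as with the GUI route, back up the guests first.

```python
def diskpart_extend_script(volume: str) -> str:
    """Return a diskpart script that rescans the disks, selects the given
    volume, and extends it into the contiguous unallocated space after it."""
    if not volume:
        raise ValueError("volume must be a drive letter or volume number")
    commands = [
        "rescan",                   # same effect as Rescan Disks in Disk Management
        f"select volume {volume}",  # drive letter (e.g. C) or volume number
        "extend",                   # no size= given: use all contiguous free space
    ]
    return "\n".join(commands) + "\n"


if __name__ == "__main__":
    # Save the output as extend_os.txt (hypothetical name), then run
    # "diskpart /s extend_os.txt" from an elevated prompt on the host.
    print(diskpart_extend_script("C"), end="")
```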

Now to rework the necessary VHDX files in Hyper-V Manager after shutting down the VMs.

All in all this process turned out to be a simple one.

Prior to making these changes we made sure to back up our guests just in case the process caused a failure.

Tuesday, 26 February 2013

HP P420i Smart Array: Adding 4 New Disks to a RAID 10 Array

We started off with a new HP ML350p G8 that our now-client had purchased.

We had the RAM bumped up to 128GB and then suggested changing the existing 4 disk RAID 10 plus hot spare into a RAID 6 array across 8 disks.

We managed to secure the additional three disks for the task but for now we will hold off on RAID 6 as that requires a "feature activation license" according to HP.

We start by opening the HP System Management Homepage.

image

We then click on the Storage icon.

image

Next up is the Array Configuration Utility (ACU).

image

We then select the P420i Smart Array controller.

image

We then get a list showing any available disks, in this case 3, and the array we plan on expanding.

image

The fourth disk we plan on adding to this array is actually the current Global Hot Spare.

We can't see the hot spare in Logical View so we change to Physical View.

image

Clicking on SAS Array A with Spare, we see the following options.

image

Click on Spare Management and we see:

image

We deselected the spare drive and chose Save near the bottom right.

We no longer had a Global Hot Spare.

image

With SAS Array A selected, we clicked on the Expand Array option.

We ticked the Select All (4 Drives) checkbox and clicked Save.

image

Our status changed immediately:

image

We click on the View Status Alerts link to see what the exclamation mark is all about.

image

So, we have no cache while the RAID array is in the process of changing.

It took a few clicks to figure out where the status report was for the ongoing change. It's under Logical View.

image

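We timed 1% of progress to see what kind of time we were looking at, and it took about a minute and a half. At that rate, 100% works out to about 150 minutes (100 × 1.5 minutes), so we were looking at roughly two and a half to four hours for the initial step to be completed, depending on whether the rate held.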
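Timing one percent and multiplying out is a linear extrapolation. A minimal sketch of that arithmetic follows (the function name is hypothetical); it assumes the rate stays constant, which a single 1% sample cannot guarantee, especially if the controller is also serving production I/O.

```python
def remaining_minutes(minutes_per_percent: float, percent_done: float) -> float:
    """Linear estimate of the time left, assuming the rate stays constant."""
    if minutes_per_percent <= 0:
        raise ValueError("rate must be positive")
    if not 0 <= percent_done <= 100:
        raise ValueError("percent_done must be between 0 and 100")
    return (100 - percent_done) * minutes_per_percent


if __name__ == "__main__":
    rate = 1.5  # minutes per 1%, as timed on the P420i expansion
    total = remaining_minutes(rate, 0)
    print(f"total: {total:.0f} min ({total / 60:.1f} h)")  # total: 150 min (2.5 h)
```

Re-sampling the rate every so often and recomputing gives a better estimate than a single early sample.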

Hopefully once this process has completed we will be able to expand the existing array. We shall see...

Tuesday, 19 February 2013

HP Smart Array P420i Controller – Caveat: Need Flash Based Cache to Make Any Online Array Changes

We are in the process of setting up a Windows Server 2012 Hyper-V host with two server operating system guests, one DC and one RDS, and a couple of desktop operating system guests.

We were brought in to set up the environment at the hardware vendor’s request. So, this box was set up prior to our jumping in.

Since we do not work with HP servers on a regular basis, we are not up on their terminology or product features across their server lines.

We are looking into filling up the 2.5” hot swap drive bays, as the server came configured with a RAID 10 (RAID 1+0) array and one global hot spare.

Having worked mostly with Intel RAID controllers, as well as LSI RAID controllers, we have come to expect advanced features to work out of the box. This is primarily because we deploy mainstream series or high performance series RAID controllers from the start.

We avoid entry level if at all possible.

As we have discovered today, this is one reason why:

image

Note the following:

  • Modular, easy-to-upgrade design lets you optimize performance by upgrading from 40-bit 512MB cache to 72-bit 1GB Flash Backed Write Cache or 72-bit 2GB Flash Backed Write Cache (FBWC).
  • Addition of the flash backed cache upgrade enables array expansion, logical drive extension, RAID migration, and stripe size migration.

Obviously the emphasis is ours.

So, we are now in a position where we need to find out the cost of the 1GB and 2GB FBWC options for this particular server.

We are not sure what that will cost, but it will probably beat having to back up the entire server and its guests, which are already set up for production, and then restore the box using our ShadowProtect IT Edition license.

The clue that something was up came via the HP management utility that showed the following for the current RAID array configuration:

image

Note the distinct absence of RAID 6.

Apparently there is a license key that enables RAID 6 in this particular controller.

So, we hit two caveats with this setup:

  • Flash based Cache is required for advanced RAID features.
  • A license pack is required to enable RAID 6.

Ever try to purchase a base model car, add _just_ the features needed like cruise/tilt/air and a decent stereo, and end up paying more than for the “LX” version of that vehicle?

It has been a disappointing trend in our industry: we can no longer configure a Tier 1 server system, desktop system, or workstation system without having to know the bits and pieces that these systems are made up of. We used to be able to configure systems that would just work as we expected them to and have the features we would expect for the price we paid.

Also, it has been our experience that Tier 1 sales folks who _know_ these bits and pieces are very hard to come by, with the onus being on us to make sure we have our ducks lined up _before_ we initiate contact.

Thursday, 10 January 2013

Video: Running Windows Server 2012 Datacenter with Hyper-V on Intel Server System Dual Xeon E5440 and SATA RAID 6

The following video was taken with the new Samsung ATIV S Windows Phone 8 device. The quality is okay, though the phone’s focus routine struggles at times.

We had a client server system that needed its life extended for some lab work that they need to accomplish.

The server configuration:

  • Intel Server Board S5000PSLSATAR
  • Dual Intel Xeon E5440 CPUs
  • 16GB (8x 2GB) of Kingston ECC
  • Intel RAID Controller SRCSASRB
  • (8) 500GB Seagate ES series SATA in RAID 6
  • Intel Server Chassis SC5400LX with 10 hot swap bays.

We had to replace the two backplanes as they were ultimately the cause of system boot problems.

Intel Server System running Windows Server 2012 Datacenter

When it comes to setting up production systems we prefer to use only new hardware for new operating systems.

There are so many ways that things can go sideways on us when we do one of the following:

  • Install a modern operating system on 3+ year old hardware.
  • Install an old operating system (2008 or previous) on new hardware.

In this case we are talking about a lab setup. Exception made.

Note that one of the biggest performance limiters in this system is the RAID 6 SATA array. The RAID controller does not have a battery backup on it, so it runs without write-back caching and write performance is poor.

Plus SATA is not a good platform to install in a server. Period.

Check out this system’s performance in our 130GB VHDX creation time blog post.

Monday, 3 December 2012

Hyper-V: Disk Queue Length Can Kill Everything

When it comes to configuring an I/O subsystem for a standalone Hyper-V virtualization solution we need to keep in mind that the entire disk subsystem can impact the server's overall performance.

Here is a snip of a fairly high performance portable outfit:

image

Now, keep in mind that the above VMs (3) were running on a Portege Z830 with an OCZ Nocti SSD for the system disk and a 480GB Intel 520 Series SSD in a Zalman ZM-VE300 USB 3 external enclosure.

While the throughput is as expected on this portable platform, at 100MB/second or thereabouts, note the Disk Queue Length for all disks.

Now, take a look at this server based configuration:

image

Note the disk queue length on the system disk: 50!

Now, given that there is a high performance disk subsystem for the VMs, we can see that there may actually be a lot more performance for this system to offer if the OS partition were resident on the high I/O setup.

The rule of thumb for Disk Queue Length is:

  • 16 disks in the array then Queue Length should be 8 or less.
  • 24 disks in the array then Queue Length should be 12 or less.
  • Reasonable Queue Length = # Disks / 2
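The rule of thumb above is easy to turn into a quick check. A minimal sketch follows; the function names are hypothetical, and the observed value is assumed to come from a counter such as Performance Monitor's PhysicalDisk "Avg. Disk Queue Length" averaged over a representative period rather than a single spike.

```python
def reasonable_queue_length(disks: int) -> float:
    """Rule of thumb from above: number of disks in the array divided by two."""
    if disks < 1:
        raise ValueError("an array needs at least one disk")
    return disks / 2


def queue_within_rule(observed: float, disks: int) -> bool:
    """True if the observed average Disk Queue Length is within the rule of thumb."""
    return observed <= reasonable_queue_length(disks)


if __name__ == "__main__":
    for disks in (16, 24):
        print(disks, "disks ->", reasonable_queue_length(disks))
    print(queue_within_rule(50, 24))  # False: 50 is well past even the 24-disk limit
```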

We believe that keeping our configurations balanced across the _entire_ disk subsystem is critical to having the best performance a server can possibly bring to the table.

  • Hardware RAID Controller
    • 512MB or 1GB of Cache
    • Flash Cache or Battery Backup
  • 10K SAS spindles to start.
  • 15K SAS spindles for higher IOPS needs.
  • 7200 RPM SAS spindles can be considered where 16 or more will be installed.
  • Intel 320 Series SSDs for the best IOPS performance.
    • Note that a full complement of SSDs can _saturate_ the system bus!

In our case the jury is still out on whether SSD Cache can be of benefit for a standalone solution where there are half a dozen to a dozen VHDX files on our combined storage for the VMs.

Where we have 2TB or more of available storage we configure a 120GB Logical Disk on the RAID controller for our OS and then the balance for our VHDX files with a small 4GB partition for the OS Swap File.
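The layout above can be expressed as a small planning function. This is a sketch under two assumptions not stated in the post: "2TB" is taken as 2000GB, and the 4GB swap partition is carved out of the second logical disk alongside the VHDX partition. All names are hypothetical.

```python
OS_LOGICAL_DISK_GB = 120
SWAP_PARTITION_GB = 4
MIN_USABLE_GB = 2000  # "2TB or more", taken here as 2000GB (assumption)


def plan_layout(usable_gb: int) -> dict:
    """Split usable array capacity per the layout described above.

    Assumes the 4GB swap partition lives on the second logical disk,
    next to the partition that holds the VHDX files.
    """
    if usable_gb < MIN_USABLE_GB:
        raise ValueError("layout applies to arrays with 2TB or more usable")
    data_logical_disk_gb = usable_gb - OS_LOGICAL_DISK_GB
    return {
        "os_logical_disk_gb": OS_LOGICAL_DISK_GB,
        "data_logical_disk_gb": data_logical_disk_gb,
        "swap_partition_gb": SWAP_PARTITION_GB,
        "vhdx_partition_gb": data_logical_disk_gb - SWAP_PARTITION_GB,
    }


if __name__ == "__main__":
    print(plan_layout(3600))  # e.g. eight 600GB drives in RAID 6
```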

Monday, 27 June 2011

Promise VTrak – RAID Migration Time To Add Two 300GB 15K SAS Drives To Existing Disk Array

The Promise VTrak E610sD unit that we have been using for our IMS and Intel Server System based Hyper-V failover clustering had eight (8) 300GB 15K Seagate SAS drives configured in one Disk Array.

We added two more 300GB 15K Seagate SAS drives to the VTrak unit to test how a live RAID Migration would impact overall performance of the unit.

image

Event 42: DA_0 – June 24, 2011 1335Hours – RAID migration has started.

We had the VMs on the cluster shut down initially to bring all disk activity as close to zero MB/Second as possible.

When we went through the RAID Migration steps, we made a point of preserving the VMs’ LUN RAID 10 configuration, as the migration wanted to change it to RAID 1E.

image

Once we clicked Next, Submit, and confirmed that we wanted the RAID Array Migration to run, we saw the following:

image

That 0% sat there _for a long time_.

Meanwhile, with the VMs shut down we saw:

image

Based on that 28MB/Second number we figured that the RAID Array Migration process was going to take a while.

Well, it most certainly did:

image

Event 46: DA_0 – June 25, 2011 1159Hours – RAID migration has completed.

The process took around 22.5 Hours!
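As a rough cross-check, a steady 28MB/second for 22.5 hours works out to about 2.27TB in decimal units. The actual rate likely varied, particularly once the production VMs were back up, so treat the sketch below as back-of-the-envelope only; the function names are hypothetical.

```python
def transfer_hours(data_gb: float, mb_per_second: float) -> float:
    """Hours to move data_gb at a steady rate (decimal units: 1GB = 1000MB)."""
    if mb_per_second <= 0:
        raise ValueError("rate must be positive")
    return data_gb * 1000 / mb_per_second / 3600


def data_moved_gb(hours: float, mb_per_second: float) -> float:
    """Inverse: how much data a steady rate moves in the given number of hours."""
    return hours * 3600 * mb_per_second / 1000


if __name__ == "__main__":
    print(f"{data_moved_gb(22.5, 28):.0f}GB at 28MB/s over 22.5 hours")
```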

Now, we did fire up the four VMs running on the Hyper-V cluster not long after the above performance graphs snip was taken. So, we had SBS, SQL with LoB, and two Windows 7 desktop VMs running in production mode while the migration was happening.

We did some performance testing in the LoB, since we had already been running baseline performance tests in the cluster setting, and saw very little, if any, impact on its performance.

The LoB is SQL, IIS, and .NET intensive.

*Our original iMac was stolen (previous blog post). We now have a new MacBook Pro courtesy of Vlad Mazek, owner of OWN.

Saturday, 21 May 2011

Configuring An Intel RAID Controller Via RMM JViewer

We are in the process of setting up a new server remotely.

We are logged in via the Intel Remote Management Module 3 that is installed on the Intel Server Board S3420GPLX.

To get mouse control to a livable level, we had to close the KVM session and set the mouse configuration to RELATIVE:

image

Out of the box the mouse is set to ABSOLUTE mode which does not work very well at all.

Neither the Soft Keyboard nor the system’s keyboard that we are using allows ALT+Key presses. The Keyboard menu at the top has some control over ALT and CTRL presses and holds, but it still was not very reliable. It worked just well enough to let us choose the drives for the disk group.

As long as we remained patient and moved the mouse about in a slow and fluid manner, controlling the RAID BIOS worked . . . just barely.

image

And finally we had our array:

image

So, it just takes a bit of patience to work through the process of setting up an array or arrays via the Intel Remote Management Console.

Thursday, 28 April 2011

SAS versus SATA and Hardware RAID versus Software RAID

In the last few years we have made some changes to the server configurations that we either build and deploy or Tier 1 provides:

  • We install SAS 10K or SAS 15K drives over SATA.
    • Performance for one is vastly superior on the SAS drives.
    • SAS drives are a lot more sensitive to bad sector behaviour and are better able to recover from bad data being tossed up.
    • SAS drives use many forms of ECC, which protect the integrity of the data.
  • We install a hardware RAID controller with battery backup or SSD Cache over using the on board software RAID.
    • The on board “RAID” is software driven. All RAID calculations are completed by the server’s CPU and in many cases require the software driver to rebuild or function properly – meaning we need to boot into the OS.
    • We have had difficulties with on board software RAID recoveries and ShadowProtect due to driver issues.
    • Hardware RAID on Chip with the battery backup or SSD Cache virtually eliminates the parity write cost of RAID 5 and greatly reduces that cost for RAID 6. SSD Cache almost renders the whole discussion moot as the most frequently requested data sits on the SSD.
    • Hardware RAID controllers have the ability to mitigate the failure of a drive by keeping the server up. In our experience software RAID tends to freeze the box if a drive fails.
    • Hardware RAID controllers tied to SAS drives have a much better chance of mitigating or eliminating the possibility of data corruption if sectors on an array member are dying.
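To make "RAID calculations" and "parity write cost" concrete, here is a minimal sketch of RAID 5 style parity: the parity block is the byte-wise XOR of the data blocks in a stripe, any single lost block can be rebuilt by XORing the survivors, and a small write needs a read-modify-write of old data and old parity. Function names are hypothetical; RAID 6 adds a second, differently computed syndrome (typically Reed-Solomon based) that this sketch does not cover.

```python
from functools import reduce
from typing import Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks: how RAID 5 computes its parity."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("all blocks in a stripe must be the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def updated_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Small-write parity update (read-modify-write).

    Changing one data block means reading the old data and old parity, then
    writing the new data and new parity: four disk I/Os for one host write.
    """
    return xor_blocks([old_parity, old_data, new_data])


if __name__ == "__main__":
    stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]  # three data blocks
    parity = xor_blocks(stripe)
    # Lose the second drive: XOR of the survivors plus parity rebuilds it.
    rebuilt = xor_blocks([stripe[0], stripe[2], parity])
    print(rebuilt == stripe[1])  # True
```

On a hardware RAID on Chip controller this XOR work happens on the controller; with on board software "RAID" the server's CPU does it.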

Intel has a couple of resources for the above discussion as do many other RAID related vendors.

One Intel resource is the following Intel support page: Intel Server Products: Choosing between SAS vs. SATA Hard Disk for your Server RAID System.

There we find this grid that gives us a pretty good idea of some of the significant differences between SAS and SATA drives:

image

We are given an extensive explanation in the following document which is linked to below the above table: Intel – Enterprise-class versus Desktop-class Hard Drives (Link to PDF document download).

We put our server configurations through a lot of testing before we deploy them to client sites or within our own organization. We do this because we want to make absolutely sure that the server configuration we are going to deploy will meet the needs of our client over the life of the box, which is about 36 months.

The extra cost for the hardware RAID controller, battery backup, and 15K SAS drives (not much of a cost difference between 300GB 15K SAS and Seagate Enterprise SATA these days) when taken over the life of the box (divide that cost by 36) is actually quite small relative to the performance, data protection, and overall storage stability benefits.
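The "divide that cost by 36" arithmetic, as a minimal sketch. The $900 premium below is a hypothetical figure for illustration only, not a price from this post.

```python
def monthly_cost(extra_cost: float, service_life_months: int = 36) -> float:
    """Spread a one-time hardware premium over the server's service life."""
    if service_life_months <= 0:
        raise ValueError("service life must be positive")
    return extra_cost / service_life_months


if __name__ == "__main__":
    # Hypothetical premium for a RAID controller, battery backup, and 15K SAS drives.
    premium = 900.00
    print(f"${monthly_cost(premium):.2f} per month")  # $25.00 per month
```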

Tier 1 Caveat

Keep in mind that the cost comparisons here are for our in-house Intel server solution components. When it comes to Tier 1, the costs of some server components and server storage can be extremely high relative to the components supplied in our solutions.

Thursday, 14 April 2011

HP ProLiant MicroServer – Rebuilding An Array

Okay, so while installing the RAC, one of the primary OS drive cradles must have been nudged enough to disconnect the drive, or the tray was not fully seated in the first place.

Either way we ended up with a Critical error on the OS’s RAID 1 array.

The BIOS for the “RAID” at the motherboard level holds no clues as to how to rebuild the array once we re-seated the drive.

We booted the server into the OS, since in our experience recovering from failures on software-based on board RAID chipsets requires the driver to be loaded and available before a rebuild can be initiated (some Intel on board RAID chipsets are an exception to this rule).

Once into the OS we found the AMD RAIDXpert “console” link:

image

What came up was a Web-based console that asked for a username and password. Fortunately, the default username and password were right there on the logon page.

Once in we were able to initiate a rebuild of the RAID 1 array:

image

The rebuild time for a pair of 750GB Seagate ES series drives took quite a few hours on this particular box even though there was not a lot of disk action going on via the network.

The Performance tab in the Task Manager during the rebuild process looked like this:

image

Saturday, 18 September 2010

Western Digital RE3 Series SATA Drives In Intel Hot Swap Expander Backplanes May Error Or Go Offline

It seems that there is an issue with Western Digital RE3 SATA drives running at 3 Gb/s in Intel’s hot swap expander backplanes.

Root Cause
The SAS expander chip on Intel® Server Drive Enclosure AXX6DRV3GEXP and AXX4DRV3GEXP experience an undesired flow-control behavior when connected to a SATA disk drive running at 3.0 Gb/s. Drives operating at 3.0 Gb/s may not receive proper data or SATA flow control and may result in a timeout error. There is no timeout at 1.5 Gb/s. This interoperability limitation of the SAS expander with SATA disk drive is currently seen on Western Digital’s RE3 series SATA drives.

Products affected:

  • Intel Server Drive Enclosure AXX6DRV3GEXP
    • 6 drive expander backplane.
  • Intel Server Drive Enclosure AXX4DRV3GEXP
    • 4 drive expander backplane.

To mitigate the issue, the drive needs to be set to run at 1.5 Gb/s by setting the jumper at position 5/6.

image

It is also important to make sure that the hot swap backplane has the most current firmware.

The hot swap backplane firmware download link can be found on the server chassis support page.
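The jumper workaround halves the link rate. SATA at 1.5 and 3.0 Gb/s uses 8b/10b encoding, so every data byte travels as 10 bits on the wire and the interface ceiling drops from about 300MB/s to about 150MB/s. A minimal sketch of that arithmetic (hypothetical function name); whether the lower ceiling costs real throughput depends on the drive's sustained transfer rate, which this does not model.

```python
def sata_interface_mb_per_s(line_rate_gbps: float) -> float:
    """Interface ceiling for a SATA link using 8b/10b encoding.

    With 8b/10b, each data byte is sent as 10 line bits, so the ceiling in
    bytes per second is the line rate divided by 10.
    """
    if line_rate_gbps <= 0:
        raise ValueError("line rate must be positive")
    return line_rate_gbps * 1e9 / 10 / 1e6  # result in MB/s (decimal)


if __name__ == "__main__":
    for rate in (3.0, 1.5):
        print(f"{rate} Gb/s -> {sata_interface_mb_per_s(rate):.0f} MB/s")
```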

SATA vs. SAS

As a rule we no longer use SATA drives in any of our server configurations.

With the cost of SAS drives having come down quite substantially over the last few years along with the huge performance benefits we get with using SAS drives there is really no reason to use SATA drives in a server setting anymore.

The exception to this rule is for Intel Solid-State Drives that run with a SATA interface.

Tuesday, 7 September 2010

On Board Software “RAID” No More

We have stopped using the on board “RAID” feature on all of our client servers.

Why?

Because the cost of a high performance 4 port RAID controller like the Intel RAID Controller RS2BL040 is under $450CA.

For $450 we get:

  1. True RAID on Chip driven calculations.
    • No software interpreting the RAID calculations on the CPU.
  2. Drive redundancy.
    • But, on board RAID is just that: RAID right? Well, no. If a drive fails in an on board based “RAID” array 9 times out of 10 the server is locked up.
    • If a drive fails on a true RAID controller based array the RAID controller marks the drive dead, logs the failure, but _keeps moving along_.
  3. Data Protection
    • We have seen at least 10 times as many failed OS loads after an on board based “RAID” array member failure as after either a member failure or a RAID controller failure on a true RAID controller in server settings.
  4. Performance
    • On board RAID relies on the software drivers to do all of the calculations. They run on top of the CPU. There is a performance hit for this.
    • Add a battery backup for the RAID on Chip’s memory cache and the array performance steps up accordingly.
    • Array rebuild times to hot spare for the RAID on Chip solution will be superior to on board software “RAID”.
      • A failed drive is when the server is most vulnerable.
  5. Maintenance
    • RAID on Chip solutions offer true hot swap compatibility.
    • Some software “RAID” setups may offer the same. But, test it first.
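The "marks the drive dead, logs the failure, but keeps moving along" behaviour boils down to classifying the array after each member failure. A minimal sketch follows, with hypothetical names; real controllers use their own vendor-specific state names. A degraded array keeps serving I/O with reduced or no redundancy, which is why the window until the rebuild finishes is when the server is most vulnerable.

```python
from typing import Optional, Sequence, Set, Tuple

# Member failures each level survives (RAID 10 is handled separately below).
TOLERANCE = {"1": 1, "5": 1, "6": 2}


def array_state(level: str, failed: Set[int],
                mirror_pairs: Optional[Sequence[Tuple[int, int]]] = None) -> str:
    """Classify an array as 'optimal', 'degraded', or 'failed'."""
    if not failed:
        return "optimal"
    if level == "10":
        if mirror_pairs is None:
            raise ValueError("RAID 10 needs its mirror pairs")
        # RAID 10 only fails if both members of the same mirror are lost.
        for a, b in mirror_pairs:
            if a in failed and b in failed:
                return "failed"
        return "degraded"
    if level not in TOLERANCE:
        raise ValueError(f"unsupported RAID level: {level}")
    return "degraded" if len(failed) <= TOLERANCE[level] else "failed"
```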

Given the overall benefits of an add-in RAID solution, in our opinion, no server should go out the door with an on board software “RAID” setup.

It is just not worth the risk.

Adaptec also makes some pretty good RAID on Chip solutions.

Saturday, 31 July 2010

Dell RAID Array Drive Failure

We had all of the Dell management utilities configured when the box was put into production, and the e-mail alerts we set up tested good at that time.

For some reason they did not fire when a drive in one of the Dell server’s RAID arrays failed. The RAID controller is a PERC 6/i which has no audible warning on it so no one in the office beside the server closet knew anything was up.

Since this is a remote server we have contacted our client with the news and will work with them and Dell to get that drive swapped out as soon as possible since the array in jeopardy contains their data.

image

We will spend some time trying to figure out why OpenManage failed to send an e-mail as well.

The lack of an audible warning on Dell-configured LSI RAID controllers has been mentioned here before.

Tuesday, 6 April 2010

Intel Modular Server – Tested Hardware and Operating System List – Tested External Storage

For any Intel product, the Tested Hardware and Operating System List (THOL) is the de facto point of reference for component and product compatibility.

If there is a conflict or problem in an Intel based system and the problematic component is _not_ on the THOL, then Intel Support will not be applicable.

Some results from the above search:

image

To find the latest THOL for the Intel Modular Server quickly:

Notice that the IMS THOL is the first link:

image

The current reason we are digging into the IMS THOL is to verify how we can extend the 14 drive internal storage unit either via the external SAS ports on the Intel Storage Controllers or via iSCSI target to a SAN.

We are looking for additional storage that is also redundant for any cluster based VMs.

The THOL tells us that we have a few options:

image

Note the Promise e310sD is a dual controller 2U unit that has 12 drive bays and the e610sD is a dual controller 3U unit that has 16 drive bays.

Intel’s own Storage Server SSR212MC2R is currently the only iSCSI target that is on the above approved list.

The Promise VTrak storage units are quite reasonably priced relative to the additional storage we are able to access, along with the ability to use a dual storage path (Multi-Path I/O or MPIO) to ensure storage access redundancy.

This is a screenshot of a page out of the VTrak Ex10s supported configurations with Intel Modular Server (PDF) document:

image

So, we have potentially found what we are looking for. Now we need to find out where the IMS-specific Promise documents are located, as the links within some of the documents are not working at this time.

Philip Elder
MPECS Inc.
Microsoft Small Business Specialists
Co-Author: SBS 2008 Blueprint Book

Wednesday, 24 March 2010

The Difference Between OS Write-Back Caching and Hardware RAID Write-Back Caching

The biggest performance gain from Write-Back cache (Write-Cache) comes from an add-in or onboard RAID controller's ability to work on frequently read and written data in its own memory, acknowledging writes once they are in cache rather than waiting on the disks.

The biggest drawback is that the memory on the controller is volatile, meaning that no power = no more data.

Add a battery backup to the controller and we now have a different story. Most controllers that have that option will enable Write-Back when the battery is installed and _fully_ charged ... and only then. There is a setting to force it on, but without a charged battery behind the cache this is obviously a dangerous proposition.
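The decision most controllers make can be sketched as a small Python function. This is a simplified model under our own assumptions: the function and parameter names are hypothetical, and real controller firmware has more states (and its own names for them).

```python
def effective_write_policy(requested: str,
                           bbu_present: bool,
                           bbu_fully_charged: bool,
                           force_write_back: bool = False) -> str:
    """Return the cache mode the controller would actually run in (simplified)."""
    if requested != "write-back":
        return "write-through"
    if force_write_back:
        # Forced on: cache contents are lost if the power fails. Dangerous!
        return "write-back"
    if bbu_present and bbu_fully_charged:
        return "write-back"
    # Battery missing or still charging: fall back to write-through.
    return "write-through"


if __name__ == "__main__":
    print(effective_write_policy("write-back", True, False))  # write-through
    print(effective_write_policy("write-back", True, True))   # write-back
```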

There is also a form of Write-Back cache in the Windows OS. Witness the tray icon that we use to "eject" a USB drive connected to the system in order to properly remove it.

image

This is for the OS's Write-Back cache and _not_ the hardware RAID controller's Write-Back cache.
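From an application's point of view, the OS cache is the one it can see. In Python, for example, `os.fsync()` asks the OS to push a file's cached writes down to the storage device; per the Python documentation it calls the Microsoft C runtime `_commit()` on Windows. A minimal sketch follows (the `durable_write` helper is a hypothetical name). It only reaches as far as the OS can: once the write is handed to a RAID controller with Write-Back enabled, the controller may acknowledge it from its own cache, which is why that cache needs a battery.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Write data, then ask the OS to flush it out of its own write-back cache."""
    with open(path, "wb") as f:
        f.write(data)          # goes into Python's buffer / the OS file cache
        f.flush()              # Python buffer -> OS cache
        os.fsync(f.fileno())   # OS cache -> storage device (a request only)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        target = os.path.join(folder, "durable_write_demo.bin")
        durable_write(target, b"client data")
        with open(target, "rb") as f:
            print(f.read())    # b'client data'
```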

Hardware-driven RAID 5 only makes sense when the RAID controller has Write-Back enabled and a battery backup in place. Otherwise, those parity bits really kill write performance. RAID 6, with two parity blocks per stripe, takes an even bigger hit. RAID 50 and 60 make up for that to some degree, but by that point RAID 10 is the way to go in most cases.
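To put rough numbers on that parity hit, here is the commonly used small random write penalty rule of thumb: each host write costs 2 back-end disk I/Os on RAID 10 (one per mirror copy), 4 on RAID 5 (read data, read parity, write data, write parity) and 6 on RAID 6 (two parity blocks to read and write). A Python sketch, assuming 150 IOPS per spindle purely for illustration and ignoring the controller cache, which is what Write-Back helps to offset:

```python
# Back-end disk I/Os per small random host write (rule-of-thumb values).
WRITE_PENALTY = {"RAID 10": 2, "RAID 5": 4, "RAID 6": 6}


def host_iops(disks: int, iops_per_disk: float, level: str,
              write_fraction: float) -> float:
    """Approximate host-visible random IOPS for a given read/write mix."""
    backend = disks * iops_per_disk
    penalty = WRITE_PENALTY[level]
    # Each read costs 1 back-end I/O; each write costs `penalty` back-end I/Os.
    return backend / ((1 - write_fraction) + write_fraction * penalty)


if __name__ == "__main__":
    for level in WRITE_PENALTY:
        # 8 disks at an assumed 150 IOPS each, 100% writes
        print(level, host_iops(8, 150, level, write_fraction=1.0))
    # RAID 10 600.0 / RAID 5 300.0 / RAID 6 200.0
```

On the same eight disks, RAID 5 delivers half the random write throughput of RAID 10 and RAID 6 a third, before any cache comes into play.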

Adaptec is utilizing Intel's Extreme series 32GB SSD drives in a Write-Back Cache product they call MaxIQ.

Since SSD is non-volatile, meaning power loss does _not_ equal data loss, the MaxIQ solution makes a lot of sense. Adaptec RAID cards with a MaxIQ solution attached see _huge_ performance benefits, and where I/O performance matters, the improvement more than makes up for the extra cost.

BTW, Adaptec Z series RAID controllers also have non-volatile cache memory installed by default, so there is no need for a battery backup unit. Note that the Z series cache is not the same as the MaxIQ solution.

Just so we are clear, Write-Back in the RAID controller is not the same as Write-Back in the OS. And, by default, server OSs have their own Write-Back disabled. An OS cannot disable the RAID controller's Write-Back since this is a BIOS/firmware level setting done either directly in the RAID controller's BIOS console or via some sort of management software.

Philip Elder
MPECS Inc.
Microsoft Small Business Specialists
Co-Author: SBS 2008 Blueprint Book

Wednesday, 16 December 2009

Intel Hot Swap 2.5” to 3.5” Adapter Tray Enables Intel SSD Hot Swap RAID – AXX25DRVADPTR

We had a really difficult time getting our hands on these adapters:

image

The adapter is actually upside down in the above image.

Flipped over, the SATA and power connectors on the 2.5” drive line up perfectly with the hot swap backplane connector, so the smaller drive can be used in 3.5” hot swap backplanes.

With these trays we can look at configuring a RAID 1 or RAID 10 array of SSDs for I/O-intensive needs such as the OS or Exchange databases.

We can also offer to upgrade some of our existing client servers with these trays and a couple of Intel SSDs to speed things up significantly.

Philip Elder
MPECS Inc.
Microsoft Small Business Specialists
Co-Author: SBS 2008 Blueprint Book

Wednesday, 2 December 2009

Intel RAID Controller RAID Smart Battery Backup Installation Caveat

Most Intel add-in RAID controllers have an option to add a battery backup unit for the onboard cache memory.

When the battery is installed, the RAID on Chip processor can cache data in memory and work on it on the fly in cache RAM instead of passing all data plus parity bits straight through to the disks.

A list of the battery backup units and the RAID controller(s) they are compatible with can be found here:

Once the battery backup is installed on the RAID controller (or connected to it, if the battery is mounted remotely), the battery will not begin charging until the server has booted up.

Simply plugging the server in after installing the battery backup will not initiate a charge.

So, on existing servers where a battery backup has been added, the Write Back Mode with BBU Present option will be available in the RAID controller’s BIOS, but Write Back Mode will not take effect until the battery is charged.

The server will need to be booted up and either left idle or tested for at least 24 hours before the battery is fully charged. Once it is charged, a reboot will allow the RAID controller to initiate Write Back Mode. Charge status is indicated on the RAID controller’s BIOS information screen during the boot process, and may also be shown in the RAID Web Console.

For new servers, the point is fairly moot since the battery can charge while the server is being burned in. The final RAID configuration would be set after the burn-in period along with the Write Back Mode setting.
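The caveat boils down to two rules: the battery charges only while the server is running, and Write Back Mode is only picked up at boot. A simplified Python model of that sequence, under our own assumptions: the class and method names are hypothetical, and the 24 hours is treated as a fixed full-charge time purely for illustration (real charge times vary).

```python
class RaidControllerModel:
    """Simplified model of the BBU charging caveat described above."""

    FULL_CHARGE_HOURS = 24.0  # assumption taken from the "24 hours" guidance

    def __init__(self) -> None:
        self.bbu_installed = False
        self.charged_hours = 0.0
        self.booted = False
        self.write_back_active = False

    @property
    def battery_full(self) -> bool:
        return self.charged_hours >= self.FULL_CHARGE_HOURS

    def install_battery(self) -> None:
        self.bbu_installed = True
        self.charged_hours = 0.0

    def boot(self) -> None:
        self.booted = True
        # Write Back Mode is only engaged at boot, and only with a full battery.
        self.write_back_active = self.bbu_installed and self.battery_full

    def shutdown(self) -> None:
        self.booted = False
        self.write_back_active = False

    def reboot(self) -> None:
        self.shutdown()
        self.boot()

    def elapse(self, hours: float) -> None:
        # Plugged in but not booted: the battery does not charge.
        if self.booted and self.bbu_installed:
            self.charged_hours = min(self.FULL_CHARGE_HOURS,
                                     self.charged_hours + hours)


if __name__ == "__main__":
    ctrl = RaidControllerModel()
    ctrl.install_battery()
    ctrl.elapse(48)            # server plugged in but never booted
    print(ctrl.battery_full)   # False
    ctrl.boot()
    ctrl.elapse(24)            # idle or burn-in for 24 hours
    print(ctrl.battery_full, ctrl.write_back_active)  # True False
    ctrl.reboot()
    print(ctrl.write_back_active)  # True
```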

Philip Elder
MPECS Inc.
Microsoft Small Business Specialists
Co-Author: SBS 2008 Blueprint Book

Thursday, 5 November 2009

So, How Responsible Are We For Our Client's Networks?

The last new drive is in and the array is now in rebuild mode. So far, there are no errors coming up.

This rebuild will take about 3 hours, give or take, to complete.

These types of situations really bring home the depth and weight of our responsibility for our clients' livelihoods. If this box dies before the rebuild completes, we are ready to Swing and recover. The Swing could be done before they get in tomorrow morning and thus not cost them anything in lost productivity.

Do we really, and I mean _really_, know the value of our client's business in terms of dollars per hour?

If the hardware had quit during business hours, or before we had the chance to prepare our disaster recovery plans, the cost to our client would have been huge.

It would take us at least an hour or two to fail them over to the second server and sync the data on it from the ShadowProtect backup. There would also be lost time because their client files would not be fully up to date.

Anyone care to hazard a guess at the productivity cost per hour for this 15-seat accounting firm?

How about a thousand dollars per hour? How about two thousand per hour or even more since most of the folks in this firm are CGAs and up?
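Putting the post's own numbers together: a one to two hour fail-over at $1,000 to $2,000 per hour comes to $1,000 to $4,000 in lost productivity, before counting the time lost to client files that are not up to date. A trivial Python sketch of that arithmetic (the function name is just for illustration):

```python
SEATS = 15


def downtime_cost(hours: float, cost_per_hour: float) -> float:
    """Lost productivity for an outage of `hours` at a firm-wide hourly cost."""
    return hours * cost_per_hour


if __name__ == "__main__":
    # Fail-over estimate from above: 1 to 2 hours; cost guess: $1,000-$2,000/hour.
    for hours in (1, 2):
        for rate in (1000, 2000):
            print(f"{hours} h at ${rate:,}/h -> ${downtime_cost(hours, rate):,.0f}")
    # Per-seat view of the low estimate: about $67 per person per hour.
    print(f"${1000 / SEATS:,.2f} per seat per hour")
```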

When this type of situation is at hand, getting to sleep at night can be very difficult. Yes, in this case we are now prepared. But the possible impact on our client's business is not easy to dismiss, and it weighs heavily on the back of the mind.

In this day and age, when small businesses are totally reliant on their technology, we truly do hold our clients' livelihoods in our hands.

Philip

RAID Controller Log: Unrecoverable Medium Error – Puncturing Bad Block?!?

Okay, so this is a new one, and it led to a near stoppage of the heart last evening:

image

The errors in order:

Controller ID: 0 Unrecoverable medium error during rebuild: PD –|—:0 Location 0x26f1640

Controller ID: 0 Puncturing bad block: PD –|—:0 Location 0x26f1640

Controller ID: 0 Puncturing bad block: PD –|—:1 Location 0x26f1640
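When comparing notes with vendor support, it can help to pull these events out of an exported log and decode the Location field. A minimal Python sketch; it assumes Location is a logical block address in 512-byte sectors (our assumption, the log does not say so), and it keeps the PD field as the post shows it, using only the trailing slot number, since that is the only clearly legible part. The `MediaEvent` and `parse_event` names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

SECTOR_SIZE = 512  # assumption: Location is an LBA in 512-byte sectors

EVENT_RE = re.compile(
    r"Controller ID: (?P<ctrl>\d+) (?P<event>.+?): "
    r"PD (?P<pd>\S+) Location (?P<loc>0x[0-9A-Fa-f]+)"
)


@dataclass
class MediaEvent:
    controller: int
    event: str
    slot: str   # last field of the PD identifier, e.g. "0"
    lba: int    # the Location value decoded from hex

    @property
    def offset_gib(self) -> float:
        return self.lba * SECTOR_SIZE / 2**30


def parse_event(line: str) -> Optional[MediaEvent]:
    m = EVENT_RE.search(line)
    if m is None:
        return None
    return MediaEvent(
        controller=int(m["ctrl"]),
        event=m["event"],
        slot=m["pd"].rsplit(":", 1)[-1],
        lba=int(m["loc"], 16),
    )


if __name__ == "__main__":
    log = [
        "Controller ID: 0 Unrecoverable medium error during rebuild: PD –|—:0 Location 0x26f1640",
        "Controller ID: 0 Puncturing bad block: PD –|—:0 Location 0x26f1640",
        "Controller ID: 0 Puncturing bad block: PD –|—:1 Location 0x26f1640",
    ]
    for line in log:
        ev = parse_event(line)
        # all three report LBA 40834624 (about 19.47 GiB into the disk)
        print(ev.slot, ev.event, ev.lba, f"{ev.offset_gib:.2f} GiB")
```

All three events point at the same Location, first on PD 0 and then on PD 1, which fits the same block being marked on both array members.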

PD 0 is the last original disk in this server, and it is the one giving us headaches. Both PD 1 and PD 2 (there are three hot swap bays in the SR1560SFH Intel Server System) were replaced with new drives.

The original PD 1 had failed during a server firmware update that included the BMC (previous blog post). The original PD 2, then the global hot spare, was rebuilt into the array with no errors . . . until a consistency check run later that afternoon produced some unrecoverable fatal errors.

Last night we dropped the original PD 1 out of the configuration, replaced it with a new drive, and had the new drive picked up as a hot spare. We then failed out the original PD 2 (the former hot spare, by then a RAID 1 array member), assuming that it was the source of the errors we saw in yesterday afternoon's consistency check.

So, the above screenshot was taken while the new PD 1 was being rebuilt into the array with PD 0 as the source. Needless to say, the heart definitely skipped a few beats, with visions of index $0 running through my head (previous blog post)!

The rebuild did eventually finish successfully though?!?

We will be going back this evening to fail out the bad PD 0 and replace it with a new drive, which will then be designated the new hot spare.

Once PD 2, currently a hot spare, has finished rebuilding into the RAID 1 array, Intel indicated to us that we need to run a consistency check. From there, hopefully ShadowProtect will finally give us a backup!
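On a RAID 1 array, a consistency check boils down to reading every block from both members and comparing them. A simplified Python sketch under our own assumptions (a hypothetical `Member` type, blocks kept in a dict, media errors modelled as a set of unreadable LBAs). This version only reports problems; whether and how a controller corrects them is up to its firmware.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set, Tuple


@dataclass
class Member:
    name: str
    blocks: Dict[int, bytes] = field(default_factory=dict)  # LBA -> contents
    bad: Set[int] = field(default_factory=set)              # unreadable LBAs


def consistency_check(a: Member, b: Member,
                      lbas: Iterable[int]) -> Tuple[List[int], List[int]]:
    """Return (mismatched LBAs, unreadable LBAs) for a two-disk mirror."""
    mismatched: List[int] = []
    unreadable: List[int] = []
    for lba in lbas:
        if lba in a.bad or lba in b.bad:
            unreadable.append(lba)      # one copy cannot even be read
        elif a.blocks.get(lba) != b.blocks.get(lba):
            mismatched.append(lba)      # both readable, but they differ
    return mismatched, unreadable


if __name__ == "__main__":
    pd0 = Member("PD 0", {0: b"A", 1: b"B", 2: b"C"}, bad={1})
    pd2 = Member("PD 2", {0: b"A", 1: b"B", 2: b"X"})
    print(consistency_check(pd0, pd2, range(3)))  # ([2], [1])
```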

And one more thing, just what does “Puncturing bad block” really mean?

The NEC document linked above suggests taking the preventative measure of swapping out the indicated drive(s) promptly. :)

It looks as though the RAID controller found some bad sectors on PD 0, and puncturing means marking those sectors as off limits on both array members.
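As we understand it, puncturing is also why that rebuild could finish: instead of aborting when the source disk cannot read a block, the controller records the block as bad on both members and carries on, and later reads of that block return an error rather than quietly returning whatever happens to be on the new disk. A simplified Python model of that idea follows. The names are hypothetical, and this is not how the controller's firmware actually implements it.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


class MediumError(Exception):
    """An unreadable (or punctured) block."""


@dataclass
class Disk:
    name: str
    blocks: Dict[int, bytes] = field(default_factory=dict)
    bad: Set[int] = field(default_factory=set)  # physical media errors

    def read(self, lba: int) -> bytes:
        if lba in self.bad:
            raise MediumError(f"{self.name}: unrecoverable medium error at {lba:#x}")
        return self.blocks.get(lba, b"\x00")


def rebuild_mirror(source: Disk, target: Disk, lbas: Iterable[int],
                   punctured: Set[int]) -> List[str]:
    """Copy source to target; puncture instead of aborting on a bad block."""
    log: List[str] = []
    for lba in lbas:
        try:
            target.blocks[lba] = source.read(lba)
        except MediumError as err:
            log.append(str(err))
            punctured.add(lba)  # now off limits on *both* members
            log.append(f"Puncturing bad block: {source.name} / {target.name} {lba:#x}")
    return log


def array_read(member: Disk, lba: int, punctured: Set[int]) -> bytes:
    """Reads of a punctured block fail loudly rather than return stale data."""
    if lba in punctured:
        raise MediumError(f"punctured block at {lba:#x}")
    return member.read(lba)


if __name__ == "__main__":
    pd0 = Disk("PD 0", {0x26f163f: b"ok", 0x26f1640: b"??"}, bad={0x26f1640})
    pd1 = Disk("PD 1")
    punctured: Set[int] = set()
    for line in rebuild_mirror(pd0, pd1, [0x26f163f, 0x26f1640], punctured):
        print(line)
    print(array_read(pd1, 0x26f163f, punctured))  # b'ok'
```

This also illustrates why swapping the drive does not make the punctured block readable again: the block's data was never copied to the new member.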

Part of this whole puzzle is that the RAID controller (Intel SRCSASRB with firmware 470) shows a media error level of 0 and a predictive failure count of 0 for both array members!

Hopefully tomorrow we can rest easy with a backup in hand!

Philip Elder
MPECS Inc.
Microsoft Small Business Specialists
Co-Author: SBS 2008 Blueprint Book