What is RAID 100?

RAID 10 is one of the most important and most commonly used RAID levels today. RAID 10 is, of course, what is known as compound or nested RAID, where one RAID level is nested within another. In the case of RAID 10, the “lowest” level of RAID, the one touching the physical drives, is RAID 1. The nomenclature of nested RAID is that the number on the left is the level touching the physical drives and each number to its right is a RAID level applied on top of those arrays.

So RAID 10 is a number of RAID 1 (mirror) sets combined into a RAID 0 (non-parity stripe) set. There is a certain common terminology, principally championed by HP, that refers to even RAID 1 as simply being a subset of RAID 10: a RAID 10 array where the RAID 0 stripe length is one. A quirky way to think of RAID 1, to be sure, but it actually makes many discussions and comparative calculations easier and it makes sense in a practical way for most storage practitioners. Thinking of RAID 1 as a special name for the smallest possible RAID 10 stripe allows all RAID 10 permutations to exist on a single calculation continuum.

Likewise, HP refers to solitary drives attached to a RAID controller as RAID 0 sets with a stripe length of one. Looked at in that light, the application of this terminology to the RAID 10 world is actually quite obvious and sensible. However, neither HP nor any other vendor today applies the same naming oddity to other array types, such as RAID 5 being a subset of RAID 50 or RAID 6 being a subset of RAID 60, even though they can be thought of in exactly the same way that RAID 1 can be thought of relative to RAID 10.

If we take that same logic to the next level, figuratively and literally, we can stripe multiple RAID 10 arrays together in another RAID 0. This seems odd but can make sense. The result is a stripe of RAID 10s or, to write it out, a stripe of stripes of mirrors (we generally state RAID from the top down but the nomenclature is built from the bottom up.) So, as this is RAID 1 on the physical drives, a stripe of those mirrors and then a stripe of those resultant arrays, we get RAID 100 (R100.)
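To visualize the nesting, here is a minimal Python sketch, purely illustrative and using hypothetical drive names, that builds the structure bottom up: mirrored pairs, then RAID 10 sets striped from those mirrors, then the top level RAID 0 stripe across the sets that makes it RAID 100.

```python
# Purely illustrative sketch of RAID 100 nesting; the drive names and
# grouping sizes are hypothetical, not from any vendor implementation.
drives = [f"disk{i}" for i in range(8)]

# Bottom level: RAID 1 mirrored pairs touching the physical drives.
mirrors = [drives[i:i + 2] for i in range(0, len(drives), 2)]

# Middle level: stripe the mirrors into RAID 10 sets (two mirrors each here).
raid10_sets = [mirrors[i:i + 2] for i in range(0, len(mirrors), 2)]

# Top level: a RAID 0 stripe across the RAID 10 sets yields RAID 100.
raid100 = raid10_sets

print(raid100)
# [[['disk0', 'disk1'], ['disk2', 'disk3']],
#  [['disk4', 'disk5'], ['disk6', 'disk7']]]
```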

RAID 100 is, of course, rare and odd. However, one extremely important RAID controller manufacturer, LSI, utilizes R100 and, consequently, so does its downstream integration vendor, Dell.

Fortunately, because non-parity stripes introduce few behavioral oddities and have near zero overhead and latency, this approach is really not a problem, although it can lead to a great deal of confusion. For all intents and purposes, RAID 100 behaves exactly like RAID 10 when each RAID 10 subset is identical to the others.

In theory, a RAID 100 could be made up of many disparate RAID 10 sets of varying drive types, spindle counts and speeds. In theory a RAID 10 could be made up of disparate RAID 1 sets as well, but with far less potential for variation. RAID 100 could, theoretically, do some pretty bizarre things if left unchecked. In practice, though, any RAID 100 implementation will likely, as LSI’s implementation does, enforce standardization and require that each RAID 10 subset be as identical as the controller is capable of enforcing. So each will be effectively uniform, keeping the overall behavior the same as if the same drives were set up as RAID 10.

Because the behavior remains identical to RAID 10, there is an extremely strong tendency to avoid the confusion of calling the array RAID 100 and to simply refer to it as RAID 10. This would work fine except for the semi-necessary quirk of needing to specify the geometry of the underlying RAID 10 sets when building a RAID 100. LSI, and therefore Dell, requires that at the time of setting up a RAID 100 set you specify the underlying RAID 10 geometry, but since the array is labeled as RAID 10, this makes no sense. A bizarre situation indeed.

To further complicate matters, because of the desire to maintain a façade of using RAID 10 rather than RAID 100, proper terminology is eschewed and, instead of being referred to as “RAID 10 arrays” or “RAID 10 subsets,” the underlying RAID 10 members are simply called “spans.” Span, however, is a term used for something else in storage that does not apply properly here; span is in no way a proper description for a RAID 10 set under any condition.

But if we agree to use the term span to refer to a RAID 10 subset of a RAID 100 array, we can move forward pretty easily. Whenever possible, then, we want as many spans as possible in order to keep the underlying RAID 10 subsets as small as possible. If we make them small enough they actually collapse into RAID 1 sets (HP’s odd RAID 10 with a stripe length of one) and our RAID 100 collapses into a RAID 10 with the middle stripe, rather than the outside stripe, being the one that disappears! Bizarre, yes, but practical.

So how do we apply this in real life? Quite easily. In a RAID 100 array we must specify a count of spans to be used. Since we desire that each span contain two physical drives, so that each span is a simple RAID 1 mirror, we simply take the total number of drives in our RAID 100 array, which we will call N, and divide it by two. So the desired span count for a normal RAID 100 array is simply N/2. This means that if you have a two drive array, you want one span. Four drives, two spans. Six drives, three spans. Twenty four drives, twelve spans. And so on.
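As a quick illustration of that arithmetic, here is a trivial sketch; the function name is mine, not from any controller tool:

```python
# Minimal sketch of the N/2 span rule described above, assuming two-drive
# (RAID 1) spans; not taken from any vendor utility.
def span_count(total_drives: int) -> int:
    """Desired span count for a RAID 100 built from two-drive mirrors."""
    if total_drives < 2 or total_drives % 2 != 0:
        raise ValueError("two-drive spans require an even drive count")
    return total_drives // 2

for n in (2, 4, 6, 24):
    print(f"{n} drives -> {span_count(n)} span(s)")
```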

Do not be afraid of RAID 100. For normal users it simply requires some additional knowledge of how to select the proper number of spans. It would be ideal if this were calculated automatically and kept hidden, allowing end users to think of the arrays in terms of RAID 10; or if the arrays were labeled consistently as RAID 100 to make it clear what a span must represent; or, of course, if RAID 10 were simply used instead of RAID 100. But given the practical state of reality, dealing with RAID 100, once it is understood, is easy.

Comparing RAID 10 and RAID 01

These two RAID levels often bring about a tremendous amount of confusion, partially because they are incorrectly used interchangeably and often simply because they are poorly understood.

First, it should be pointed out that either may be written with or without the plus sign: RAID 10 is RAID 1+0 and RAID 01 is RAID 0+1. Strangely, RAID 10 is almost never written with the plus while RAID 01 is almost never written without it, even though storage engineers generally agree that the plus is superfluous and need not be used.

Both of these RAID levels are “compound” levels made from two different, simple RAID types being combined. Both are mirror-based, non-parity compound or nested RAID. Both have essentially identical performance characteristics – nominal overhead and latency with NX read speed and (NX)/2 write speed where N is the number of drives in the array and X is the performance of an individual drive in the array.
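As a worked example of those formulas, here is a small sketch; the 200 MB/s per-drive figure is an assumed, illustrative number:

```python
# Worked example of the NX read / (NX)/2 write formulas above.
# The drive throughput figure is an assumption for illustration only.
def mirror_stripe_throughput(n_drives: int, drive_mbps: float):
    read = n_drives * drive_mbps        # NX: every spindle can serve reads
    write = n_drives * drive_mbps / 2   # (NX)/2: writes hit both mirror halves
    return read, write

r, w = mirror_stripe_throughput(8, 200.0)
print(f"8 drives @ 200 MB/s: read {r:.0f} MB/s, write {w:.0f} MB/s")
# 8 drives @ 200 MB/s: read 1600 MB/s, write 800 MB/s
```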

What sets the two RAID levels apart is how they handle disk failure. The quick overview is that RAID 10 is extremely safe under nearly all reasonable scenarios. RAID 01, however, rapidly becomes quite risky as the size of the array increases.

In a RAID 10, the loss of any single drive results in the degradation of a single RAID 1 set inside of the RAID 0 stripe. The stripe level sees no degradation, only the one singular RAID 1 mirror does. All other mirrors are unaffected. This means that our only increased risk is that the one single drive is now running without redundancy and has no protection. All other mirrored sets still retain full protection. So our exposure is a single, unprotected drive – much like you would expect in a desktop machine.

Array repair in a degraded RAID 10 is the fastest possible repair scenario. Upon replacing a failed drive, all that happens is that the single affected mirror is rebuilt, which is a simple copy operation happening at the RAID 1 level, beneath the RAID 0 stripe. This means that if the overall array is idle the mirroring process can proceed at full speed and the overall array has no idea that it is even happening. A disk to disk mirror is extremely fast, efficient and reliable. This is an ideal recovery scenario. Even if multiple mirrors are degraded and repairing at the same time there is no additional impact, as the rebuilding of one does not affect the others. RAID 10 risk and repair impact both scale extremely well.

RAID 01, on the other hand, immediately loses an entire RAID 0 stripe when it loses a single drive. In a typical RAID 01 mirror there are two RAID 0 stripes. This means that half of the entire array has failed. If we are talking about an eight drive RAID 01 array, the failure of a single drive renders four drives instantly inoperable and effectively failed (the hardware does not need to be replaced, but the data on the drives is out of date and must be rebuilt to be useful.) So from a risk perspective, we can look at it as a failure of the entire stripe.

What is left after a single disk has failed is nothing but a single, unprotected RAID 0 stripe. This is far more dangerous than the equivalent RAID 10 failure because instead of there being only a single, isolated hard drive at risk there are now a minimum of two disks, and potentially many more, at risk, and each additional exposed drive magnifies the risk considerably.

As an example, in the smallest possible RAID 10 or 01 array we have four drives. In RAID 10 if one drive fails, our risk is that its matching partner also fails before we rebuild the array. We are only worried about that one drive, all other drives in the RAID 10 set are still protected and safe. Only this one is of concern. In a RAID 01, when the first drive fails its partner in its RAID 0 set is instantly useless and effectively failed as it is no longer operable in the array. What remains are two drives with no protection running nothing but RAID 0 and so we have the same risk that RAID 10 did, twice. Each drive has the same risk that the one drive did before. This makes our risk, in the best case scenario, much higher.

But for a more dramatic example, let us look at large twenty-four drive RAID 10 and RAID 01 arrays. Again, with RAID 10, if one drive fails all the others, except for its one partner, are still protected. The extra size of the array added almost zero additional risk; we still only fear the failure of that one solitary drive. Contrast that with RAID 01, where the failure of a single drive takes its entire twelve disk RAID 0 array out at once, leaving the other twelve disks running as an unprotected RAID 0. The chance of one of twelve drives failing is, obviously, significantly higher than the chance of one specific drive failing.
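The intuition can be made concrete with a rough sketch. Assuming, purely for illustration, that drives fail independently with probability p during the rebuild window, the comparison after the first failure looks like this:

```python
# Rough, illustrative model of post-failure risk; assumes independent
# drive failures with probability p during the rebuild window. The
# numbers are examples, not field statistics.
def raid10_loss(p: float) -> float:
    # Only the failed drive's partner matters; all other mirrors are intact.
    return p

def raid01_loss(p: float, total_drives: int) -> float:
    # The surviving RAID 0 stripe holds total_drives / 2 unprotected disks;
    # the loss of any one of them loses the array.
    return 1 - (1 - p) ** (total_drives // 2)

p = 0.01
for n in (4, 24):
    print(f"{n} drives: RAID 10 {raid10_loss(p):.4f}, "
          f"RAID 01 {raid01_loss(p, n):.4f}")
# 4 drives: RAID 10 0.0100, RAID 01 0.0199
# 24 drives: RAID 10 0.0100, RAID 01 0.1136
```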

This is not the entire picture. The recovery of the single failed RAID 10 disk is fast: it is a straight copy operation from one drive to the other. It uses minimal resources and takes only as long as is required to read one drive and write another in their entirety. RAID 01 is not as lucky. Unlike RAID 10, which rebuilds only a small subset of the entire array, a subset that does not grow as the array grows (the time to recover a four drive RAID 10 or a forty drive RAID 10 after a failure is identical), RAID 01 must rebuild an entire half of the parent array. In the case of the four drive array this is double the rebuild work of the RAID 10, but in the case of the twenty four drive array it is twelve times the rebuild work. So RAID 01 rebuilds take longer to perform while the array is under significantly more risk during that time.
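The rebuild workload difference is just as easy to see in numbers; the drive size here is an assumed example figure:

```python
# Illustrative rebuild workload comparison; 2 TB is an assumed drive size.
drive_tb = 2

for n in (4, 24):
    raid10_rebuild = drive_tb             # one mirror copy, regardless of array size
    raid01_rebuild = (n // 2) * drive_tb  # an entire half of the parent array
    print(f"{n} drives: RAID 10 rebuilds {raid10_rebuild} TB, "
          f"RAID 01 rebuilds {raid01_rebuild} TB")
```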

There is a rather persistent myth that RAID 01 and RAID 10 have different performance characteristics, but they do not. Both use plain striping and mirroring, which are effectively zero overhead operations requiring almost no processing. Both get full read performance from every disk device attached to them and each loses half of its write performance to the mirroring operation (assuming two way mirrors, which is the only common use of either array type.) There is simply nothing to make RAID 01 or RAID 10 any faster or slower than the other. Both are extremely fast.

Because of the characteristics of the two array types, it is clear that RAID 10 is the only type, of the two, that should ever exist within a single array controller. RAID 01 is unnecessarily dangerous and carries no advantages. They use the same capacity overhead, they have the same performance, they cost the same to implement, but RAID 10 is significantly more reliable.

So why does RAID 01 even exist? Partially it exists out of ignorance or confusion. Many people, implementing their own compound RAID arrays, choose RAID 01 because they have heard the myth that it is faster and, as is generally the case with RAID, never investigate why it would be faster nor look into its reliability and other factors. RAID 01 is truly only implemented on local arrays by mistake.

However, when we take RAID to the network layer, there are new factors to consider and RAID 01 can become important, as can its rare cousin RAID 61. We denote, via Network RAID Notation, where the local and the network layers of the RAID exist. So in this case we mean RAID 0(1) or RAID 6(1). The parentheses denote that the RAID 1 mirror, the “highest” portion of the RAID stack, is over a network connection and not on the local RAID controller.

How would this look in RAID 0(1)? If you have two servers, each with a standard RAID 0 array, and you want them synchronized to act together as a single, reliable array, you could use a technology such as DRBD (on Linux) or HAST (on FreeBSD) to create a network RAID 1 array out of the local storage on each server. Obviously this carries a lot of performance overhead, as the RAID 1 array must be kept in sync over a network connection with far higher latency and lower bandwidth than a local bus. RAID 0(1) is the notation for this setup. If each local RAID 0 array were replaced with a more reliable RAID 6, we would write the whole setup as RAID 6(1).

Why do we accept the risk of RAID 01 when it is over a network and not when it is local? Because of the nature of the network link. In the case of RAID 10, we rely on the low level RAID 1 portion of the RAID stack for protection while the RAID 0 sits on top. If we replicate this at the network level, as RAID 1(0), what we end up with is each host holding a single mirror representing only a portion of the array’s data. If anything were to happen to any node in the array, or if the network connection were to fail, the array would be instantly destroyed and each node would be left with useless, incomplete data. It is the high risk of node failure and of failure at the network connection level that makes RAID decisions in a network setting so different. This becomes a complex subject on its own.

Suffice it to say, when working with normal RAID array controllers or with local storage and software RAID, utilize RAID 10 exclusively and never RAID 01.

Dreaded Array Confusion

Dreaded Array Confusion, or DAC, is a term given to a group of RAID array failure types which are effectively impossible to diagnose and which share one commonality: complete array failure, with total data loss, occurring without any drive failure. It is hypothesized that three key causes account for the majority of DAC:

Software or Firmware Bugs: While dramatic bugs in RAID behavior are rare today, they are always possible, especially with more complicated array types such as parity RAID where reconstructive calculations must be performed on the array. A bug in RAID software or firmware (depending on whether we are talking about software or hardware RAID) could manifest itself in any number of ways, including the accidental destruction of the array. Firmware issues could occur in the drives themselves as well.

Hardware Failure:  Failure in hardware such as processors, memory or controllers can have dramatic effects on a RAID array.  Memory errors especially could easily result in total array loss.  This is thought to be the least common cause of DAC.

Drive Shake: In this scenario individual drives shake loose, disconnect from the backplane and later shake back into place, triggering a resilvering event. If this were to happen with multiple drives during a resilver cycle, or if a URE were encountered during a resilver, we would see total array loss on parity arrays, potentially without any hardware failure occurring at all.

Because of the nature of DAC, and because it is not an issue with RAID itself but with its supporting components, we are in a very difficult position when attempting to identify or quantify the risk. No one knows how likely DAC is to happen and, while we know that DAC is a more significant threat on parity RAID systems, we do not know by how much. Anecdotal evidence suggests the risk on mirrored RAID is immeasurably low while on parity RAID it may rise above the background noise in risk analysis. Of the failure modes, software bugs and drive shake both present much higher risk to systems running parity RAID, because URE risk only impacts parity arrays and the software necessary for parity is far more complex than the software needed for mirroring. Parity RAID simply is more fragile and carries more types of risk, exposing it to DAC in more ways than mirrored RAID.

Because DAC covers a number of possibilities, and because it is effectively impossible to identify after it has occurred, there is little means of collecting any data on it. Since DAC was identified as a risk, many people have come forth, predominantly in the Spiceworks community, to provide anecdotal eyewitness accounts of DAC array failures. The nature of end user IT is that statistics, especially on nebulous concepts like DAC which are not widely known, simply are not and cannot be gathered. DAC arises in shops all over the world where a system administrator returns to the office to find a server with all data gone and no hardware having failed. The data is already lost. Diagnostics will likely not be run and logs will not exist; even if the issue could be identified, to whom would it be reported? And even if reported, how would we quantify how often it happens versus how often it does not, or how often it occurs but goes unreported? Sadly, all I know is that, having identified and somewhat publicized the risk and its symptoms, I found that many people suddenly came forth acknowledging that they had seen DAC first hand and had had no idea what had happened.

If my anecdotal studies are any indicator, it would seem that DAC actually poses a sizable risk to parity arrays, with failures appearing in an appreciable percentage of arrays, although the accuracy and the size of the cross section of that data collection were tiny. It was originally thought that DAC was so rare that you would theoretically be unable to find anyone who had ever observed it, but this does not appear to be the case. I am already aware of many people who have experienced it.

We are forced, by the nature of the industry, to accept DAC as a potential risk, to list it as an unknown “minor” risk in risk evaluations and to be prepared for it, even though we cannot calculate against it. But knowing that it is a possible risk and understanding why it can happen are important in evaluating risk and risk mitigation.

[Anecdotal evidence suggests that DAC is almost always exclusive to hardware RAID implementations of single parity RAID arrays on SCSI controllers.]

RAID Notation Examples

As the new Network RAID Notation Standard (SAM RAID Notation) is a bit complex, I felt that it would be useful to provide a list of common use scenarios and specific implementation examples and how they would be notated, followed by a small sketch illustrating the pattern:

  • Scenario: Netgear ReadyNAS Pro 2 with XRAID mirror.  Notation: R1
  • Scenario: Two Netgear ReadyNAS Ultra units with local RAID 1 sync’d over the network using rsync.  Notation: R1{1}
  • Scenario: Two Drobo B800fs NAS devices each loaded with single parity RAID sync’d using DroboSync. Notation: R5{1}
  • Scenario: Two Drobo B800fs NAS devices each with dual parity RAID sync’d using DroboSync.  Notation: R6{1}
  • Scenario: Two Linux servers with R6 locally using DRBD Mode A or B (asynchronous.)  Notation: R6[1]
  • Scenario: Two Linux servers with R5 locally using DRBD Mode C (synchronous.)  Notation: R5(1)
  • Scenario: Three node VMware vSphere VSA cluster with local R10.  Notation: R10(1)3
  • Scenario: Windows server with two four disk R0 stripes mirrored.  Notation: 8R01
  • Scenario: Two FreeBSD servers with R10 using HAST with memsync.  Notation: R10[1]
  • Scenario: Two FreeBSD servers with R1 using HAST with sync.  Notation: R1(1)
  • Scenario: Two Windows file servers with R10 using Robocopy to synchronize file systems. Notation: R10{1}
  • Scenario: Single Netgear SC101 SAN* using ZSAN drivers on Windows with two disks. Notation: R(1)
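Purely as my own reading of the pattern in these examples, and not part of any published tool, a tiny helper can generate most of these strings. The bracket choice follows the examples above: () for synchronous block replication, [] for asynchronous block replication and {} for file-level synchronization. Note that it does not handle drive-count prefixes such as the 8 in 8R01.

```python
# Hypothetical helper inferred from the examples above; the function,
# its parameters and the bracket mapping are my own reading of the
# notation, not an official implementation.
BRACKETS = {"sync": "()", "async": "[]", "file": "{}"}

def sam_notation(local: str, network: str = "", mode: str = "sync",
                 nodes: int = 2) -> str:
    if not network:
        return f"R{local}"                    # purely local array, e.g. R1
    open_b, close_b = BRACKETS[mode]
    suffix = str(nodes) if nodes > 2 else ""  # node count shown only above two
    return f"R{local}{open_b}{network}{close_b}{suffix}"

print(sam_notation("1"))                        # R1
print(sam_notation("6", "1", "async"))          # R6[1]
print(sam_notation("5", "1", "file"))           # R5{1}
print(sam_notation("10", "1", "sync", nodes=3)) # R10(1)3
print(sam_notation("", "1", "sync"))            # R(1): raw disks mirrored over the network
```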

Technology References:

HAST: http://wiki.freebsd.org/HAST

DRBD: http://www.drbd.org/users-guide/s-replication-protocols.html

DroboSync: http://www.drobo.com/solutions/for-business/drobo-sync.php

Rsync: http://rsync.samba.org/

Robocopy: http://technet.microsoft.com/en-us/library/cc733145%28v=ws.10%29.aspx

Notes:

*The Netgear SC101 SAN is interesting: while it can hold two PATA drives internally and exposes them to the network as block devices, via the ZSAN protocol, through a single Ethernet interface, there is no internal communication between the drives, so all mirroring of the array happens in Windows, which actually sees each disk as an entirely separate SAN device, each with its own IP address.  Windows has no way to know that the two devices are related.  The RAID 1 mirroring is handled one hundred percent in software RAID on Windows and the SAN itself is always two independent PATA drives exposed raw to the network.  A very odd, but enlightening, device.