
Network RAID Notation Standard (SAM RAID Notation)

As the RAID landscape becomes more complex with the emergence of network RAID, there is a clear need for a richer yet concise notation system for RAID levels involving a network component.

Traditional RAID uses single digit notation and the available levels are 0, 1, 2, 3, 4, 5, 6 and 7.  Level 7 is unofficial but widely accepted as triple parity RAID (the natural extension of RAID 5 and RAID 6), while RAID 2 and RAID 3 are effectively disused today.

Nested RAID, one RAID level within another, is handled by putting single digit RAID levels together such as RAID 10, 50, 61, 100, etc.  These can alternatively be written with a plus sign separating the levels like RAID 1+0, 5+0, 6+1, 1+0+0, etc.

There are two major issues with this notation system, beyond the obvious issue that not all RAID types or extensions are accounted for by the single digit system, with many aspects of proprietary RAID systems such as ZRAID, XRAID and BeyondRAID left out entirely.  The first is the lack of network RAID notation and the second is the lack of a specific denotation of intra-RAID configuration.

Network RAID comes in two key types, synchronous and asynchronous.  Synchronous network RAID operates effectively identically to its non-networked counterpart.  Asynchronous network RAID functions the same way but brings extra risk, as data may not be synchronized across devices at the time of a device failure.  So the differences between the two need to be visible in the notation.

Synchronous RAID should be denoted with parentheses.  So two local RAID 10 systems mirrored over the network (a la DRBD) would be denoted RAID 10(1).  The effective RAID level for risk and capacity calculations would be the same as any RAID 101, but this informs all parties at a glance that the mirror is over a network.

Asynchronous RAID should be denoted with brackets.  So two local RAID 10 systems mirrored over the network asynchronously would be denoted as RAID 10[1] making it clear that there is a risky delay in the system.

There is an additional need to denote a different type of replication at a higher, filesystem level (a la rsync) that, while not truly related to RAID, provides a similar function for cold data and is often used in RAID discussions, and I believe that storage engineers need the ability to quickly denote this as well.  This asynchronous, filesystem-level replication can be denoted by braces.  Only one notation is needed as filesystem-level replication is always asynchronous.  So as an example, two RAID 6 arrays synced automatically with a block-differential filesystem replication system would be denoted as RAID 6{1}.

To further simplify RAID notation, to avoid writing the word “RAID” repeatedly and to step away from the traditional distractions of what the acronym stands for so that we can focus on the relevant replication aspects, a simple “R” prefix should be used.  So RAID 10 would simply be R10, and a purely networked mirror might be R(1).
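To show how these pieces compose, here is a minimal Python sketch of the notation rules above (the function name and the "sync"/"async"/"fs" mode keywords are my own, purely for illustration):

# Minimal sketch of the SAM RAID notation rules described above.
# Function and keyword names are hypothetical, not part of the standard.
def sam_raid(local_level, replicated_level=None, mode=None):
    # mode: "sync" -> parentheses, "async" -> brackets, "fs" -> braces
    notation = "R" + str(local_level)
    if replicated_level is not None:
        wrappers = {"sync": "({})", "async": "[{}]", "fs": "{{{}}}"}
        notation += wrappers[mode].format(replicated_level)
    return notation

print(sam_raid(10, 1, "sync"))   # R10(1): two RAID 10 arrays mirrored synchronously over a network
print(sam_raid(10, 1, "async"))  # R10[1]: the same mirror, but asynchronous
print(sam_raid(6, 1, "fs"))      # R6{1}: filesystem-level replication between two RAID 6 arrays

The parentheses, brackets and braces carry the risk information at a glance while the leading R keeps the notation compact.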

This leaves one major aspect of RAID notation to address and that is the size of each component of the array.  Often this is implied but some RAID levels, especially those that are nested, can have complexities missed by traditional notation.  Knowing the total number of drives in an array does not always denote the setup of a specific array.  For example a 24 drive R10 is assumed to be twelve pairs of mirrors in a R0 stripe.  But it could be eight sets of triple mirrors in a R0 stripe.  Or it could even be six quad mirrors.  Or four sext mirrors.  Or three oct mirrors.  Or two dodeca mirrors.  While most of these are extremely unlikely, there is a need to notate them.  For the set size we use a superscript number to denote the size of that set.  Generally this is only needed for one aspect of the array, not all, as the others can be derived, but when in doubt it can be denoted explicitly.

So an R10 array using three-way mirror sets would be R1³0.  Lacking the ability to write a superscript you could also write it as R1^3+0.  This notation does not state the complete array size, only its configuration type.  If all possible superscripts are included a full array size can be calculated using nothing more.  If we have an R10 of four sets of three-way mirrors we could write it R1³0⁴ which would inform us that the entire array consists of twelve drives – or in the alternate notation R1^3+0^4.
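Since the set sizes at each level multiply out to the total drive count, the arithmetic is trivial; as a quick sketch (assuming the set sizes have already been read off the notation):

# Sketch: the total drive count of a nested array is the product of the
# set sizes at each level.  For R1^3+0^4 the set sizes are 3 and 4.
import math

def total_drives(*set_sizes):
    return math.prod(set_sizes)

print(total_drives(3, 4))  # 12 drives in an R1^3+0^4 array
print(total_drives(2, 6))  # 12 drives in a standard R1^2+0^6 (i.e. 12R10)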

Superscript notation of sets is only necessary when non-obvious.  R10 with no other notation implies that the R1 component is mirror pairs, for example.  R55 nearly always requires additional notation except when the array consists of only nine members.

One additional aspect to consider is notating array size.  This is far simpler than the superscript notation and is nearly always completely adequate.  This alleviates the need to write in long form “A four drive RAID 10 array.”  Instead we can use a prefix for this.  4R10 would denote a four drive RAID 10 array.

So to look at our example from above, the twelve disk RAID 10 with the three-way mirror sets could be written out as 12R1³0⁴.  But the use of all three numbers is redundant.  Any one of the numbers can be dropped.  Typically this would be the final one as it is the least likely to be useful.  The R1 set size is useful in determining the basic risk and the leading 12 is used for capacity and performance calculations as well as chassis sizing and purchasing.  The trailing four is implied by the other two numbers and effectively useless on its own.  So the best way to write this would be simply 12R1³0.  If that same array were to use the common mirror pair approach rather than the three-way mirror we would simply write 12R10 to denote a twelve disk, standard RAID 10 array.
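Deriving the dropped number is simple division, as this small sketch (my own helper, not part of the notation) shows:

# Sketch: with the leading array size and the mirror set size known,
# the dropped stripe width can always be derived.
def stripe_width(total_drives, mirror_size):
    assert total_drives % mirror_size == 0, "drive count must divide evenly into mirror sets"
    return total_drives // mirror_size

print(stripe_width(12, 3))  # 4: 12R1^3+0 stripes across four three-way mirrors
print(stripe_width(12, 2))  # 6: 12R10 stripes across six standard mirror pairs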

One Big RAID 10 – A New Standard in Server Storage

In the late 1990s the standard rule of thumb for building a new server was to put the operating system onto its own, small, RAID 1 array and separate out applications and data into a separate RAID 5 array.  This was done for several reasons, many of which have swirled away from us, lost in the sands of time.  The main driving factors were that storage capacity was extremely expensive, disks were small, filesystems corrupted regularly and physical hard drives failed at a very high rate compared to other types of failures.  People were driven by a need to protect against physical hard drive failures, protect against filesystem corruption and acquire enough capacity to meet their needs.

Today the storage landscape has changed.  Filesystems are incredibly robust and corruption from the filesystem itself is almost unheard of and, thanks to technologies like journalling, can almost always be corrected quickly and effectively, protecting end users from data loss.  Almost no one worries about filesystem corruption today.

Modern filesystems are also able to handle far more capacity than they could previously.  It was not uncommon in the late 1990s and early 2000s to be able to easily build a drive array larger than any single filesystem could handle.  Today that is rarely the case, as all common filesystems handle many terabytes at least and often petabytes, exabytes or more of data.

Hard drives are much more reliable than they were in the late 1990s.  Failure rates for an entire drive failing are very low, even in less expensive drives.  So low, in fact, that concern over array failure (data loss in the entire RAID array) now centers primarily on factors other than the failure of individual hard drives.  We no longer replace hard drives with wild abandon.  It is not unheard of for large arrays to run their entire lifespans without losing a single drive.

Capacities have scaled dramatically.  Instead of 4.3GB hard drives we are installing 3TB drives.  That is roughly seven hundred times more capacity on a single spindle compared to less than fifteen years ago.

These factors come together to create a need for a dramatically different approach to server storage design and a change to the “rule of thumb” about where to start when designing storage.

The old approach can be written RAID 1 + RAID 5.  The RAID 1 space was used for the operating system while the RAID 5 space, presumably much larger, was used for data and applications.  This design split the two storage concerns, putting maximum effort into protecting the operating system (which was very hard to recover in case of disaster and on which the data relied for accessibility) by placing it on highly reliable RAID 1.  Lower cost RAID 5, while somewhat riskier, was typically chosen for data because the cost of storing data on RAID 1 was too high in most cases.  It was a tradeoff that made sense at the time.

Today, with our very different concerns, a new approach is needed, and this new approach is known as “One Big RAID 10” – meaning a single, large RAID 10 array with operating system, applications and data all stored together.  Of course, this is just shorthand; in a system without performance or capacity needs beyond a single disk we would say “One Big RAID 1”, but many people include RAID 1 in the RAID 10 group so it is just easier to say the former.

To be even handier, we abbreviate this to OBR10.

Because the cost of storage has dropped considerably and is typically in abundance today rather than at a premium, because filesystems are incredibly reliable, because RAID 1 and RAID 10 share performance characteristics, and because array failures not triggered by disk failure have moved from background noise to primary causes of data loss, the move to RAID 10 and the elimination of array splitting has become the new standard approach.

With RAID 10 we now have the highly available and resilient storage previously reserved for the operating system available to all of our data.  We get the benefit of mirrored RAID performance plus the benefit of extra spindles for all of our data.  We get better drive capacity utilization and better performance based on that improved utilization.

Even the traditional splitting of log files normally done with databases (the infamous RAID 1 + RAID 5 + RAID 1 approach) is no longer needed because RAID 10 keeps the optimum performance characteristics across all data.  With RAID 10 we eliminate almost all of the factors that once caused us to split arrays.

The only significant factor not yet mentioned for which split arrays were traditionally seen as beneficial is access contention – different processes needing access to different parts of the disk at the same time, causing the drive heads to move around in a less than ideal pattern and reducing drive performance.  Contention was a big deal in the late 1990s when the old rule of thumb was developed.

Today, drive contention still exists but has been heavily mitigated by the use of large RAID caches.  In the late 90s drive caches were a few megabytes at best and often non-existent.  Today 256MB is a tiny cache and average servers are deployed with 1-2GB of cache on the RAID card alone.  Some systems are beginning to integrate additional solid state drive based caches to add a secondary cache beyond the memory cache on the controller.  These can easily add hundreds of gigabytes of extremely high speed cache that can buffer nearly any spindle operation from contention.  So the issue of contention has been addressed in other ways over the years and, like other technology changes, this has effectively freed us from the traditional concerns that required us to split arrays.

Like drive contention, another, far less common reason for splitting arrays in the late 1990s was to improve communications bus performance because of the limitations of the era’s SCSI and ATA technologies.  These, too, have been eliminated with the move to serial communications mechanisms, SAS and SATA, in modern arrays.  We are no longer limited to the capacity of a single bus for each array and can grow much larger with much more flexibility than previously.  Bus contention has been all but eliminated.

If there is a need to split off space for protection, such as log file growth, this can be achieved through partitioning rather than through physical array splitting.  In general you will want to minimize partitioning as it increases overhead and lowers the ability of the drives to tune themselves but there are cases where it is the better approach.  But it does not require that the underlying physical storage be split as it traditionally was.  Even better than partitioning, when available, is logical volume management which makes partition-like separations without the limitations of partitions.

So at the end of the day, the new rule of thumb for server storage is “One Big RAID 10.”  No more RAID 5, no more array splitting.  It’s about reliability, performance, ease of management and moderate cost effectiveness.  Like all rules of thumb, this does not apply to every single instance, but it does apply much more broadly than the old standard ever did.  RAID 1 + RAID 5, as a standard, was always an attempt to “make do” with something undesirable and to make the best of a bad situation.  OBR10 is not like that.  The new standard is a desired standard – it is how we actually want to run, not something with which we have been “stuck”.

When designing storage for a new server, start with OBR10 and only move away from it when it specifically does not meet your technology needs.  You should never have to justify using OBR10, only justify not using it.

 

Choosing RAID for Hard Drives in 2013

After many, many articles, discussions, threads, presentations, questions and posts on choosing RAID, I have finally decided to publish my 2012-2013 high level guide to choosing RAID.  The purpose of this article is not to broadly explain or defend RAID choices but to present a concise guide to making an educated, studied decision for RAID that makes sense for a given purpose.

Today, four key RAID types exist for the majority of purposes: RAID 0, RAID 1, RAID 6 and RAID 10.  Each has a place where it makes the most sense.  RAID 1 and RAID 10, one simply being an application of the other, can handily be considered as a single RAID type with the only significant difference being the size of the array.  Many vendors refer to RAID 1 incorrectly as RAID 10 today because of this and, while this is clearly a semantic mistake, we will call them RAID 1/10 here to make decision making less complicated.  Together they can be considered the “mirrored RAID” family and the differentiation between them is based solely on the number of pairs in the array.  One pair is RAID 1, more than one pair is RAID 10.

RAID 0: RAID without redundancy.  RAID 0 is very fast and very fragile.  It has practically no overhead and requires the fewest hard disks to accomplish capacity and performance goals.  RAID 0 is perfect for situations where data is volatile (such as temporary caches) or where data is read only, there are solid backups and accessibility is not a key concern.  RAID 0 should never be used for live or critical data.

RAID 6: RAID 6 is the market standard today for parity RAID, the successor to RAID 5.  As such, RAID 6 is cost effective in larger arrays (five drives minimum, normally six or more drives) where performance and reliability are secondary concerns to cost.  RAID 6 is focused on cost effective capacity for near-line data.

RAID 1/10: Mirrored RAID provides the best speed and reliability, making it ideally suited for online data – any data where speed and reliability are the top concern.  It is the only reasonable choice for arrays of four or fewer drives where the data is non-volatile.  With rare exception, mirrored RAID should be the de facto choice for any RAID array where specific technical needs do not clearly mandate a RAID 0 or RAID 6 solution.

It is a rare circumstance where RAID 0 is required, very rare.  RAID 6 has a place in many organizations but almost never on its own.  Almost every organization should be relying on RAID 1 or 10 for its primary storage and potentially using other RAID types for special cases, such as backups, archives and caches.  It is a very, very rare business that would not have RAID 10 as the primary storage for the bulk of its systems.
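To make the capacity trade-off behind these recommendations concrete, here is a rough usable-capacity sketch for the three families above, assuming equal-sized drives and mirrored pairs for RAID 10 (the helper is illustrative only):

# Rough usable-capacity comparison, assuming n equal drives of drive_tb TB each.
def usable_tb(level, n, drive_tb):
    if level == "RAID 0":
        return n * drive_tb          # no redundancy at all
    if level in ("RAID 1", "RAID 10"):
        return (n // 2) * drive_tb   # mirrored pairs lose half the raw capacity
    if level == "RAID 6":
        return (n - 2) * drive_tb    # two drives' worth of space goes to parity
    raise ValueError("unhandled RAID level")

for level in ("RAID 0", "RAID 10", "RAID 6"):
    print(level, usable_tb(level, 8, 3), "TB usable from eight 3TB drives")  # 24, 12, 18

RAID 6 gives back more raw capacity than RAID 10 on the same drives, which is exactly why it remains attractive for near-line, capacity-focused storage while mirrored RAID wins wherever speed and reliability dominate.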

Choosing a RAID Level by Drive Count

In addition to all other factors, the number of drives available to you plays a significant role in choosing what RAID level is appropriate for you.  Ideally RAID is chosen ahead of time in conjunction with chassis and drives in a holistic approach so that the entire system is engineered for the desired purpose, but even in these cases, knowing how drive counts can affect useful RAID choices can be very helpful.

To simplify the list, RAID 0 will be left off of it.  RAID 0 is a viable choice for certain niche business scenarios in any count of drives.  So there is no need to display it on the list.  Also, the list assumes that a hot spare, if it exists, is not included in the count as that is “outside” of the RAID array and so would not be a part of the array drive count.

2 Drives: RAID 1

3 Drives: RAID 1 *

4 Drives: RAID 10

5 Drives: RAID 6

6 Drives: RAID 6 or RAID 10

7 Drives: RAID 6 or RAID 7

8 Drives: RAID 6 or RAID 7 or RAID 10 **

9 Drives: RAID 6 or RAID 7

10 Drives: RAID 6 or RAID 7 or RAID 10 or RAID 60/61

11 Drives: RAID 6 or RAID 7 

12 Drives: RAID 6 or RAID 7 or RAID 10 or RAID 60/61

13 Drives: RAID 6 or RAID 7

14 Drives: RAID 6 or RAID 7 or RAID 10 or RAID 60/61 or RAID 70/71

15 Drives: RAID 6 or RAID 7 or RAID 60

16 Drives: RAID 6 or RAID 7 or RAID 10 or RAID 60/61 or RAID 70/71

17 Drives: RAID 6 or RAID 7

18 Drives: RAID 6 or RAID 7 or RAID 10 or RAID 60/61 or RAID 70/71

19 Drives: RAID 6 or RAID 7

20 Drives: RAID 6 or RAID 7 or RAID 10 or RAID 60/61 or RAID 70/71

21 Drives: RAID 6 or RAID 7 or RAID 60 or RAID 70

22 Drives: RAID 6 or RAID 7 or RAID 10 or RAID 60/61 or RAID 70/71

23 Drives: RAID 6 or RAID 7

24 Drives: RAID 6 or RAID 7 or RAID 10 or RAID 60/61 or RAID 70/71

25 Drives: RAID 6 or RAID 7 or RAID 60

………

* RAID 1 is technically viable at any drive count of two or more.  I have included it only up to three drives because using it beyond that point is generally considered absurd and is completely unheard of in the real world.  But technically it would continue to provide equal write performance while continuing to increase in read performance and reliability as more drives are added to the mirror.  But for reasons of practicality I have included it only twice on the list where it would actually be useful.

** At six drives and higher, both RAID 6 and RAID 10 are viable options for arrays with even drive counts, while RAID 6 alone is a viable option for odd drive counts.

For this list I have only considered the standard RAID levels of 0, 1, 4, 5, 6 and 10.  I left 0 off of the list because it is always viable for certain use cases.  RAID 5 never appears because there is no case today in which it should be used on spindle hard drives, and as RAID 5 is an enhancement of RAID 4, RAID 4 does not appear on the list either.  Non-standard double parity RAID solutions such as NetApp’s RAID-DP and Oracle’s RAIDZ2 can be treated as derivations of RAID 6 and apply accordingly.  Oracle’s triple parity RAIDZ3 (sometimes called RAID 7) would apply at seven drives and higher but is a non-standard level and extremely rare, so I included it in italics.

More commonly, RAID 6 makes sense at six drives or more and RAID 7 at eight drives or more.
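The basic pattern of the list can be captured in a few lines; this is only a sketch of the rules of thumb as listed above, with the nested levels (60/61, 70/71) left out for brevity:

# Sketch of the drive-count guidance above; nested levels omitted for brevity.
def viable_raid_levels(drives):
    levels = []
    if drives in (2, 3):
        levels.append("RAID 1")
    if drives >= 5:
        levels.append("RAID 6")
    if drives >= 7:
        levels.append("RAID 7")
    if drives >= 4 and drives % 2 == 0:
        levels.append("RAID 10")   # mirrored pairs need an even drive count
    return levels

print(viable_raid_levels(4))   # ['RAID 10']
print(viable_raid_levels(9))   # ['RAID 6', 'RAID 7']
print(viable_raid_levels(12))  # ['RAID 6', 'RAID 7', 'RAID 10']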

Like RAID 4 and 5, RAID levels based on them (RAID 40, 50, 41, 51, 55, etc.) are not appropriate any longer due to the failure and fragility modes of spindle-based hard drives.  Complex RAID levels based on RAID 6 and 7 (60, 61, 70, 71, etc.) have a place but are exceedingly rare as they generally have very little cost savings compared to RAID 10 but suffer from performance issues and increased risk.  RAID 61 and 71 are almost exclusively effective when the highest order RAID, the mirror component, is over a network rather than local on the system.