RAID Notation Examples

As the new Network RAID Notation Standard (SAM RAID Notation) is a bit complex, I felt it would be useful to provide a list of common usage scenarios and specific implementation examples, along with how each would be notated.

  • Scenario: Netgear ReadyNAS Pro 2 with XRAID mirror.  Notation: R1
  • Scenario: Two Netgear ReadyNAS Ultra units with local RAID 1 sync’d over the network using rsync.  Notation: R1{1}
  • Scenario: Two Drobo B800fs NAS devices each loaded with single parity RAID sync’d using DroboSync. Notation: R5{1}
  • Scenario: Two Drobo B800fs NAS devices each with dual parity RAID sync’d using DroboSync.  Notation: R6{1}
  • Scenario: Two Linux servers with R6 locally using DRBD Mode A or B (asynchronous).  Notation: R6[1]
  • Scenario: Two Linux servers with R6 locally using DRBD Mode C (synchronous).  Notation: R6(1)
  • Scenario: Three node VMware vSphere VSA cluster with local R10.  Notation: R10(1)³
  • Scenario: Windows server with two four-disk R0 stripes mirrored.  Notation: 8R01
  • Scenario: Two FreeBSD servers with R10 using HAST with memsync.  Notation: R10[1]
  • Scenario: Two FreeBSD servers with R1 using HAST with sync.  Notation: R1(1)
  • Scenario: Two Windows file servers with R10 using Robocopy to synchronize file systems. Notation: R10{1}
  • Scenario: Single Netgear SC101 SAN* using ZSAN drivers on Windows with two disks. Notation: R(1)

Technology References:

HAST: http://wiki.freebsd.org/HAST

DRBD: http://www.drbd.org/users-guide/s-replication-protocols.html

DroboSync: http://www.drobo.com/solutions/for-business/drobo-sync.php

Rsync: http://rsync.samba.org/

Robocopy: http://technet.microsoft.com/en-us/library/cc733145%28v=ws.10%29.aspx

Notes:

*The Netgear SC101 SAN is interesting: while it can hold two PATA drives internally and exposes them to the network as block devices, via the ZSAN protocol, through a single Ethernet interface, there is no internal communication between the drives, so all mirroring of the array happens in Windows, which actually sees each disk as an entirely separate SAN device, each with its own IP address.  Windows has no way to know that the two devices are related.  The RAID 1 mirroring is handled one hundred percent in software RAID on Windows and the SAN itself is always two independent PATA drives exposed raw to the network.  A very odd, but enlightening, device.

Network RAID Notation Standard (SAM RAID Notation)

As the RAID landscape becomes more complex with the emergence of network RAID, there is an important need for a more complete and concise notation system for RAID levels involving a network component.

Traditional RAID comes in single digit notation and the available levels are 0, 1, 2, 3, 4, 5, 6 and 7.  Level 7 is unofficial but widely accepted as triple parity RAID (the natural extension of RAID 5 and RAID 6), while RAID 2 and RAID 3 are effectively disused today.

Nested RAID, one RAID level within another, is handled by putting single digit RAID levels together such as RAID 10, 50, 61, 100, etc.  These can alternatively be written with a plus sign separating the levels like RAID 1+0, 5+0, 6+1, 1+0+0, etc.

There are two major issues with this notation system, beyond the obvious problem that not all RAID types or extensions are covered by the single digit system (many aspects of proprietary RAID systems such as ZRAID, XRAID and BeyondRAID have no representation at all).  The first is a lack of network RAID notation and the second is a lack of specific denotation of intra-RAID configuration.

Network RAID comes in two key types: synchronous and asynchronous.  Synchronous network RAID operates effectively identically to its non-networked counterpart.  Asynchronous network RAID functions the same but brings extra risk, as data may not be synchronized across devices at the time of a device failure.  So the difference between the two needs to be visible in the notation.

Synchronous RAID should be denoted with parentheses.  So two local RAID 10 systems mirrored over the network (a la DRBD) would be denoted RAID 10(1).  The effective RAID level for risk and capacity calculations would be the same as any RAID 101, but this informs all parties at a glance that the mirror is over a network.

Asynchronous RAID should be denoted with brackets.  So two local RAID 10 systems mirrored over the network asynchronously would be denoted as RAID 10[1] making it clear that there is a risky delay in the system.

There is an additional need to denote a different type of replication that happens at a higher, filesystem level (a la rsync) and which, while not truly related to RAID, provides a similar function for cold data and is often used in RAID discussions; I believe that storage engineers need the ability to quickly denote this as well.  This asynchronous file-system level replication can be denoted by braces.  Only one notation is needed as file-system level replication is always asynchronous.  So, as an example, two RAID 6 arrays synced automatically with a block-differential file system replication system would be denoted as RAID 6{1}.

To further simplify RAID notation and to shorten the obvious need to write the word “RAID” repeatedly as well as to remove ourselves from the traditional distractions of what the acronym stands for so that we can focus on the relevant replication aspects of it, a simple “R” prefix should be used.  So RAID 10 would simply be R10.  Or a purely networked mirror might be R(1).
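
To make the delimiter rules above concrete, here is a minimal Python sketch of how a notation string could be interpreted.  The function and the descriptions it returns are purely illustrative and are not part of the standard itself.

    import re

    # Map each opening delimiter to the replication type described above.
    REPLICATION_TYPES = {
        "(": "synchronous network RAID (a la DRBD protocol C)",
        "[": "asynchronous network RAID (a la DRBD protocol A or B)",
        "{": "asynchronous filesystem-level replication (a la rsync)",
    }

    def classify(notation):
        """Describe the network component of a SAM RAID Notation string."""
        match = re.search(r"([(\[{])(\d+)[)\]}]", notation)
        if not match:
            return "local array only (no network component)"
        delimiter, level = match.groups()
        return "network RAID " + level + ": " + REPLICATION_TYPES[delimiter]

    for example in ("R10(1)", "R10[1]", "R6{1}", "R10"):
        print(example, "->", classify(example))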

This leaves one major aspect of RAID notation to address and that is the size of each component of the array.  Often this is implied but some RAID levels, especially those that are nested, can have complexities missed by traditional notation.  Knowing the total number of drives in an array does not always tell you the setup of that specific array.  For example a 24 drive R10 is assumed to be twelve pairs of mirrors in a R0 stripe.  But it could be eight sets of three-way mirrors in a R0 stripe.  Or it could even be six four-way mirrors.  Or four six-way mirrors.  Or three eight-way mirrors.  Or two twelve-way mirrors.  While most of these are extremely unlikely, there is a need to notate them.  For the set size we use a superscript number to denote the size of that set.  Generally this is only needed for one aspect of the array, not all, as the others can be derived, but when in doubt it can be denoted explicitly.

So an R10 array using three-way mirror sets would be R1³0.  Lacking the ability to write a superscript you could also write it as R1^3+0.  This notation does not state the complete array size, only its configuration type.  If all possible superscripts are included a full array size can be calculated using nothing more.  If we have an R10 of four sets of three-way mirrors we could write it R1³0⁴ which would inform us that the entire array consists of twelve drives – or in the alternate notation R1^3+0^4.

Superscript notation of sets is only necessary when non-obvious.  R10 with no other notation implies that the R1 component is mirror pairs, for example.  R55 nearly always requires additional notation except when the array consists of only nine members.

One additional aspect to consider is notating array size.  This is far simpler than the superscript notation and is nearly always completely adequate.  It alleviates the need to write in long form “a four drive RAID 10 array.”  Instead we can use a prefix for this: 4R10 would denote a four drive RAID 10 array.

So to look at our example from above, the twelve disk RAID 10 with the three-way mirror sets could be written out as 12R1³0⁴.  But the use of all three numbers becomes redundant; any one of them can be dropped.  Typically this would be the final one as it is the least likely to be useful.  The R1 set size is useful in determining the basic risk and the leading 12 is used for capacity and performance calculations as well as chassis sizing and purchasing.  The trailing four is implied by the other two numbers and effectively useless on its own.  So the best way to write this would be simply 12R1³0.  If that same array were to use the common mirror pair approach rather than the three-way mirror we would simply write 12R10 to denote a twelve disk, standard RAID 10 array.
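
As a quick sanity check of that arithmetic, a small Python sketch (again, the helper names are purely illustrative and not part of the standard) shows how any one of the three numbers can be derived from the other two:

    # Illustrative arithmetic for striped mirrors (R1^n+0 style arrays).

    def set_count(total_drives, mirror_width):
        """12R1^3+0 -> the implied number of mirror sets (4)."""
        if total_drives % mirror_width:
            raise ValueError("total drives must divide evenly into mirror sets")
        return total_drives // mirror_width

    def total_drives(mirror_width, sets):
        """R1^3+0^4 -> the implied total drive count (12)."""
        return mirror_width * sets

    print(set_count(12, 3))     # 12R1^3+0 implies four three-way mirror sets
    print(total_drives(3, 4))   # R1^3+0^4 implies twelve drives in total
    print(set_count(12, 2))     # 12R10 implies six standard mirror pairs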

One Big RAID 10 – A New Standard in Server Storage

In the late 1990s the standard rule of thumb for building a new server was to put the operating system onto its own, small, RAID 1 array and separate out applications and data into a separate RAID 5 array.  This was done for several reasons, many of which have swirled away from us, lost in the sands of time.  The main driving factors were that storage capacity was extremely expensive, disks were small, filesystems corrupted regularly and physical hard drives failed at a very high rate compared to other types of failures.  People were driven by a need to protect against physical hard drive failures, protect against filesystem corruption and acquire enough capacity to meet their needs.

Today the storage landscape has changed.  Filesystems are incredibly robust and corruption from the filesystem itself is almost unheard of and, thanks to technologies like journalling, can almost always be corrected quickly and effectively, protecting the end users from data loss.  Almost no one worries about filesystem corruption today.

Modern filesystems are also able to handle far more capacity than they could previously.  It was not uncommon in the late 1990s and early 2000s to be able to easily build a drive array larger than any single filesystem could handle.  Today that is not reasonably the case as all common filesystems handle many terabytes at least, and often petabytes, exabytes or more of data.

Hard drives are much more reliable than they were in the late 1990s.  Failure rates for an entire drive failing are very low, even in less expensive drives.  So low, in fact, that concern about array failure (data loss in the entire RAID array) is now driven primarily by factors other than the failure of individual hard drives.  We no longer replace hard drives with wild abandon.  It is not unheard of for large arrays to run their entire lifespans without losing a single drive.

Capacities have scaled dramatically.  Instead of 4.3GB hard drives we are installing 3TB drives.  Nearly one thousand times more capacity on a single spindle compared to less than fifteen years ago.

These factors come together to create a need for a dramatically different approach to server storage design and a change to the “rule of thumb” about where to start when designing storage.

The old approach can be written RAID 1 + RAID 5.  The RAID 1 space was used for the operating system while the RAID 5 space, presumably much larger, was used for data and applications.  This design split the two storage concerns, putting maximum effort into protecting the operating system (which was very hard to recover in case of disaster and on which the data relied for accessibility) by placing it onto highly reliable RAID 1.  Lower cost RAID 5, while somewhat riskier, was typically chosen for data because the cost of storing data on RAID 1 was too high in most cases.  It was a tradeoff that made sense at the time.
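
The capacity arithmetic behind that old tradeoff is easy to see with a rough sketch; the 9.1GB drive size below is just a representative late-1990s figure of my own choosing, not taken from any particular build.

    # Rough capacity arithmetic behind the old R1 + R5 rule of thumb,
    # using six hypothetical 9.1GB drives of the era.

    DRIVE_GB = 9.1

    def usable_raid1():
        return DRIVE_GB                     # a mirror pair stores one drive's worth

    def usable_raid5(drives):
        return (drives - 1) * DRIVE_GB      # single parity costs one drive

    def usable_raid10(drives):
        return drives * DRIVE_GB / 2        # mirror pairs: half of the raw space

    split = usable_raid1() + usable_raid5(4)
    print("R1 (2 drives) + R5 (4 drives):", round(split, 1), "GB usable")      # 36.4
    print("OBR10 (6 drives):             ", round(usable_raid10(6), 1), "GB")  # 27.3

When drives were that small and that expensive, giving up roughly a quarter of the usable space was a hard sell; with modern multi-terabyte drives it rarely is.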

Today, with our very different concerns, a new approach is needed, and this new approach is known as “One Big RAID 10” – meaning a single, large RAID 10 array with operating system, applications and data all stored together.  Of course, this is just what we say to keep it handy; in a system without performance or capacity needs beyond a single mirrored pair we would say “One Big RAID 1”, but many people include RAID 1 in the RAID 10 group so it is just easier to say the former.

To be even handier, we abbreviate this to OBR10.

Because the cost of storage has dropped considerably and, instead of being at a premium, is typically in abundance today; because filesystems are incredibly reliable; because RAID 1 and RAID 10 share performance characteristics; and because array failures not triggered by disk failure have moved from background noise to a primary cause of data loss, the move to RAID 10 and the elimination of array splitting have become the new standard approach.

With RAID 10 we now have the highly available and resilient storage previously reserved for the operating system available to all of our data.  We get the benefit of mirrored RAID performance plus the benefit of extra spindles for all of our data.  We get better drive capacity utilization and performance based on that improved utilization.

Even the traditional splitting of log files normally done with databases (the infamous RAID 1 + RAID 5 + RAID 1 approach) is no longer needed because RAID 10 keeps the optimum performance characteristics across all data.  With RAID 10 we eliminate almost all of the factors that once caused us to split arrays.
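
A back-of-the-envelope sketch of the random write arithmetic shows why the split no longer pays off; the per-spindle figure and the six-drive layout below are assumptions for illustration only, using the standard RAID write penalties.

    # Illustrative random-write IOPS arithmetic using the standard RAID write
    # penalties (2 for RAID 1/10, 4 for RAID 5) and an assumed 150 IOPS per spindle.

    SPINDLE_IOPS = 150

    def raid5_write_iops(drives):
        return drives * SPINDLE_IOPS / 4    # write penalty of 4

    def raid10_write_iops(drives):
        return drives * SPINDLE_IOPS / 2    # write penalty of 2

    # Classic split on six drives: data workloads only ever see the 4-drive R5.
    print("4-drive R5 data array:", raid5_write_iops(4), "write IOPS")   # 150.0
    # OBR10 on the same six drives: every workload sees all six spindles.
    print("6-drive OBR10:        ", raid10_write_iops(6), "write IOPS")  # 450.0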

The only significant factor not yet mentioned for which split arrays were traditionally seen as beneficial is access contention – different processes needing access to different parts of the disk at the same time, causing the drive head to move around in a less than ideal pattern and reducing drive performance.  Contention was a big deal in the late 1990s when the old rule of thumb was developed.

Today, drive contention still exists but has been heavily mitigated by the use of large RAID caches.  In the late 90s drive caches were a few megabytes at best and often non-existent.  Today 256MB is a tiny cache and average servers are deployed with 1-2GB of cache on the RAID card alone.  Some systems are beginning to integrate additional solid state drive based caches to add a secondary cache beyond the memory cache on the controller.  These can easily add hundreds of gigabytes of extremely high speed cache that can buffer nearly any spindle operation from needing to worry about contention.  So the issue of contention has been addressed in other ways over the years and this, like other technology changes, has effectively freed us from the traditional concerns that required us to split arrays.

Like drive contention, another, far less common reason for splitting arrays in the late 1990s was to improve communications bus performance because of the limitations of the era’s SCSI and ATA technologies.  These limits, too, have been eliminated with the move to serial communications mechanisms, SAS and SATA, in modern arrays.  We are no longer limited to the capacity of a single bus for each array and can grow much larger with much more flexibility than previously.  Bus contention has been all but eliminated.

If there is a need to split off space for protection, such as against log file growth, this can be achieved through partitioning rather than through physical array splitting.  In general you will want to minimize partitioning as it increases overhead and lowers the ability of the drives to tune themselves, but there are cases where it is the better approach.  Either way, it does not require that the underlying physical storage be split as it traditionally was.  Even better than partitioning, when available, is logical volume management, which provides partition-like separations without the limitations of partitions.

So at the end of the day, the new rule of thumb for server storage is “One Big RAID 10.”  No more RAID 5, no more array splitting.  It’s about reliability, performance, ease of management and moderate cost effectiveness.  Like all rules of thumb, this does not apply to every single instance, but it does apply much more broadly than the old standard ever did.  RAID 1 + RAID 5, as a standard, was always an attempt to “make do” with something undesirable and to make the best of a bad situation.  OBR10 is not like that.  The new standard is a desired standard – it is how we actually want to run, not something with which we have been “stuck”.

When designing storage for a new server, start with OBR10 and only move away from it when it specifically does not meet your technology needs.  You should never have to justify using OBR10, only justify not using it.

 

Virtualization as a Standard Pattern

Virtualization as an enterprise concept is almost as old as business computing itself.  The value of abstracting computing from the bare hardware was recognized very early on, and almost as soon as computers had the power to manage the abstraction process, work began on implementing virtualization much as we know it today.

The earliest commonly accepted work on virtualization began in 1964 with the IBM CP-40 operating system, developed for the IBM System/360 mainframe.  This was the first real foray into commercial virtualization, and the code and design from this early virtualization platform have descended into the IBM VM platform that has been used continuously since 1972 as a virtualization layer for the IBM mainframe families over the decades.  Since IBM first introduced virtualization we have seen enterprise systems adopt this pattern of hardware abstraction almost universally.  Many large scale computing systems, minicomputers and mainframes, moved to virtualization during the 1970s with the bulk of all remaining enterprise systems doing so, as the power and technology were available to them, during the 1980s and 1990s.

The only notable holdout to virtualization for enterprise computing was the Intel IA32 (aka x86) platform, which lacked the advanced hardware resources necessary to implement effective virtualization until the advent of the extended AMD64 64-bit platform, and even then only with specific new technology (hardware virtualization extensions such as AMD-V and Intel VT-x).  Once this was introduced, the same high performance, highly secure virtualization was available across the board on all major platforms for business computing.

Because low cost x86 platforms lacked meaningful virtualization (outside of generally low performance software virtualization and niche high performance paravirtualization platforms) until the mid-2000s, virtualization was almost completely off the table for the vast majority of small and medium businesses.  This has led many dedicated to the SMB space to be unaware that virtualization is a well established, mature technology set that long ago established itself as the de facto pattern for business server computing.  The use of hardware abstraction is nearly ubiquitous in enterprise computing, with many of the largest, most stable platforms having no option, at least no officially supported option, for running systems “bare metal.”

There are specific niches where hardware abstraction through virtualization is not advised, but these are extremely rare, especially in the SMB market.  Typical systems that should not be virtualized include latency sensitive systems (such as low latency trading platforms) and multi-server combined workloads such as HPC compute clusters where the primary goal is performance above stability and utility.  Neither of these is common to the SMB.

Virtualization offers many advantages.  Often, in the SMB where virtualization is less expected, it is assumed that virtualization’s goal is consolidation, where massive scale cost savings can occur, or providing new ways to achieve high availability.  Both of these are great options that can help specific organizations and situations, but neither is the underlying justification for virtualization.  We can consolidate and achieve HA through other means, if necessary.  Virtualization simply provides us with a great array of options in those specific areas.

Many of the advantages of virtualization are artifacts of the ecosystem, such as a potential reduction in licensing costs.  These types of advantages are not intrinsic to virtualization but do exist and cannot be overlooked in a real world evaluation.  Not all benefits apply to all hypervisors or virtualization platforms, but nearly all apply across the board.  Hardware abstraction is a concept, not an implementation, so how it is leveraged will vary.  Conceptually, abstracting away hardware, whether at the storage layer, the computing layer or elsewhere, is very important as it eases management, improves reliability and speeds development.

Here are some of the benefits of virtualization.  It is important to note that, outside of specific items such as consolidation and high availability, nearly all of these benefits apply not only when consolidating many workloads onto a single hardware node but even when running a single workload on that node.

  1. Reduced human effort and impact associated with hardware changes, breaks, modifications, expansion, etc.
  2. Storage encapsulation for simplified backup / restore process, even with disparate hardware targets
  3. Snapshotting of entire system for change management protection
  4. Ease of archiving upon retirement or decommission
  5. Better monitoring capabilities, adding out of band management even on hardware platforms that don’t offer this natively
  6. Hardware agnosticism provides for no vendor lock-in as the operating systems believe the hypervisor is the hardware rather than the hardware itself
  7. Easy workload segmentation
  8. Easy consolidation while maintaining workload segmentation
  9. Greatly improved resource utilization
  10. Hardware abstraction creates a significant, and commonly realized, opportunity for improved system performance and stability while lowering the demands on the operating system and driver writers for guest operating systems
  11. Simplified deployment of new and varied workloads
  12. Simple transition from single platform to multi-platform hosting environments which then allow for the addition of options such as cloud deployments or high availability platform systems
  13. Redeployment of workloads to allow for easy physical scaling

In today’s computing environments, server-side workloads should be universally virtualized for these reasons.  The benefits of virtualization are extreme while the downsides are few and trivial.  The two common scenarios where virtualization still needs to be avoided are situations where there is specialty hardware that must be used directly on the server (this has become very rare today, but does still exist from time to time) and extremely low latency systems where sub-millisecond latencies are critical.  The second of these is common only in extremely niche business situations such as low latency investment trading systems.  Systems with these requirements will also have extreme networking and geolocation requirements, such as low-latency InfiniBand with fiber runs to the trading floor of less than five miles.

Some people will point out that high performance computing clusters do not use virtualization, but this is a grey area as any form of clustering is, in fact, a form of virtualization.  It is simply that this is a “super-system” level of virtualization instead of being strictly at the system level.

It is safe to assume that in any scenario in which you should not use virtualization, you will know it beyond a shadow of a doubt and will be able to empirically demonstrate why virtualization is either physically or practically impossible.  For all other cases, virtualize.  Virtualize if you have only one physical server, one physical workload and just one user.  Virtualize if you are a Fortune 100 with the most demanding workloads.  And virtualize if you are anyone in between.  Size is not a factor in virtualization; we virtualize out of a desire to have a more effective and stable computing environment both today and into the future.

 
