The Software RAID Inflection Point

In June, 2001 something amazing happened in the IT world: Intel released the Tualatin-based Pentium IIIS 1.0 GHz processor. This was one of the first few Intel processors (IA32 architecture) to cross the 1 GHz clock barrier and the first of any significance. It was also special in that it had dual processor support and a double-sized cache compared to its Coppermine-based forerunners or its non-“S” Tualatin successor (which followed just one month behind). The PIIIS system boards were insanely popular in their era and formed the backbone of high performance commodity servers, such as Proliant and PowerEdge, in 2001 and for the next few years. The line culminated in the Pentium IIIS 1.4 GHz dual processor systems, which were so important that they kicked off the now famous HP Proliant “G” naming convention. The Pentium III boxes were “G1”.

What does any of this have to do with RAID? Well, we need to step back and look at where RAID was up until May, 2001. From the 1990s up to May, 2001, hardware RAID was the standard for the IA32 server world, which mainly included systems like Novell Netware, Windows NT 4, Windows 2000 and some Linux. Software RAID did exist for some of these systems (not Netware), but servers were always struggling for CPU and memory resources, and expending these precious resources on RAID functions was costly. Applications had to compete with RAID for access, and the systems would often choke on the conflict. Hardware RAID solved this by adding dedicated CPU and RAM just for these functions.

RAID in the late 1990s and early 2000s was also very heavily based around parity striping: RAID 5 and, to a lesser degree, RAID 6. Disks were tiny and extremely expensive for their capacity, so squeezing maximum capacity out of the available disks was of utmost priority. Risks like unrecoverable read errors (URE) were so trivial at such small capacities that parity RAID was very reliable, all things considered. The factors were completely different than they would be by 2009. In 2001, it was still common to see 2.1GB, 4.3GB and 9GB hard drives in enterprise servers!
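To see why read errors mattered so little at these capacities, here is a rough back-of-the-envelope sketch. It assumes an error rate of one unrecoverable read error per 10^14 bits read, a commonly quoted drive specification rather than a figure from this article, and it treats errors as independent, which real drives do not strictly obey. The function name and the four-drive arrays are purely illustrative.

```python
import math

GB = 10**9
TB = 10**12

def prob_ure_during_rebuild(surviving_drives, drive_bytes, ure_per_bit=1e-14):
    """Chance of at least one URE while reading every surviving drive in full.

    ure_per_bit is an assumed spec-sheet style rate, not a measured value.
    """
    bits_read = surviving_drives * drive_bytes * 8
    # 1 - (1 - p)^n, written with log1p/expm1 so the tiny p is not rounded away.
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))

# A four-drive RAID 5 array that lost one drive must read the other three in full.
small = prob_ure_during_rebuild(surviving_drives=3, drive_bytes=9 * GB)
large = prob_ure_during_rebuild(surviving_drives=3, drive_bytes=4 * TB)

print(f"4 x 9 GB array rebuild: {small:.3%} chance of hitting a URE")
print(f"4 x 4 TB array rebuild: {large:.1%} chance of hitting a URE")
```

Under these assumptions the 9GB-era array sees a fraction of a percent of risk per rebuild, while the same layout built from multi-terabyte drives is more likely than not to hit an error. The exact numbers depend entirely on the assumed rate.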

Because parity RAID was the order of the day, and many drives were typically used on each server, RAID had more CPU overhead on average in 2000 than it did in 2010! So the impact of RAID on system resources was very significant.

And that is the background. But in June, 2001, the people who had been buying very low powered IA32 systems suddenly had access to the Tualatin Pentium IIIS processors, with greatly improved clock speeds, efficient dual processor support and double-sized on-chip caches that presented an astounding leap in system performance literally overnight. With all this new power and no corresponding change in software demands, systems that traditionally were starved for CPU and RAM suddenly had more than they knew how to use, especially as additional threads were available and most applications of the time were single threaded.

The system CPUs, even in the Pentium III era, were dramatically more powerful than the small CPUs on the hardware RAID controllers, which were often entry level PowerPC or MIPS chips. The available system memory was often much larger than the hardware RAM caches, and investing in extra system memory was often far more effective and generally advantageous. So with free capacity available on the main system, RAID functions could, on average, be moved from the hardware RAID cards to the central system and gain performance, even while giving up the additional CPU and RAM of the hardware RAID cards. This was not true on overloaded systems, those starved for resources. The gain was most relevant for parity RAID systems, with RAID 6 benefiting the most and non-parity systems like RAID 1 and 0 benefiting the least.

But June, 2001 was the famous inflection point: before that date the average IA32 system was faster when using hardware RAID, and after June, 2001 new systems purchased would on average be faster with software RAID. With each passing year the advantages have leaned more and more towards software RAID as the abundance of underutilized CPU cores, idle threads and spare RAM has exploded. The only trend in hardware RAID’s favor has been the drop in parity RAID usage, as mirrored RAID took over as the standard once disk sizes increased dramatically and capacity costs dropped.
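To put rough numbers on the capacity trade-off behind that shift from parity to mirrored RAID, here is a small sketch using the standard usable-capacity formulas for RAID 5, RAID 6 and RAID 10. The function name and the six-drive example are hypothetical, chosen only for illustration.

```python
def usable_capacity(level, drives, drive_gb):
    """Usable capacity in GB for an array of identical drives."""
    if level == "raid5":
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (drives - 1) * drive_gb      # one drive's worth of parity
    if level == "raid6":
        if drives < 4:
            raise ValueError("RAID 6 needs at least 4 drives")
        return (drives - 2) * drive_gb      # two drives' worth of parity
    if level == "raid10":
        if drives < 4 or drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, 4 or more")
        return (drives // 2) * drive_gb     # every drive is mirrored once
    raise ValueError(f"unsupported level: {level}")

# Six drives of a 2001-era size (9 GB) versus a later size (4000 GB).
for drive_gb in (9, 4000):
    for level in ("raid5", "raid6", "raid10"):
        total = usable_capacity(level, 6, drive_gb)
        print(f"6 x {drive_gb} GB {level}: {total} GB usable")
```

With 9GB drives, choosing RAID 10 over RAID 5 costs 18GB out of a possible 45GB, which was a painful loss when capacity was scarce. With large, cheap drives the same proportional loss is much easier to accept in exchange for simpler and safer rebuilds.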

Today it has been more than fifteen years since the notion that hardware RAID would be faster was retired. The belief lingers on due primarily to the odd “Class of 1998” effect. But this has long been a myth repeated improperly by those that did not take the time to understand the original source material. Hardware RAID continues to have benefits, but performance has not been one of them for the majority of the time that we’ve had RAID and is not expected to ever become one again.

Hardware and Software RAID

RAID (Redundant Array of Inexpensive Disks) systems are implemented in one of two basic ways: in software or on dedicated hardware. Both methods are very viable and have their own merits.

In the small business space, where Intel and AMD architecture systems and Windows operating systems rule, hardware RAID is so common that a lot of confusion has arisen around software RAID due, as we will see, in no small part to the wealth of scam software RAID products touted as dedicated hardware and known colloquially as “Fake RAID.”

When RAID was first developed, it was used, in software, on high end enterprise servers running things like proprietary UNIX where the systems were extremely stable and the hardware was very powerful and robust making software RAID work very well.  Early RAID was primarily focused on mirrored RAID or very simplistic parity RAID (like RAID 2) which had little overhead.

As the need for RAID began to spill into the smaller server space, and as parity RAID grew in popularity and required greater processing power, it became an issue that the underpowered processors in the x86 space were significantly impacted by the processing load of RAID, especially RAID 5. This, combined with almost none of the operating systems heavily used on these platforms having software RAID implementations, led to the natural development of hardware RAID: an offload processor board (similar to a GPU for graphics) with a complete computer of its own on board, including CPU, memory and firmware.

Hardware RAID worked very well at solving the RAID overhead problem in the x86 server space. As CPUs gained more power and memory became less scarce, popular x86 operating systems like Windows Server began to offer software RAID options. Windows software RAID specifically was known as a poor RAID implementation and was available only on server operating system versions, causing a lack of appreciation for software RAID in the community of system administrators working primarily with Windows.

Because of historical implementations in the enterprise server space and the commodity x86 space, a natural separation arose between the two markets, supported initially by technology and later purely by ideology. If you talk to a system administrator in the commodity space you will almost universally hear that hardware RAID is the only option. Conversely, if you talk to a system administrator in the mainframe, RISC (Sparc, Power, ARM) or EPIC (Itanium) server (sometimes called UNIX server) space you will often be met with surprise, as hardware RAID isn’t available for those classes of systems and software RAID is simply a foregone conclusion. Neither camp seems to have a real knowledge of the situation in the opposite one, and crossovers in skill sets between the two were relatively rare until recently, when enterprise UNIX platforms like Linux, Solaris and FreeBSD started to become very popular and well understood on commodity hardware platforms.

To make matters more confusing for the commodity server space, a large number of vendors set out to fill the vacuum left by the dominant operating system vendor’s lack of software RAID for the non-server operating system market, while marketing to a less technically savvy audience. They began selling non-RAID controller cards along with a “driver” that was actually software RAID and pretending that the resulting product was actually hardware RAID. This created a large amount of confusion at best and an incredible disdain for software RAID at worst, because almost universally, when a system’s core function is to protect data and its market is built upon deception and confusion, the result is disaster. Fake RAID systems routinely have issues with performance and reliability. While, in theory, a third party software RAID package is a reasonable option, the reality of the software RAID market is that essentially all quality software RAID implementations are native components of either the operating system itself (Linux, Mac OSX, Solaris, Windows) or of the filesystem (ZFS, VxFS, BtrFS) and are provided and maintained by primary vendors. That leaves little room or purpose for third party products outside of the Windows desktop space, where a few small, legitimate software RAID players do exist but are often overshadowed by the Fake RAID players.

Today there is almost no need for hardware RAID as commodity platforms are incredibly powerful and there is almost always a dramatic excess of both computational and memory resources. Hardware RAID instead competes mostly based on features rather than on reducing resource load. Selection of hardware RAID versus software RAID in the commodity server space is almost completely one of preference and market momentum rather than of specific performance or features. Both platforms are essentially equal, and when considering product options the individual implementation matters far more than whether the approach is hardware or software.

Today hardware RAID offerings tend to be more “generic” with rather vanilla implementations of standard RAID levels.  Hardware RAID tends to earn its value through resource utilization reduction (CPU and memory offload), ability to “blind swap” failed drives, simplified storage management, block level storage agnostically abstracted from the operating system, fast cache close to the drives and battery or flash backed cache.  Software RAID tends to earn its value through lower power consumption, lower cost of acquisition, integrated management with the operating system, unique or advanced RAID features (such as ZFS’ RAIDZ that doesn’t suffer from the standard RAID 5 write hole) and generally better overall performance.  It is truly not a discussion of better or worse but of better or worse for a very specific situation with the most important factor often being familiarity and comfort and/or default vendor offering.

One of the most overlooked but important differentiators between hardware and software RAID is the change in the job role associated with RAID array management.  Hardware RAID moves the handling of the array to the server administrator (the support role that works on the physical server and is  stationed in the datacenter) whereas software RAID moves the handling of the array to the system administrator (the support role working on the operating system and above and rarely sitting in the datacenter.)  In the SMB market this factor might be completely overlooked but in a Fortune 500 the difference in job role can be very significant.  In many cases with hardware RAID disk replacements and system setup can be done without the need for system administrator intervention.  Datacenter server administrators can discover failed drives either through alerts or by looking for “amber lights” during walkthroughs and do replacements on the fly without needing to contact anyone or know what the server is even running.  Software RAID almost always would require the system administrator to be involved in managing the offlining of a failed disk, coordinating the replacement process with the datacenter and onlining the new one once the replacement process was completed.

Because of the way that CPU offloading and performance work, and because of some advantages in the way that non-standard RAID implementations often handle parity RAID reconstruction, there is a tendency for mirrored RAID levels to favor hardware RAID and parity RAID levels to favor software RAID. Parity RAID is drastically more CPU intensive, and so having access to the high power central CPU resources can be a major factor in speeding up RAID calculations. But with mirrored RAID, where RAID reconstruction is far safer than with parity RAID and where automated rebuilds are more important, hardware RAID brings the benefit of allowing blind drive replacement very easily.
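For readers who have not seen where the parity workload comes from, here is a minimal sketch of single-parity (RAID 5 style) XOR over one stripe. It is an illustration only, not how any particular controller or operating system implements it: the block contents and the helper name are made up, and real implementations work on much larger blocks with optimized routines. RAID 6 adds a second, more expensive syndrome that is not shown here.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe spread across three data drives (contents are illustrative).
data = [b"AAAA", b"BBBB", b"CCCC"]

# Every write to the stripe means recomputing this parity block.
parity = xor_blocks(data)

# Simulate losing the second drive: XOR the survivors with the parity block.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```

Every byte written has to pass through a calculation like this, and a rebuild has to repeat it across the entire array. Mirroring, by contrast, only copies data, which is why the parity levels are the ones that gain the most from a fast central CPU.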

One aspect of the hardware and software RAID discussion that is extremely paradoxical is that the same market that often dismisses software RAID out of hand as being inferior to hardware RAID is almost completely overlapping (you can picture the Venn Diagram in your head here) with the market that feels that file servers are inferior to commodity NAS appliances. Yet those NAS appliances in the SMB range are almost universally based on the same software RAID implementations being casually dismissed. So software RAID is often considered both inferior and superior simultaneously. Some NAS devices in the SMB range, and NAS appliance software, that are software RAID based include: Netgear ReadyNAS, Netgear ReadyData, Buffalo Terastation, QNAP, Synology, OpenFiler, FreeNAS, Nexenta and NAS4Free.

There is truly no “always use one way or the other” with hardware and software RAID.  Even giant, six figure enterprise NAS and SAN appliances are undecided as to which to use with part of the industry going each direction.  The real answer is it depends on your specific situation – your job role separation, your technical needs, your experience, your budget, etc.  Both options are completely viable in any organization.