Tag Archives: nas

When to Consider a SAN?

Everyone seems to want to jump into purchasing a SAN, sometimes quite passionately.  SANs are, admittedly, pretty cool.  They are one of the more fun and exciting, large scale hardware items that most IT professionals get a chance to have in their own shop.  Often the desire to have a SAN of ones own is a matter of “keeping up with the Jones” as using a SAN has become a bit of a status symbol – one of those last bastions of big business IT that you only see in a dedicated server closet and never in someone’s home (well, almost never.)  SANs are pushed heavily, advertised and sold as amazing boxes with internal redundancy making them infallible, speed that defies logic and loaded with features that you never knew that you needed.  When speaking to IT pros designing new systems, one of the most common design aspects that I hear is “well we don’t know much about our final design, but we know that we need a SAN.”

In the context of this article, I use SAN in its most common context, that is to mean a “block storage device” and not to refer to the entire storage network itself.  A storage network can exist for NAS but not use a SAN block storage device at all. So for this article SAN refers exclusively to SAN as a device, not SAN as a network.  SAN is a soft term used to mean multiple things at different times and can become quite confusing.  A SAN configured without a network becomes DAS.  DAS that is networked becomes SAN.

Let’s stop for a moment.  SAN is your back end storage.  The need for it would be, in all cases, determined by other aspects of your architecture.  If you have not yet decided upon many other pieces, you simply cannot know that a SAN is going to be needed, or even useful, in the final design.  Red flags. Red flags everywhere.  Imagine a Roman chariot race with the horses pushes the chariots (if you know what I mean.)

It is clear that the drive to implement a SAN is so strong that often entire projects are devised with little purpose except, it would seem, to justify the purchase of the SAN.  As with any project, the first question that one must ask is “What is the business need that we are attempting to fill?”   And work from there, not “We want to buy a SAN, where can we use it?”  SANs are complex, and with complexity comes fragility.  Very often SANs carry high cost.  But the scariest aspect of a SAN is the widespread lack of deep industry knowledge concerning them.  SANs pose huge technical and business risk that must be overcome to justify their use.  SANs are, without a doubt, very exciting and quite useful, but that is seldom good enough to warrant the desire for one.

We refer to SANs as “the storage of last resort.”  What this means is, when picking types of storage, you hope that you can use any of the other alternatives such as local drives, DAS (Direct Attach Storage) or NAS (Network Attached Storage) rather than SAN.  Most times, other options work wonderfully.  But there are times when the business needs demand requirements that can only reasonably be met with a SAN.  When those come up, we have no choice and must use a SAN.  But generally it can be avoided in favor of simpler and normally less costly or risky options.

I find that most people looking to implement a SAN are doing so under a number of misconceptions.

The first is that SANs, by their very nature, are highly reliable.  While there are certainly many SAN vendors and specific SAN products that are amazingly reliable, the same could be said about any IT product.  High end servers in the price range of high end SANs are every bit as reliable as SANs.  Since SANs are made from the same hardware components as normal servers, there is no magic to making them more reliable.  Anything that can be used to make a SAN reliable is a trickle down of server RAS (Reliability, Availability and Serviceability) technologies.  Just like SAN, NAS and DAS, as well as local disks, can be made incredibly reliable.  SAN only refers to the device being used to serve block storage rather than perform some other task.  A SAN is just a very simple server.  SANs encompass the entire range of reliability with mainframe-like reliability at the top end to devices that are nothing more than external hard drives – the most unreliable network devices on your network – on the bottom end.  So rather than SAN meaning reliability, it actually offers a few special cases of being the lowest reliability you can imagine.  But, for all intents and purposes, all servers share roughly equal reliability concerns.  SANs gain a reputation for reliability because often businesses put extreme budgets into their SANs that they do not put into their servers so that the comparison is a relatively high end SAN to a relatively budget server.

The second is that SAN means “big” and NAS means “small.”  There is no such association.  Both SANs and NASs can be of nearly any scale or quality.  They both run the gamut and there isn’t the slightest suggestion from the technology chosen whether a device is large or not.  Again, as above, SAN actually can technically come “smaller” than a NAS solution due to its possible simplicity but this is a specialty case and mostly only theoretical although there are SAN products on the market that are in this category, just very rare to find them in use.

The third is that SAN and NAS are dramatically different inside the chassis.  This is certainly not the case as the majority of SAN and NAS devices today are what is called “unified storage” meaning a storage appliance that acts simultaneously as both SAN and NAS.  This highlights that the key difference between the two is not in backend technology or hardware or size or reliability but the defining difference is the protocols used to transfer storage.  SANs are block storage exposing raw block devices onto the network using protocols like fibre channel, iSCSI, SAS, ZSAN, ATA over Ethernet (AoE) or Fibre Channel over Ethernet (FCoE.)  NAS, on the other hand, uses a network file system and exposes files onto the network using application layer protocols like NFS, SMB, AFP, HTTP and FTP which then ride over TCP/IP.

The fourth is that SANs are inherently a file sharing technology.  This is NAS.  SAN is simply taking your block storage (hard disk subsystem) and making it remotely available over a network.  The nature of networks suggests that we can attach that storage to multiple devices at once and indeed, physically, we can.  Just as we used to be able to physically attach multiple controllers to opposite ends of a SCSI ribbon cable with hard drives dangling in the middle.  This will, under normal circumstances, destroy all of the data on the drives as the controllers, which know nothing about each other, overwrite data from each other causing near instant corruption.  There are mechanisms available in special clustered filesystems and their drivers to allow for this, but this requires special knowledge and understanding that is far more technical than many people acquiring SANs are aware that they need for what they often believe is the very purpose of the SAN – a disaster so common that I probably speak to someone who has done just this almost weekly.  That the SAN puts at risk the very use case that most people believe it is designed to handle and not only fails to deliver the nearly magic protection sought but is, to the contrary, the very cause of the loss of data exposes the level of risk that implemented misunderstood storage technology carrier with it.

The fifth is that SANs are fast.  SANs can be fast; they can also be horrifically slow.  There is no intrinsic speed boost from the use of SAN technology on its own.  It is actually fairly difficult for SANs to overcome the inherent bottlenecks introduced by the network on which they sit.  As some other storage options such as DAS use all the same technologies as SAN but lack the bottleneck and latency of the actual network an equivalent DAS will also be just a little faster than its SAN complement.  SANs are generally a little faster than a hardware-identical NAS equivalent, but even this is not guaranteed.  SAN and NAS behave differently and in different use cases either may be the better performing.  SAN would rarely be chosen as a solution based on performance needs.

The sixth is that by being a SAN that the inherent problems associated with storage choices no longer apply.  A good example is the use of RAID 5.  This would be considered bad practice to do in a server, but when working with a SAN (which in theory is far more critical than a stand alone server) often careful storage subsystem planning is eschewed based on a belief that being a SAN that it has somehow fixed those issues or that they do not apply.  It is true that some high end SANs do have some amount of risk mitigation features unlikely to be found elsewhere, but these are rare and exclusively relegated to very high end units where using fragile designs would already be uncommon.  It is a dangerous, but very common practice, to take great care and consideration when planning storage for a physical server but when using a SAN that same planning and oversight is often skipped based on the assumption that the SAN handles all of that internally or that it is simply no longer needed.

Having shot down many misconceptions about SAN one may be wondering if SANs are ever appropriate.  They are, of course, quite important and incredibly valuable when used correctly.  The strongest points of SANs come from consolidation and special types of shared storage.

Consolidation was the historical driver bringing customers to SAN solutions.  A SAN allows us to combine many filesystems into a single disk array allowing far more efficient use of storage resources.  Because SAN is block level it is able to do this anytime that a traditional, local disk subsystem could be employed.  In many servers, and even many desktops, storage space is wasted due to the necessities of growth, planning and disk capacity granularity.  If we have twenty servers each with 300GB drive arrays but each only using 80GB of that capacity, we have large waste.  With a SAN would could consolidate to just 1.6TB plus a small amount necessary for overhead and spend far less on physical disks than if each server was maintaining its own storage.

Once we begin consolidating storage we begin to look for advanced consolidation opportunities.  Having consolidated many server filessytems onto a single SAN we have the chance, if our SAN implementation supports it, to deduplicate and compress that data which, in many cases such as server filesystems, can potentially result in significant utilization reduction.  So out 1.6TB in our example above might actually end up being only 800GB or less.  Suddenly our consolidation numbers are getting better and better.

To efficiently leverage consolidation it is necessary to have scale and this is where SANs really shine – when scale but in capacity and, more importantly, in the number of attaching nodes become very large.  SANs are best suited to large scale storage consolidation.  This is their sweet spot and what makes them nearly ubiquitous in large enterprises and very rare in small ones.

SANs are also very important for certain types of clustering and shared storage that requires single shared filesystem access.  These is actually a pretty rare need outside of one special circumstance – databases.  Most applications are happy to utilize any type of storage provided to them, but databases often require low level block access to be able to properly manipulate their data most effectively.  Because of this they can rarely be used, or used effectively, on NAS or file servers.  Providing high availability storage environments for database clusters has long been a key use case of SAN storage.

Outside of these two primary use cases, which justify the vast majority of SAN installations, SAN also provides for high levels of storage flexibility in making it potentially very simple to move, grow and modify storage in a large environment without needing to deal with physical moves or complicated procurement and provisioning.  Again, like consolidation, this is an artifact of large scale.

In very large environments, the use of SAN can also be used to provide a point a demarcation between storage and system engineering teams allowing there to be a handoff at the network layer, generally of fibre channel or iSCSI.  This clear separation of duties can be critical in allowing for teams to be highly segregated in companies that want highly discrete storage, network and systems teams.  This allows the storage team to do nothing but focus on storage and the systems team to do nothing but focus on the systems without any need for knowledge of the other team’s implementations.

For a long time SANs also presented themselves as a convenient means to improve storage performance.  This is not an intrinsic component of SAN but an outgrowth of their common use for consolidation.  Similarly to virtualization when used as consolidation, shared SANs will have a nature advantage of having better utilization of available spindles, centralized caches and bigger hardware than the equivalent storage spread out among many individual servers.  Like shared CPU resources, when the SAN is not receiving requests from multiple clients it has the ability to dedicate all of its capacity to servicing the requests of a single client providing an average performance experience potentially far higher than what an individual server would be able to affordably achieve on its own.

Using SAN for performance is rapidly fading from favor, however, because of the advent of SSD storage becoming very common.  As SSDs with incredibly low latency and high IOPS performance drop in price to the point where they are being added to stand alone servers as local cache or potentially even being used as mainline storage the bottleneck of the SANs networking becomes a larger and larger factor making it increasingly difficult for the consolidation benefits of a SAN to offset the performance benefits of local SSDs.  SSDs are potentially very disruptive for the shared storage market as they bring the performance advantage back towards local storage – just the latest in the ebb and flow of storage architecture design.

The most important aspect of SAN usage to remember is that SAN should not be a default starting point in storage planning.  It is one of many technology choices and one that often does not fit the bill as intended or does so but at an unnecessarily high price point either in monetary or complexity terms.  Start by defining business goals and needs.  Select SAN when it solves those needs most effectively, but keep an open mind and consider the overall storage needs of the environment.

Comparing SAN and NAS

One of the greatest confusions that I have seen in recent years is that between NAS and SAN.  Understanding what each is will go a long way towards understanding where they are useful and appropriate.

Our first task is to strip away the marketing terms and move on to technical ones.  NAS stands for Network Attached Storage but doesn’t mean exactly that and SAN stands for Storage Area Network but is generally used to refer to a SAN device, not the network itself.  In its most proper form, a SAN is any network dedicated to storage traffic, but in the real world, that’s not how it is normally used.  In this case we are hear to talk about NAS and SAN devices and how they compare so we will not use the definition that includes the network rather than the device.  In reality, both NAS and SAN are marketing terms and are a bit soft around the edges because of it.  They are precise enough to use in a normal technical conversation, as along as all parties know what they mean, but when discussing their meaning we should strip away the cool-sounding names and stick to the most technical descriptions.  Both terms, when used via marketing, are used to imply that they are a certain technology that has been “appliancized” which makes the use of the terms unnecessarily complicated but no more useful.

So our first task is to define what these two names mean in a device context.  Both devices are storage servers, plain and simple, just two different ways of exposing that storage to the outside world.

The simpler of the two is the SAN which is properly a block storage device.  Any device that exposes its storage externally as a block device falls into this category and can be used interchangeably based on how it is used.  The block storage devices are external hard drives, DAS (Direct Attach Storage) and SAN.  All of these are actually the same thing.  We call it an external hard drive when we attach it to a desktop.  We call it a DAS when we attach it to a server.  We call it a SAN when we add some form of networking, generally a switch, between the device and the final device that is consuming the storage.  There is no technological difference between these devices.  A traditional SAN can be directly attached to a desktop and used like an external hard drive.  An external hard drive can be hooked to a switch and used by multiple devices on a network.  The interface between the storage device and the system using it is the block.  Common protocols for block storage include iSCSI, Fibre Channel, SAS, eSATA, USB, Thunderbolt, IEEE1394 (aka Firewire), Fibre Channel over Ethernet (FCoE) and ATA over Ethernet (AoE.)  A device attaching to a block storage device will always see the storage presented as a disk drive, nothing more.

A NAS, also known as a “filer”, is a file storage device.  This means that it exposes its storage as a network filesystem.  So any device attaching to this storage does not see a disk drive but instead sees a mountable filesystem.  When a NAS is not packaged as an appliance, we simply call it a file server and nearly all computing devices from desktops to servers have some degree of this functionality included in them.  Common protocols for file storage devices include NFS, SMB / CIFS and AFP.  There are many others, however, and technically there are special case file storage protocols such as FTP and HTTP that should qualify as well.  As an extreme example, a traditional web server is a very specialized form of file storage device.

What separates block storage and file storage devices is the type of interface that they present to the outside world, or to think of it another way, where the division between server device and client device happens within the storage stack.

It has become extremely common today for storage devices to include both block storage and file storage from the same device.  Systems that do this are called unified storage.  With unified storage, whether you can say that it is behaving as block storage or file storage device (SAN or NAS in the common parlance) or both is based upon the behavior that you configure for the device not based on what you purchase.  This is important as it drives home the point that this is purely a protocol or interface distinction, not one of size, capability, reliability, performance, features, etc.

Both types of devices have the option, but not the requirement, of providing extended features beneath the “demarcation point” at which they hand off the storage to the outside.  Both may, or may not, provide RAID, logical volume management, monitoring, etc.  File storage (NAS) may also provide file system features such as Windows NTFS ACLs.

The key advantage to block storage is that the systems that attach to it are given an opportunity to manipulate the storage system as if it were a traditional disk drive.  This means that RAID and logical volume management, which may already have been doing in the “black box” of the storage device can now be done again, if desired, at a higher level.  The client devices are not aware what kind of device they are seeing, only that it appears as a disk drive.  So you can choose to trust it (assume that it has RAID of an adequate level, for example) or you can combine multiple block storage devices together into RAID just as if they were regular, local disks.  This is extremely uncommon but is an interesting option and there are products that are designed to be used in this way.

More commonly, logical volume management such as Linux LVM, Solaris ZFS or Windows Dynamic Disks is applied on top of the exposed block storage from the device and then, on top of that, a filesystem would be employed.  This is important to remember, with block storage devices the filesystem is created and managed by the client device, not by the storage device.  The storage device is blissfully unaware of how the block storage that it is presenting is used and allows the end user to use it however they see fit with total control.  This extends even to the point that you can chain block storage devices together with one providing the storage to the next being combines, perhaps, into RAID groups – block storage devices can be layered, more or less, indefinitely.

Alternatively, a file storage device contains all of the block portion of the storage so any opportunity for RAID, logical volume management and monitoring must be handled by the file storage device.  Then, on top of the block storage, a filesystem is applied.  Commonly this would be Linux’ EXT4, FreeBSD and Solaris’ ZFS, Windows NTFS but other filesystems such as WAFL, XFS, JFS, BtrFS, UFS and more are certainly possible.   On this filesystem, data will be stored.  To them share this data with the outside world a network file system (also known as a distributed file system) is used which provides a file system interface that is network enabled – NFS, SMB and AFP being the most common but, like in any protocol family, there are numerous special case and exotic possibilities.

A remote device wanting to use storage on the file storage device would see it over the network the same as it would see a local filesystem and is able to mount it in an identical manner.  This makes file storage especially easy and obvious for end consumer to use as it is very natural in every aspect.  We use network file systems every day for normal desktop computing.  When we “map a drive” in Windows, for example, we are using a network file system.

One critical differentiation between block storage and file storage that must be differentiated between is that, while both potentially can sit on a network and allow multiple client machines to attach to them, only file storage devices have the ability arbitrate that access.  This is very important and cannot be glossed over.

Block storage appears as a disk drive.  If you simply attach a disk drive to two or more computers at once, you can imagine what will happen – each will know nothing of the other and will be unaware of new files being created, others changing and they systems will rapidly begin to overwrite each other.  If your file system is read only on all nodes, this is not a problem.  But if any system is writing or changing the data, the others will have problems.  This generally results in data corruption very quickly, typically on the order of minutes.  To see this in extreme action, imagine having two or three client system all believe that they have exclusive access to a disk drive and have them all defragment it at the same time.  All data on the drive will be scrambled in seconds.

A file storage device, on the other hand, has natural arbitration as the network file system handles the communications for access to the real file system and filesystems, by their nature, are multi-user naturally.  So if one system attached to a file storage device makes a change, all systems are immediately aware of the change and will not “step on each others toes.”  Even if they attempt to do the the file storage device’s filesystem arbitrates access and has the final say and does not let this happen.  This makes sharing data easy and transparent to end users.  (I use the term “end users” here to include system administrators.)

This does not mean that there is no means of sharing storage from a block device, but the arbitration of it cannot be handled by the block storage device itself.  Block storage devices are be made “shareable” by using what is known as a clustered file system.  These types of file systems originated back when server clusters shared storage resources by having two servers attached with a SCSI controller on either end of a single SCSI cable and having the shared drives attached in the middle of the cable.  The only means by which the servers could communicate was through the file system itself and so special clustered file systems were developed that allowed there to be communications between the devices, alerting each to changes made by the other, through the file system itself.  This actually works surprisingly well but clustered file systems are relatively uncommon with Red Hat’s GFS and Oracle’s OCFS being some of the best well known in the traditional server world and VMWare’s much newer VMFS having become extremely well known through its use for virtualization storage.  Normal users, including system administrators, may not have access to clustered file systems or may have needs that do not allow their use.  Of important note is also that the arbitration is handled through trust, not through enforcement, like with a file storage device.  With a file storage device, the device itself handles the access arbitration and there is no way around it.  With block storage devices using a clustered file system, any device that attaches to the storage can ignore the clustered file system and simply bypass the passive arbitration – this is so simple that it would normally happen accidentally.  It can happen when mounting the filesystem and specifying the wrong file system type or through a drive misbehaving or any malicious action.  So access security is critical at the network level to protect block level storage.

The underlying concept being exposed here is that block storage devices are dumb devices (think glorified disk drive) and file storage devices are smart devices (think traditional server.)  File storage devices must contain a full working “computer” with CPU, memory, storage, filesystem and networking.  Block storage devices may contain these things but need not.  At their simplest, block storage devices can be nothing more than a disk drive with a USB or Ethernet adapter attached to them.  It is actually not uncommon for them to be nothing more than a RAID controller plus Ethernet or Fiber Channel adapters to be attached.

In both cases, block storage device and file storage devices, we can scale down to trivially simple devices or can scale up to massive “mainframe class” ultra-high-availability systems.  Both can be either fast or slow.  One is not better or worse, one is not higher or lower, one is not more or less enterprise – they are different and serve generally different purposes.  And there are advanced features that either may or may not contain.  The challenge comes in knowing which is right for which job.

I like to think of block storage protocols as being a “standard out” stream, much like on a command line.  So the base level of any storage “pipeline” is always a block device and numerous block devices or transformations can exist with each being piped one to another as long as the output remains a block storage protocol.  We only terminate the chain when we apply a file system.   In this way hardware RAID, network RAID, logical volume management, etc. can be applied in multiple combinations as needed.  Block storage is truly not just blocks of data but building blocks of storage systems.

One point that is very interesting is that since block storage devices can be chained and since network storage devices must accept block storage as their “input” it is actually quite common for a block storage device (SAN) to be used as the backing storage for a file storage device (NAS), especially in high end systems.  They can coexist within a single chassis or they can work cooperatively on the network.

Choosing a Storage Type

While technicalities defining which type of storage is which can become problematic, the underlying concepts are pretty well understood.  There are four key types of storage that we use in everyday server computing: local disks, DAS, NAS and SAN.  Choosing which we want to use, in most cases, can be broken down into a relatively easy formula.

The quick rule of thumb for storage should be: Local before DAS, DAS before NAS, NAS before SAN.  Or as I like to write it:

Local Disks -> DAS -> NAS -> SAN

To use this rule you simply start with your storage requirements in hand and begin on the left hand side.  If local disks meet your requirements, then almost certainly they are your best choice.  If they don’t meet your requirements move to the right and check if DAS will meet your requirements.  If so, great, if not continue the process.

That’s the rule of thumb, so if that is all you need, there you go.  But we will dive into the “why” of the rule below. The quick overview is that on the left we get speed and reliability at the lowest cost.  As we move to the right complexity increases as does price typically.  The last two, while very different, are actually the most alike in many ways due to their networked nature.

Local Disks:  Local drives inside your server chassis are your best bet for most tasks.  Being inside the chassis means the least amount of money spent on extra containers to hold and power the drives, least physical risk, most solid connection technologies, shortest distance and least amount of potential bottlenecks. Being raw disks, local disks are block devices.

Direct Attached Storage:  DAS is, more or less, local drives housed outside of the server chassis.  The server itself will see them exactly like any other local drives making them very easy to use.  DAS is simple but still has extra external containers and extra cables.  This adds cost and some complexity.  DAS makes it easier to attach multiple servers to the same set of drives as this is almost impossible, and always cumbersome, with local disks.  So DAS is effectively our first type of physically sharable storage.  Being identical to local disks, DAS is a form of block device.

Network Attached Storage: NAS is unique in that it is the only non-block device from which we have to choose.  A NAS, or a traditional file server – they are truly one and the same, is the first of our technologies designed to run over a network.  This adds a lot of complication.  NAS shares storage out at the filesystem level.  A NAS is an intelligent device that allows users over the network to easily and safely share storage because the NAS has the necessary logic on board to handle multiple users at one time.  NAS is very easy for anyone to use and is even commonly used by people at home.

Storage Area Network: SAN is an adaptation of DAS with the addition of a network infrastructure allowing the SAN to behave as a remote hard drive (block device) that an operating system sees as no different from any other hard drive attached to it.  SANs require advanced networking knowledge, are surrounded by a large amount of myth and rumor, are poorly understood by the average IT professional, are generally complex to use and understand and because they lack the logic of a NAS they effectively expose a hard drive directly to the network making it trivially easy to corrupt and destroy data.  It is, in fact, so easy to lose data on a SAN due to misconfiguration that the most commonly expected use of a SAN is a use case for which a SAN cannot be used.

Of course there is much grey area.  What is normally considered a DAS can be turned into a SAN.  A SAN can be direct connected.  NAS can be direct connected.  Local storage can act as either NAS or SAN depending on configuration such as with a VSA (Virtual Storage Appliance.)  Many devices are simultaneously NAS and SAN and the determination is by configuration, not by the physical device itself.  But in generally accepted use, the terms are mostly straightforward.

The point being that as we move from left to right in our list we move from simple and easy to difficult and complex.  SAN itself is a rock solid technology; it is the introduction of humans and their tendency to do dangerous things easily with SAN that makes it a dangerous storage technique for the average user.  As with everything in IT, keeping our technologies and processes simple brings stability and security and, often, cost savings as well.

There are many times when movement to “the right” is necessary.  Local disks do not scale well and can become too expensive to maintain for certain types of larger deployments.  DAS, likewise, doesn’t scale well in many cases.  NAS scales well but being a non-block protocol is a bit unique and doesn’t always work for our purposes, a good example being HyperV that requires a block device for storage.  SAN is the final catchall of storage.  If nothing else works, SAN is always there to fall back on – or, as I like to say, SAN is the storage of last resort.

This is a very high level look at the basics of choosing a storage approach.  This is a common IT task that must be done with great regularity.  I did not intend this post, in any way, to explain any deep knowledge of storage but simply to provide a handy guide to understanding where to start looking at storage options.  Exceptions and special cases abound, but it is extremely common to simply skip the best option and go straight to considering something big, expensive and complex and rapidly forget that something much more simple might do the same job in a far superior manner.  The underlying concept is the simplest solution that meets the need is usually the best.

When No Redundancy Is More Reliable – The Myth of Redundancy

Risk in a difficult concept and it requires a lot of training, thought and analysis to properly assess given scenarios.  Often, because risk assessments are so difficult, we substitute risk analysis with simply adding basic redundancy and assuming that we have appropriately mitigated risk.  But very often this is not the case.  The introduction of complexity or additional failure modes often accompany the addition of redundancy and these new forms of failure have the potential to add more risk than the added redundancy removes.  Storage systems are especially prone to these decision processes which is unfortunate as few, if any, systems are so susceptible to failure and more important to protect.

RAID is a great example of where a lack of holistic risk thinking can lead to some strange decision making.  If we look at a not uncommon scenario we will see where the goal of protecting against drive failure can actually lead to an increase in risk even when additional redundancy is applied.  In this scenario we will compare a twelve drive array consisting of twelve three terabyte SATA hard drives in a single array.  It is not uncommon to hear of people choosing RAID 5 for this scenario to get “maximum capacity and performance” while having “adequate protection against failure.”

The idea here is that RAID 5 protects against the loss of a single drive which can be replaced and the array will rebuild itself before a second drive fails.  That is great in theory, but the real risks of an array of this size, thirty six terabytes of drive capacity, come not from multiple drive failures as people generally suspect but from an inability to reliably rebuild the array after a single drive failure or from a failure of the array itself with no individual drives failing.  The risk of a second drive failing is low, not non-existent, but quite low.  Drives today are highly reliable. Once one drives fails it does increase the likelihood of a second drive failing, which is well documented, but I don’t want this risk to mislead us from looking at the true risks – the risk of a failed resilvering operation.

What happens that scares us during a RAID 5 resilver operation is that an unrecoverable read error (URE) can occur.  When it does the resilver operation halts and the array is left in a useless state – all data on the array is lost.  On common SATA drives the rate of URE is 10^14, or once every twelve terabytes of read operations.  That means that a six terabyte array being resilvered has a roughly fifty percent chance of hitting a URE and failing.  Fifty percent chance of failure is insanely high.  Imagine if your car had a fifty percent chance of the wheels falling off every time that you drove it.  So with a small (by today’s standards) six terabyte RAID 5 array using 10^14 URE SATA drives, if we were to lose a single drive, we have only a fifty percent chance that the array will recover assuming the drive is replaced immediately.  That doesn’t include the risk of a second drive failing, only the risk of a URE failure.  It also assumes that the drive is completely idle other than the resilver operation.  If the drives are busily being used for other tasks at the same time then the chances of something bad happening, either a URE or a second drive failure, begin to increase dramatically.

With a twelve terabyte array the chances of complete data loss during a resilver operation begin to approach one hundred percent – meaning that RAID 5 has no functionality whatsoever in that case.  There is always a chance of survival, but it is very low.  At six terabytes you can compare a resilver operation to a game of Russian roulette with one bullet and six chambers and you have to pull the trigger three times.  With twelve terabytes you have to pull it six times!  Those are not good odds.

But we are not talking about a twelve terabyte array.  We are talking about a thirty six terabyte array – which sounds large but this is a size that someone could easily have at home today, let alone in a business.  Every major server manufacturer, as well as nearly all low cost storage vendors, make sub $10,000 storage systems in this capacity range today.  Resilvering a RAID 5 array with a single drive failure on a thirty six terabyte array is like playing Russian roulette, one bullet, six chambers and pulling the trigger eighteen times!  Your data doesn’t stand much of a chance.  Add to that the incredible amount of time needed to resilver an array of that size and the risk of a second disk failing during that resilver window starts to become a rather significant threat.  I’ve seen estimates of resilver times climbing into weeks or months on some systems.  That is a long time to run without being able to lose another drive.  When we are talking hours or days the risks are pretty low, but still present.  When we are talking weeks or months of continuous abuse, as resilver operations are extremely drive intensive, the failure rates climb dramatically.

With an array of this size we can effectively assume that the loss of a single drive means the loss of the complete array leaving us with no drive failure protection at all.  Now if we look at a drive of the same or better performance with the same or better capacity under RAID 0, which also has no protection against drive loss, we need only use eleven of the same drives that we needed twelve of for our RAID 5 array.  What this means is that instead of twelve hard drives, each of which has a roughly three percent chance of annual failure, we have only eleven.  That alone makes our RAID 0 array more reliable as there are fewer drives to fail.  Not only do we have fewer drives but there is no need to write the parity block nor skip parity blocks when reading back lowering, ever so slightly, the mechanical wear and tear on the RAID 0 array for the same utilization giving it a very slight additional reliability edge.  The RAID 0 array of eleven drives will be identical in capacity to the twelve drive RAID 5 array but will have slightly better throughput and latency.  A win all around.  Plus the cost savings of not needing an additional drive.

So what we see here is that in large arrays (large in capacity, not in spindle count) that RAID 0 actually passes RAID 5 in certain scenarios.  When using common SATA drives this happens at capacities experienced even by power users at home and by many small businesses.  If we move to enterprise SATA drives or SAS drives then the capacity number where this occurs becomes very high and is not a concern today but will be in just a few years when drive capacities get larger still.  But this highlights how dangerous RAID 5 is in the sizes that we see today.  Everyone understands the incredible risks of RAID 0 but it can be difficult to put into perspective that RAID 5’s issues are so extreme that it might actually be less reliable than RAID 0.

That RAID 5 might be less reliable than RAID 0 in an array of this size based on resilver operations alone is just the beginning.  In a massive array like this the resilver time can take so long and exact such a toll on the drives that second drive failure starts to become a measurable risk as well.  And then there are additional risks caused by array controller errors that can utilize resilver algorithms to destroy an entire array even when no drive failure has occurred.  As RAID 0 (or RAID 1 or RAID 10) do not have resilver algorithms they do not suffer this additional risk.  These are hard risks to quantify but what is important is that they are additional risks that accumulate when using a more complex system when a simpler system, without the redundancy, was more reliable from the outset.

Now that we have established that RAID 5 can be less reliable than RAID 0 I will point out the obvious dangers of RAID 0.  RAID in general is used to mitigate the risk of a single, lone hard drive failing.  We all fear a single drive simply failing and all data being lost.  RAID 0, being a large stripe of drives without any form of redundancy, takes the risk of data loss of a single drive failing and multiplies it across a number of drives where any drive failing causes total loss of data to all drives.  So in our eleven disk example above, if any of the eleven disks fails all is lost.  It is clear to see where this is dramatically more dangerous than just using a single drive, all alone.

What I am trying to point out here is that redundancy does not mean reliability.  Just because something is redundant, like RAID 5, provides no guarantee that it will always be more reliable than something that is not redundant.

My favourite analogy here is to look at houses in a tornado.  In one scenario we build a house of brick and mortar.  On the second scenario we build two redundant house, east out of straw (our builders are pigs, apparently.)  When the tornado (or big bad wolf) comes along which is more likely to leave us with a standing house?  Clearing one brick and mortar house has some significant reliability advantages over redundant straw houses.  Redundancy didn’t matter, reliability mattered in the end.

Redundancy is often misleading because it is easy to quantify but hard to qualify.  Redundancy is a black or white question: Is it redundant?  Yes or no.  Simple.  Reliability is not so simple.  Reliability is about failure rates and likelihoods.  It is about statistics and analysis.  As it is hard to quantify reliability in a meaningful way, especially when selling a project to the business people, redundancy often becomes a simple substitute for this complex concept.

The concept of using redundancy to misdirect questions of reliability also ends up applying to subsystems in very convoluted ways.  Instead of making a “system” redundant it has become common to make a highly reliable, and low cost, subsystem redundant and treat subsystem redundancy as applying to the whole system.  The most common example of this is RAID controllers in SAN products.  Rather than having a redundant SAN (meaning two SANs) manufacturers will often make that one component not often redundant in normal servers redundant  and then calling the SAN redundant – meaning a SAN that contains redundancy, which is not at all the same thing.

A good analogy here would be to compare having redundant cars meaning two complete, working cars and having a single car with a spare water pump in the trunk in case the main one fails.  Clearly, a spare water pump is not a bad thing.  But it is also a trivial amount of protection against car failure compared to having a second car ready to go.  In one case the entire system is redundant, including the chassis.  In the other we are making just one, highly reliable component redundant inside the chassis.  It’s not even on par with having a spare tire which, at least, is a car component with a higher likelihood of failure.

Just like the myth of RAID 5 reliability and system/subsystem reliability, shared storage technologies like SANs and NAS often get treated in the same way, especially in regards to virtualization.  There is a common scenario where a virtualization project is undertaken and people instinctively panic because a single virtualization host represents a single point of failure where, if it fails, many systems will all fail at once.

Using the term “single point of failure” causes a panic feeling and is a great means of steering a conversation.  But a SPOF, as we like to call it, while something we like to remove when possible may not be the end of the world.  Think about our brick house.  It is a SPOF.  Our two houses of straw are not.  Yet a single breeze takes out our redundant solutions faster than our reliable SPOF.  Looking for SPOFs is a great way to find points of fragility in a system, but do not feel that every SPOF must be made redundant in every scenario.  Most businesses will find their best value having many SPOFs in place.  Our real goal is reliability at appropriate cost, redundancy, as we have seen, is no substitute for reliability, it is simply a tool that we can use to achieve reliability.

The theory that many people follow when virtualizing is that they take their virtualization host and say “This host is a SPOF, so I need to have two of them and use High Availability features to allow for transparent failover!”  This is spurred by the leading virtualization vendor making their money firstly by selling expensive HA add on products and secondly by being owned by a large storage vendor – so selling unnecessary or even dangerous additional shared storage is a big monetary win for them and could easily be the reason that they have championed the virtualization space from the beginning.  Redundant virtualization hosts with shared storage sounds great but can be extremely misguided for several reasons.

The first reason is that removing the initial SPOF, the virtualization host, is replaced with a new SPOF, the shared storage.  This accomplishes nothing.  Assuming that we are using comparable quality servers and shared storage all we’ve done is move where the risk is, not change how big it is.  The likelihood of the storage system failing is roughly equal to the likelihood of the original server failing.  But in addition to shuffling the SPOF around like in a shell game we’ve also done something far, far worse – we have introduced chained or cascading failure dependencies.

In our original scenario we had a single server.  If the server stayed working we are good, if it failed we were not.  Simple.  Now we have two virtualization hosts, a single storage server (SAN, NAS, whatever) and a network connecting them together.  We have already determined that the risk of the shared storage failing is approximately equal to our total system risk in the original scenario.  But now we have the additional dependencies of the network and the two front end virtualization nodes.  Each of these components is more reliable than the fragile shared storage (anything with mechanical drives is going to be fragile) but that they are lower risk is not the issue, the issue is that the risks are combinatorial.

If any of these three components (storage, network or the front end nodes) fail then everything fails.  The solution to this is to make the shared storage redundant on its own and to make the network redundant on its own.  With enough work we can overcome the fragility and risk that we introduced by adding shared storage but the shared storage on its own is not a form of risk mitigation but is a risk itself which must be mitigated.  The spiral of complexity begins and the cost associated with bringing this new system up on par with the reliability of the original, single server system can be astronomic.

Now that we have all of this redundancy we have one more risk to worry about.  Managing all of this redundancy, all of these moving parts, requires a lot more knowledge, skill and preparation than does managing a simple, single server.  We have moved from a simple solution to a very complex one.  In my own anecdotal experience the real dangers of solutions like this come not from the hardware failing but from human error.  Not only has little been done to avoid human error causing this new system to fail but we’ve added countless points where a human might accidentally bring the entire system, redundancy and all, right down.  I’ve seen it first hand; I’ve heard the horror stories.  The more complex the system the more likely a human is going to accidentally break everything.

It is critical that as IT professionals that we step back and look at complete systems and consider reliability and risk and think of redundancy simply as a tool to use in the pursuit of reliability.  Redundancy itself is not a panacea.  Neither is simplicity.  Reliability is a complex problem to tackle.  Avoiding simplistic replacements is an important first step in moving from covering up reliability issues to facing and solving them.