Comments on: Hot Spare or a Hot Mess https://smbitjournal.com/2012/07/hot-spare-or-a-hot-mess/ The Information Technology Resource for Small Business

By: Ammaross Danan https://smbitjournal.com/2012/07/hot-spare-or-a-hot-mess/comment-page-1/#comment-29347 Fri, 15 Sep 2017 17:36:33 +0000 http://www.smbitjournal.com/?p=308#comment-29347 Not to necro the article or anything, but here are some things to consider that were not mentioned in it:

1) In RAID 10, if a bitrot incident occurs during the rebuild of a bad disk, you get silent corruption. If a URE or bad block occurs, you get either corruption or a failed array. This is less likely than with parity, since the rebuild is a 1:1 copy instead of many-to-one, but it is something to be aware of.
2) RAID 5/6 will detect both of those cases and treat them as a corrupt block. RAID 6 can correct it with the double parity (quorum consensus), whereas RAID 5 doesn’t know whether the stripe data or the stripe parity is correct, just that one of them is wrong. Usually this throws an error or silently replaces the parity data with updated (possibly bad) parity information.
3) These considerations are precisely why a resilient filesystem on top of RAID is beneficial. ZFS, BtrFS, etc. do block-level checksums on data to detect these kinds of pass-through errors and correct them (see the sketch after this list). Since they are software RAID filesystems, they can use their error-correcting mechanisms in tandem, so they know precisely which block on which disk had a URE, bitrot, a bad block, etc., and can correct it appropriately. ZFS RAID5/6 also has the benefit of not immediately ejecting a failing (not failed! failing) drive that has bad blocks, UREs, etc., so it can 1:1 copy the drive during resilvering, keeping exactly the benefit Scott Alan Miller mentions that RAID 10 has with regard to resilver UREs and stress. However, ZFS is an enterprise software RAID and filesystem combined and is far more advanced than the simple hardware RAID that 99% of this article refers to and, sadly, that most internet rule-of-thumb guidance refers to.
4) That said, RAID 10 is about performance; RAID 6 is about archival. This doesn’t mean RAID 6 has bad performance, just that it is slower (sometimes by a wide margin) than RAID 10. In an enterprise SAN, RAID 5 isn’t uncommon, but there are many higher-level things going on that make it safe(r) to do than on a DAS card from Dell or HP, so you can’t use that as an example.
5) Warm spares are useful for SANs, where you have 100+ drives spread across many RAID arrays and one or two warm spares to jump in and rebuild until the NOC techs can swap the failed drive out. This is primarily a cost-saving measure due to shelf real estate, and a well-engineered SAN is resilient even to losing an entire shelf of disks, so “offlining a RAID 5 due to a URE” isn’t an issue either. Most enterprise SANs use software RAID anyway, similar to ZFS, so it’s difficult to even draw exact RAID comparisons in the first place.
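As a minimal sketch of the per-block checksum idea in point 3: this is purely illustrative Python, not ZFS's actual on-disk format or API; the write_block/read_block helpers, the CRC-32 choice, and the two-copy layout are all made up for the example. The point is only that a checksum stored with each block lets the filesystem detect a silently rotted copy and fall back to a known-good one.

```python
import zlib

def write_block(data: bytes) -> dict:
    # Store a checksum alongside the data (illustrative, not ZFS's real format).
    return {"data": data, "crc": zlib.crc32(data)}

def read_block(primary: dict, mirror: dict) -> bytes:
    # Verify the primary copy; if it fails its checksum, fall back to the
    # mirror copy (a real filesystem would also rewrite the bad copy).
    if zlib.crc32(primary["data"]) == primary["crc"]:
        return primary["data"]
    if zlib.crc32(mirror["data"]) == mirror["crc"]:
        return mirror["data"]
    raise IOError("both copies failed their checksums")

good = write_block(b"some payload")
rotten = {"data": b"sOme payload", "crc": good["crc"]}   # simulated silent bit flip
assert read_block(rotten, good) == b"some payload"
print("bitrot detected on the primary copy and healed from the mirror")
```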

By: Raffles https://smbitjournal.com/2012/07/hot-spare-or-a-hot-mess/comment-page-1/#comment-29274 Wed, 06 Sep 2017 13:47:24 +0000 http://www.smbitjournal.com/?p=308#comment-29274 Thanks for the article and the explanations. I really like your ZIP file analogy for RAID 5/6 – that’s exactly it, isn’t it? The redundancy is obtained by using error correction, which relies on EVERY other bit of the information being there so that it can calculate what is missing (a tiny sketch of that reconstruction appears below). It’s amazing how it manages to survive the loss of a drive (or two for RAID 6) without doubling (or tripling) the amount of disk space it uses. It is almost magic to get the effect of an entire second copy of the disk in a fraction of the space, but it is a bit fragile.

As it turns out, we have gone RAID 6. Why? Well, firstly, sometimes you just have some kit to work with and a spec to meet. We need to get a certain amount of actual storage from a 20-disk storage array, and we also need to provide a certain level of redundancy. RAID 10 wouldn’t leave us with enough storage, unfortunately. We have no warm spare for the reasons you mentioned – we would rather replace a failed disk manually at a time of our choosing. Instead we have gone for a dynamic disk pool solution, which means we can use all the disks we have available to maximise both our storage capacity and our redundancy (the DDP rebalancing of the RAID 6 means we can have several disks fail, not just two, provided they don’t all go down simultaneously).
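Here is what "relies on every other bit" looks like for single parity, as a few lines of illustrative Python. It is a sketch only: real controllers work on large stripes in hardware, RAID 6 adds a second, differently computed parity, and the block contents below are made-up data.

```python
def xor_blocks(blocks):
    # XOR corresponding bytes of every block together.
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

data = [b"AAAA", b"BBBB", b"CCCC"]   # three data blocks of one stripe (made-up)
parity = xor_blocks(data)            # the extra block the controller stores

# "Lose" the middle block, then rebuild it from everything else.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]            # works only if every other block is readable
print("missing block reconstructed:", rebuilt)
```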

By: mysteryDave https://smbitjournal.com/2012/07/hot-spare-or-a-hot-mess/comment-page-1/#comment-28628 Wed, 19 Jul 2017 13:24:47 +0000 http://www.smbitjournal.com/?p=308#comment-28628 One of the better reasoned and written articles I have seen on here. Excellent.

By: Craig Jacobs https://smbitjournal.com/2012/07/hot-spare-or-a-hot-mess/comment-page-1/#comment-28600 Sat, 15 Jul 2017 22:53:17 +0000 http://www.smbitjournal.com/?p=308#comment-28600 I was initially skeptical of the article because of the title, but I found your logic to be unassailable.

I almost always choose RAID 10 + HS (WS) because the NAS units I use have 5 bays, disks are cheap, and the empty bay irritates my OCD nature. RAID 10 is far faster, and I’ve noticed, although this may be my imagination, that the disk chatter noise from the units is less than with RAID 6.

By: Heber Corrales https://smbitjournal.com/2012/07/hot-spare-or-a-hot-mess/comment-page-1/#comment-2839 Tue, 21 Jan 2014 12:43:34 +0000 http://www.smbitjournal.com/?p=308#comment-2839 Good article

By: Thibaut https://smbitjournal.com/2012/07/hot-spare-or-a-hot-mess/comment-page-1/#comment-2834 Wed, 15 Jan 2014 13:07:05 +0000 http://www.smbitjournal.com/?p=308#comment-2834 Hi Scott,

I’ve recently run into the famous URE problem with RAID 5 while confidently upgrading my Netgear ReadyNAS following their procedure. After a long bit of reading (mostly from your blog and Spiceworks), I understand those problems much better now (many thanks!), but I still have some questions:

You (among others) say that RAID 6 will encounter the same problem as RAID 5 within the next 5 years, but I have seen no precise calculation or description of when RAID 6 really becomes risky.

To solve my problem, I bought a new 6-drive NAS with 6 x 4 TB SATA drives (classic 10^14 drives). I know that it’s not the best that can be done, but it’s a home NAS, I’m not Bill Gates, and I need to have over 10 TB of usable capacity, so this was the best price/security ratio. I now have to choose between RAID 10 and RAID 6.

RAID 10 is clearly the best choice concerning security because it’s not affected by the parity RAID URE disaster, but I could really use the extra 4 TB that RAID 6 would offer, and since RAID 6 seems to be capable of surviving a URE while resilvering, I would like to have your thoughts on this. Too risky or not? What do you think?
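For what it's worth, here is the commonly cited back-of-envelope URE arithmetic applied to exactly this setup (five remaining 4 TB drives read in full during a rebuild, at the 1-error-per-10^14-bits spec). It assumes the spec-sheet rate is literal and that errors are independent, which is a simplification, so treat the output as an order-of-magnitude guide rather than a prediction.

```python
import math

URE_RATE = 1e-14          # one unrecoverable read error per 1e14 bits (spec sheet)
BITS_PER_TB = 8e12
DRIVES, SIZE_TB = 6, 4    # the 6 x 4 TB NAS described above

# A rebuild after one failed drive reads the five remaining drives in full.
bits_read = (DRIVES - 1) * SIZE_TB * BITS_PER_TB
expected_ures = bits_read * URE_RATE              # about 1.6 expected errors
p_at_least_one = 1 - math.exp(-expected_ures)     # roughly an 80% chance

print(f"expected UREs during a full rebuild read: {expected_ures:.1f}")
print(f"chance of at least one URE: {p_at_least_one:.0%}")
```

By this arithmetic a degraded RAID 5 would more likely than not hit a URE before the rebuild finishes, whereas a RAID 6 rebuilding after a single drive loss still has one parity in hand, so a lone URE at that point is recoverable in principle; how a particular firmware actually reacts is a separate question.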

If a URE appears on a disk (in a RAID 6 configuration) while resilvering, will this definitively knock that disk out of the rebuild (leaving the resilvering vulnerable to another URE), or will the NAS just get the good data from the other disks, write it to a new sector and carry on with the rebuild (leaving the operation vulnerable only to two simultaneous UREs, or to another disk failure plus a URE on a third one)?

As you can see, I think I have well understood the uselessness of RAID 5 nowadays and the power of RAID 10, but I struggle when I need to evaluate RAID 6.

Many thanks in advance, and please excuse my bad English, as it is not my native language.

By: Jasper https://smbitjournal.com/2012/07/hot-spare-or-a-hot-mess/comment-page-1/#comment-2797 Sat, 04 Jan 2014 10:34:01 +0000 http://www.smbitjournal.com/?p=308#comment-2797 So your big RAID 10 is sets of 2 drives mirroring, and then a stripe across these sets, right? And a random drive fails. That means one of your 2-drive mirrors is now a RAID 1 mirror with only one member; the warm spare gets added to that array, and the controller starts rebuilding that drive from the existing drive.

Now a URE happens. That data was in only one place at that time; it’s now in 0 places. That should invalidate the RAID 1 set just as much as a URE encountered during a RAID 5 rebuild would, shouldn’t it? And therefore the stripe across the mirrors, and so the entire array, would be lost?

The difference is the *likelihood* of a URE occurring. In a raid5 rebuild, you read *all* the data on *all* the remaining disks. In the case of the mirror rebuild, you read all the data on just *one* disk — and so the risk of a URE is correspondingly less, anywhere from 33% as much to 10% of the risk, depending on the size of the raid5 array.
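Putting rough numbers on that ratio, using the usual spec-sheet assumption of one URE per 10^14 bits and an illustrative 4 TB drive size: a mirror rebuild reads one disk, a RAID 5 rebuild reads every remaining disk, and the exposure scales with the data read. The independence model here is a simplification, so the percentages are indicative only.

```python
import math

URE_RATE = 1e-14           # 1 error per 1e14 bits read (consumer-class spec)
BITS_PER_DISK = 4 * 8e12   # assume 4 TB drives, purely for illustration

def p_ure(disks_read: int) -> float:
    # Chance of at least one URE while reading this many whole disks,
    # treating errors as independent (a simplification).
    return 1 - math.exp(-disks_read * BITS_PER_DISK * URE_RATE)

print(f"mirror rebuild, 1 disk read:     {p_ure(1):.0%}")
for drives in (4, 6, 11):
    print(f"RAID 5 rebuild, {drives - 1:2d} disks read:  {p_ure(drives - 1):.0%}")
```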

It’s not zero, or even negligible, though (which is why RAID Is Not A Backup).

By: Scott Alan Miller https://smbitjournal.com/2012/07/hot-spare-or-a-hot-mess/comment-page-1/#comment-2777 Sun, 08 Dec 2013 20:53:19 +0000 http://www.smbitjournal.com/?p=308#comment-2777 Because parity RAID is a lot like compression, in that the result is, in some ways, like a single file. If you have a zip file, for example, and it becomes corrupt, then even if it contains many files inside of it you expect them all to be corrupt and unreadable, because the container in which they are held has become corrupt. Parity RAID acts the same way, only its purpose is not to make the data smaller, obviously. The entire array behaves as a single file, and if that file is corrupt and unreadable, the system cannot put the pieces back together to reconstruct it; everything contained within that file, meaning the filesystems on top of the array, is lost.

This isn’t a feeling about how it works or something that comes from me. This is the well known and documented behavior of the RAID 5 specification. It’s not a theoretical problem but a commonly observed one.

Now, why do RAID manufacturers not come up with an algorithm that can protect against that and limit loss? I think that the answer probably comes down to the fact that it is not a practical use of financial resources. How difficult it is to solve and how effective the solution could be, I am not sure.
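As a toy illustration of the container analogy above (of the analogy itself, not of RAID): flip a single byte in a compressed stream and the whole container generally becomes unreadable, no matter how many files it held. The payload is made-up data, and zlib is used simply as a stand-in for any compressed container.

```python
import zlib

payload = b"file-a contents\n" * 200 + b"file-b contents\n" * 200
container = bytearray(zlib.compress(payload))

container[len(container) // 2] ^= 0xFF     # corrupt one byte in the middle

try:
    zlib.decompress(bytes(container))
    print("decompressed anyway (rare)")
except zlib.error as exc:
    print("whole container unreadable:", exc)
```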

By: Shaun https://smbitjournal.com/2012/07/hot-spare-or-a-hot-mess/comment-page-1/#comment-2769 Sat, 30 Nov 2013 03:34:59 +0000 http://www.smbitjournal.com/?p=308#comment-2769 Hi Scott,

Thanks for taking your time to write articles like this to educate the unwashed masses such as myself. I have read a lot of your content and really appreciate it.

There is one thing still bugging me though. This question was asked before but I think you kind of misread/dodged the question.

Why would a RAID 5’s controller fail an array when encountering a URE during rebuild? Why wouldn’t you just get some corrupted bits, which presumably could be lived with? I don’t see what is causing a single URE to mandate a total array failure.

By: Scott Alan Miller https://smbitjournal.com/2012/07/hot-spare-or-a-hot-mess/comment-page-1/#comment-2763 Wed, 27 Nov 2013 05:29:41 +0000 http://www.smbitjournal.com/?p=308#comment-2763 Barry, nothing you talk about makes sense. Of course RAID is not backup. That’s a level of misconception not being dealt with here. RAID is about reducing the need to go to backup, about avoiding data loss rather than minimizing it.

RAID 6 is common in archival systems, yes, but that is far from its only place.

By: Scott Alan Miller https://smbitjournal.com/2012/07/hot-spare-or-a-hot-mess/comment-page-1/#comment-2762 Wed, 27 Nov 2013 05:27:54 +0000 http://www.smbitjournal.com/?p=308#comment-2762 Scrubbing does not fix the problem. URE risk calculations are done with the assumption that the disks were freshly scrubbed. This is a common myth brought out any time someone wants to discredit URE fears, but scrubbing is already assumed in the risk figures mentioned.

Yes, if you don’t scrub then maybe you are at even more risk. But that only makes things worse, not better.

By: Scott Alan Miller https://smbitjournal.com/2012/07/hot-spare-or-a-hot-mess/comment-page-1/#comment-2761 Wed, 27 Nov 2013 05:26:47 +0000 http://www.smbitjournal.com/?p=308#comment-2761 Google’s papers and Backblaze’s deal with drive failures, not URE rates. You are mixing concepts together and getting confused.

By: Scott Alan Miller https://smbitjournal.com/2012/07/hot-spare-or-a-hot-mess/comment-page-1/#comment-2760 Wed, 27 Nov 2013 05:26:00 +0000 http://www.smbitjournal.com/?p=308#comment-2760 RAID 6 is never more reliable than RAID 1; the math makes no sense. Yes, RAID 6 can potentially lose more drives, but it always has more drives to lose, and it introduces other risks. RAID 1 has redundancy rather than parity, is much more stable, doesn’t introduce resilver risks and, contrary to popular myth, is not limited to just two drives. You can expand RAID 1 to as many drives as you like; triple-mirrored RAID 1 is ridiculously reliable. But no matter how you slice or dice it, RAID 1 always wins the reliability question.

By: Frederik https://smbitjournal.com/2012/07/hot-spare-or-a-hot-mess/comment-page-1/#comment-2753 Thu, 21 Nov 2013 13:27:04 +0000 http://www.smbitjournal.com/?p=308#comment-2753 In setups with many disks you should consider that you get much more usable space from RAID 6 than from RAID 10. We have installed several systems with 16 disks, where we get just over 40 TB of usable space using 3 TB disks, whereas we would only get 24 TB if we were running RAID 10.

I think this factor is worth mentioning in your article, perhaps along with some elaboration on how many disks you should put in a single RAID set/volume set.

By: Barry Kelly https://smbitjournal.com/2012/07/hot-spare-or-a-hot-mess/comment-page-1/#comment-2745 Wed, 13 Nov 2013 01:20:14 +0000 http://www.smbitjournal.com/?p=308#comment-2745 RAID6 is more reliable than RAID1 in smaller drive arrays, because you need to lose three drives to lose data, whereas RAID1 only needs to lose two drives – granted, a specific pair, but still only two. You’re pushing it a bit if you go much above 6 drives, though the chances of data loss are still quite low.

You bring up the bogeyman of the URE like it’s a trump card for RAID 10 (and we can drop the 0 here; striping doesn’t do anything for your reliability). I suspect you’ve taken the “why-raid-6-stops-working-in-2019” article by Robin Harris a bit too seriously; you probably also trust a bit too much in the fairy-dust stats that HD manufacturers use when they oversimplify MTTDL. Google’s HDD paper and, just recently, Backblaze, with many petabytes and tens of thousands of drives under management, disagree with simplistic numbers like 1e14 vs 1e15.

Thing is, if you see an error during a rebuild, was the data really there to begin with? You should be scrubbing your RAID arrays regularly so you won’t be surprised during a resilver. The URE panic promulgated by Robin would have you believing you can’t reliably complete a single scrub.

But I have to laugh. This is all a bit of a bogus argument. RAID6 is for archival, to reduce the cost of redundancy, at the expense of performance; if you’re not streaming big contiguous blobs, your life will be miserable. RAID10 is generally used to add reliability and extra read performance to RAID0. Neither counts as backup; you still need an extra copy somewhere else. RAID10 is for when you need uptime with good performance, while RAID6 is more like a reliable and reasonably cheap cache of your offsite backup. They don’t actually compete much with one another, as they serve different purposes.

By: Damon https://smbitjournal.com/2012/07/hot-spare-or-a-hot-mess/comment-page-1/#comment-2743 Sat, 09 Nov 2013 03:05:07 +0000 http://www.smbitjournal.com/?p=308#comment-2743 This article makes perfect sense. To take things a step further, we take advantage of the additional reliability of mirrored arrays over parity arrays to use consumer-grade drives, along with the flexibility of MDADM. We use a 3-drive, 3-copy MD RAID 10 with inexpensive new drives or used drives, depending on what is available for purchase. This is all in addition to a backup scheme.

We feel that even with consumer-grade drives it is highly unlikely that we would have two drives fail at once, given that they are in a mirrored array and rebuilds do not require the parity calculation, which reduces the wear on the array during a rebuild and shortens rebuild times significantly, thereby shrinking the second-failure window. A 1 TB array takes 4 to 6 hours to rebuild even under light load for our few in-office users. This means that typically the time from failure to a fully rebuilt array will almost always be less than 24 hours if we are slow to replace the drive, and could be less than a work day if we are on top of it, leaving a very small window for a second failure.

The drives we buy easily cost 40%-70% less than enterprise drives, yet do not necessarily offer 40%-70% less life. All of this is afforded by using mirrored rather than parity arrays. From our understanding, a RAID 5 rebuild on aged consumer drives would probably leave us without fingernails, and possibly without data, on a regular basis, whereas with the mirrored arrays this is not the case. Throw in the URE problems and we feel much better about our consumer hardware choices.

The big vision is that the cost for one server is low enough that we plan on adding a second one and mirroring them using DRBD and associated services, giving us an HA setup for around $1000 or less to start, with room for future drives for expansion of the arrays. Then we could have a whole server go down and still have a window of time to get it back up, array and all. Again, to use lower-priced consumer-grade hardware we need to keep things as simple as possible, and parity is not as simple as mirroring. We also keep our drives in pass-through mode so that no hardware compatibility issues get in the way in case of controller failure.

With articles like these, we gravitate towards things like mirroring and feel we are better off because of it. Thanks for writing it up!

By: Hot Spare mi ? ..na koyim hot spare’in | Mesut'un Blogu https://smbitjournal.com/2012/07/hot-spare-or-a-hot-mess/comment-page-1/#comment-2738 Tue, 01 Oct 2013 09:39:41 +0000 http://www.smbitjournal.com/?p=308#comment-2738 […] Where did I learn this? Right here […]

By: Scott Alan Miller https://smbitjournal.com/2012/07/hot-spare-or-a-hot-mess/comment-page-1/#comment-2703 Wed, 11 Sep 2013 18:27:46 +0000 http://www.smbitjournal.com/?p=308#comment-2703 @Michal Lei – the number of times that the data is stored isn’t what creates reliability. It is a factor, to be sure, but it is far from the be-all and end-all of reliability. If drives failed equally and there were no other reliability factors, then yes, RAID 6 would beat RAID 10.

But RAID 10 has several advantages over RAID 6. It doesn’t suffer URE resilver failure, which makes RAID 6’s extra redundancy potentially useless (or at best, less useful than assumed). RAID 10 rebuilds in a fraction of the time that RAID 6 does (reducing the window for further drive failure). RAID 10 is less likely, due to wear leveling, to experience the first drive failure, and it is less likely as well to experience a second one. And unlike parity RAID, RAID 10 does not exhibit the “drive killing” behaviour that parity RAID does before resilvering is complete.

And the big one: RAID 10, after having lost one drive, is only vulnerable to losing a single specific additional drive – the matched pair of the one that has failed. RAID 6, having lost a single drive, is at risk from any additional drive failing anywhere in the array, at which point it will continue to function but is completely exposed to UREs and so may fail to resilver even if no further drives fail.

Taken together, RAID 10, especially as arrays grow in size, keeps its reliability far ahead of RAID 6. The bigger the array, the more reliable RAID 10 becomes in relation to RAID 6: its failure domain never grows, whereas with RAID 6 the failure domain just gets bigger and bigger.
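A small combinatorial sketch of that failure-domain point, with purely illustrative drive counts: once one drive in a RAID 10 has died, only its mirror partner is a fatal second loss, so the chance that a random second failure is fatal shrinks as the array grows, while a RAID 6 rebuild must read the whole, ever larger array and carries the URE exposure described above.

```python
# Illustrative drive counts; once one drive in a RAID 10 is dead, only its
# mirror partner is a fatal second loss.
for total_drives in (4, 8, 16, 24):
    p_fatal_second = 1 / (total_drives - 1)
    print(f"{total_drives:2d}-drive RAID 10: a random second failure is fatal "
          f"{p_fatal_second:.0%} of the time")
```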

Parity RAID “redundancy level” is extremely misleading because it is not mirrored redundancy, yet it is often enumerated as if it were. RAID 6 does not have three copies of each block, not even two copies, yet we treat it as having triple redundancy, which would really require a three-way mirror. Parity redundancy is not the same ‘class’ of redundancy as mirroring and has to be looked at in a different light. The chance that a RAID 6 array could actually survive the loss of two drives is relatively low, much lower than the chance of RAID 10 surviving the same thing. (This may be reversed in very small arrays built from drives with very high URE ratings.)

By: Which is better: RAID5 + 1 Hotspare / RAID6? - Just just easy answers https://smbitjournal.com/2012/07/hot-spare-or-a-hot-mess/comment-page-1/#comment-2698 Fri, 06 Sep 2013 16:29:28 +0000 http://www.smbitjournal.com/?p=308#comment-2698 […] http://www.smbitjournal.com/2012/07/hot-spare-or-a-hot-mess/ […]

By: Michal Lei https://smbitjournal.com/2012/07/hot-spare-or-a-hot-mess/comment-page-1/#comment-2688 Fri, 30 Aug 2013 16:14:31 +0000 http://www.smbitjournal.com/?p=308#comment-2688 How can you say “RAID 10 is … far more reliable than a RAID 6 array”? In RAID 10 the same data is stored only twice, so a two-disk failure could cause data loss if the “right” pair of disks fails. Is there something I am missing?

By: Scott Alan Miller https://smbitjournal.com/2012/07/hot-spare-or-a-hot-mess/comment-page-1/#comment-2669 Tue, 06 Aug 2013 00:01:49 +0000 http://www.smbitjournal.com/?p=308#comment-2669 Thanks. If you search this site for “RAID”, you will find a large number of related articles covering many facets of RAID.

By: Alain Lecours https://smbitjournal.com/2012/07/hot-spare-or-a-hot-mess/comment-page-1/#comment-2665 Sat, 03 Aug 2013 14:15:04 +0000 http://www.smbitjournal.com/?p=308#comment-2665 Mr. Miller
You have written a very good article. I am not an IT guy, but it allowed me to understand, for the first time, the real cost/benefit of the different RAID scenarios. When the time comes to consult the IT consultant, we are in a better position to make the proper decision.

By: Scott Alan Miller https://smbitjournal.com/2012/07/hot-spare-or-a-hot-mess/comment-page-1/#comment-1666 Thu, 01 Nov 2012 20:59:19 +0000 http://www.smbitjournal.com/?p=308#comment-1666 RAID 5 has a 4x write penalty and RAID 6 has a 6x penalty (so 50% greater, yes). But neither RAID 5 nor RAID 6 is chosen for performance; RAID 0, 1 and 10 are the performance choices (depending on needs). Basically, any scenario where a new RAID 5/6 array is being put in can be served by a non-parity option that is just as fast while costing less and/or being more reliable, depending on the need.

In read-only systems there would be exceptions to this, but in a situation where writes are trivial the performance advantage would heavily fall to RAID 6 over RAID 5 + Hot Spare due to the extra spindle.
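As a rough illustration of what those write penalties mean for random writes, here is the standard effective-IOPS arithmetic. The drive count and per-spindle IOPS figure below are illustrative assumptions only, not numbers from the article.

```python
def effective_write_iops(drives: int, iops_per_drive: int, penalty: int) -> float:
    # Raw random-write IOPS of the array divided by the RAID write penalty.
    return drives * iops_per_drive / penalty

drives, iops = 8, 150          # e.g. eight 10k spindles at ~150 IOPS each (assumed)
for name, penalty in [("RAID 10", 2), ("RAID 5", 4), ("RAID 6", 6)]:
    print(f"{name}: ~{effective_write_iops(drives, iops, penalty):.0f} write IOPS")
```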

By: Dave https://smbitjournal.com/2012/07/hot-spare-or-a-hot-mess/comment-page-1/#comment-1665 Thu, 01 Nov 2012 20:39:31 +0000 http://www.smbitjournal.com/?p=308#comment-1665 All good, and I know this topic wasn’t set up to address performance, but from my understanding RAID 6 takes a 50% write-performance penalty vs. RAID 5 due to writing the parity twice. So if performance is a concern as well as budget, then wouldn’t RAID 5 with a cold spare be a good option, knowing that the URE rating of enterprise SAS drives is significantly better than that of SATA drives?

By: Scott Alan Miller https://smbitjournal.com/2012/07/hot-spare-or-a-hot-mess/comment-page-1/#comment-1634 Thu, 26 Jul 2012 16:58:04 +0000 http://www.smbitjournal.com/?p=308#comment-1634 Warm spares shared across multiple arrays have the same problems, except that they are more cost effective. If they automate array destruction, they are bad; but if shared between several RAID 10 arrays, for example, they are excellent.

By: Scott Alan Miller https://smbitjournal.com/2012/07/hot-spare-or-a-hot-mess/comment-page-1/#comment-1633 Thu, 26 Jul 2012 16:57:24 +0000 http://www.smbitjournal.com/?p=308#comment-1633 UREs don’t cause healthy arrays to fail, because either parity or mirroring contains the same bit elsewhere and can reconstruct the data without issue. It is only in a degraded parity array (where there is no longer any parity) that a URE causes a full array failure. This is because during a resilver operation the array is unstable: when a URE is encountered, the parity is unable to reconstruct the stripe, the resilver fails, and that causes the array itself to fail.

UREs, in the real world during a parity resilver, really do cause a complete loss of the array. It’s catastrophic level failure.

Mirrored RAID (RAID 1 or RAID 10, for example) does not do a computation to reconstruct a stripe and so a URE does not cause a resilver to fail. It is specifically parity reconstruction + URE that is the danger.
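A simplified model of that difference, not any particular controller's firmware: the per-sector URE probability below is deliberately exaggerated so the effect shows up in a tiny simulation, and the common (though not universal) behaviour of abandoning a parity rebuild on an unreconstructable stripe is assumed.

```python
import random

TOTAL_SECTORS = 100_000
URE_PROB = 1e-4      # exaggerated per-sector URE chance, for illustration only

def read_sector(disk: str, sector: int):
    # Simulated read: occasionally returns None to represent a URE.
    return None if random.random() < URE_PROB else b"data"

def rebuild_mirror() -> str:
    # Copy sector by sector from the one surviving partner; a bad read costs
    # only that sector and the copy continues.
    lost = sum(read_sector("partner", s) is None for s in range(TOTAL_SECTORS))
    return f"mirror rebuild finished, {lost} sector(s) unrecoverable"

def rebuild_parity(surviving_disks) -> str:
    # Each stripe must be reconstructed from all surviving members; one bad
    # read leaves the stripe unreconstructable and the resilver is abandoned.
    for s in range(TOTAL_SECTORS):
        if any(read_sector(d, s) is None for d in surviving_disks):
            return f"parity rebuild aborted at sector {s}: stripe lost"
    return "parity rebuild finished"

print(rebuild_mirror())
print(rebuild_parity(["disk1", "disk2", "disk3", "disk4"]))
```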

By: Mark https://smbitjournal.com/2012/07/hot-spare-or-a-hot-mess/comment-page-1/#comment-1631 Thu, 19 Jul 2012 15:02:19 +0000 http://www.smbitjournal.com/?p=308#comment-1631 I’m a bit confused on the URE issue. I see it brought up a lot and the impression is always given that a single URE instantly equals the loss of access to all data on the entire array.

Is this actually the case?

Does the controller encounter a URE and just disable access to the array completely, right away? It seems like the controller should be able to just fail to read or write that block of data but continue to give access to areas not affected by the URE (similar to a failed sector on a standalone HDD).

Does this vary from controller to controller?

I haven’t actually seen any real-world reports of what happened during/after a URE event.

Also, what are your thoughts on hot/warm spares in situations where a single spare can be allocated to serve multiple arrays?

Thanks for the post.
