Comments on: Explaining the Lack of Large Scale Studies in IT https://smbitjournal.com/2015/03/explaining-the-lack-of-large-scale-studies-in-it/ The Information Technology Resource for Small Business Mon, 24 Apr 2017 20:04:52 +0000

By: Scott Alan Miller https://smbitjournal.com/2015/03/explaining-the-lack-of-large-scale-studies-in-it/comment-page-1/#comment-22380 Wed, 15 Apr 2015 16:36:17 +0000 http://www.smbitjournal.com/?p=736#comment-22380 I’m not even sure to what you are referring. You complained that I’ve not published but did not check that I had. I provided the information again.

Today BackBlaze released some new drive info. It’s handy, but it ultimately proves my points: the data set is small (only 4,500 drives) and comes three years after the drives went out of production. So while it is handy and interesting to know that Seagate had a higher failure rate three years ago, it does not give us usable information to change our buying patterns today. Nor does it show comparisons to other models and makers during that same window.

https://www.backblaze.com/blog/3tb-hard-drive-failure/

By: LinAdmin https://smbitjournal.com/2015/03/explaining-the-lack-of-large-scale-studies-in-it/comment-page-1/#comment-22377 Tue, 14 Apr 2015 18:36:55 +0000 http://www.smbitjournal.com/?p=736#comment-22377 Nothing but hot air. Bye.

By: Scott Alan Miller https://smbitjournal.com/2015/03/explaining-the-lack-of-large-scale-studies-in-it/comment-page-1/#comment-22374 Tue, 14 Apr 2015 11:31:34 +0000 http://www.smbitjournal.com/?p=736#comment-22374 In reply to LinAdmin.

I have repeatedly published that data, and it is discussed regularly. There is no official publication from the investment bank where the study was done because banks do not do this; a bank has no financial reason to publish data for the IT industry. The reasons for this were covered in the article. Pretty much any company capable of doing such a study has no financial incentive to provide it to others, faces legal barriers to doing so, or would see the data as a competitive advantage not to be shared with its competition. Even in a case like mine, where the data is benign and non-competitive, no company will voluntarily take on the risk and cost of publishing a study that has no benefit to it.

By: LinAdmin https://smbitjournal.com/2015/03/explaining-the-lack-of-large-scale-studies-in-it/comment-page-1/#comment-22373 Mon, 13 Apr 2015 14:55:00 +0000 http://www.smbitjournal.com/?p=736#comment-22373 Mr. Scott, in case you really do have trustworthy data about 60K drives over 8 years, why did you not (yet?) publish it?

I would be only too glad to tell you what conclusions can be drawn from it.

By: Scott Alan Miller https://smbitjournal.com/2015/03/explaining-the-lack-of-large-scale-studies-in-it/comment-page-1/#comment-22362 Fri, 10 Apr 2015 10:59:25 +0000 http://www.smbitjournal.com/?p=736#comment-22362 In reply to LinAdmin.

The total number of drives is not the deciding factor; the number of different types of drives is. 40,000 is not a large number when we are talking about a statistical sample set. If that were testing just four models, that’s an average of 10K drives of each. The numbers get much smaller very quickly. And the distribution is nowhere near even: a few models are present in huge quantities and others in very small ones. The total sample size is misleading because 40K sounds good until you look at the distribution and the individual samples.

Drive and array reliability studies require very large numbers. For my own statistics I used over 32,000 drives at a time over 8 years, more than 60,000 drives in total, and I can assure you it was not statistically significant at all given the rarity of drive and array failures. I could produce a few meaningful statistics about a single factor but did not have a sample size large enough to produce comparative numbers at all.

So while you conclude that a number like 40K is enough to be meaningful based on the raw number alone, I see 40K as unable to produce enough data to compare any meaningful number of options while overcoming background noise from the sampling process. Statistically, when looking at very low failure rates and needing not a single number but a comparison of many different models, types and options, 40K is far, far too small to tell us anything more than some very general numbers that we can only trust so far, because the data is primarily about one or two specific models of drives and not about drives in general.
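The effect of an uneven split can be sketched numerically. The example below is only an illustration and not data from the thread: the model names, drive counts and failure counts are made up, and it treats each drive as a simple pass/fail trial, ignoring drive age and workload. It computes an approximate 95% Wilson score interval for the failure proportion of each model in a hypothetical 40,000-drive fleet.

```python
import math


def wilson_interval(failures, drives, z=1.96):
    """Approximate 95% Wilson score interval for a failure proportion."""
    if drives <= 0:
        raise ValueError("drives must be positive")
    p = failures / drives
    denom = 1 + z ** 2 / drives
    centre = (p + z ** 2 / (2 * drives)) / denom
    half = z * math.sqrt(p * (1 - p) / drives + z ** 2 / (4 * drives ** 2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical fleet of 40,000 drives, split unevenly across four models.
# Values are (drives, failures); none of these figures come from a real study.
fleet = {
    "model_a": (30000, 600),
    "model_b": (9000, 180),
    "model_c": (900, 27),
    "model_d": (100, 3),
}

for name, (drives, failures) in fleet.items():
    low, high = wilson_interval(failures, drives)
    print(f"{name}: {drives:>6} drives, observed {failures / drives:.1%}, "
          f"95% interval {low:.2%} to {high:.2%}")
```

With these made-up figures, the interval for the 100-drive model is wide enough to contain the whole interval of the 30,000-drive model, so the two cannot be ranked against each other even though their observed rates differ.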

By: LinAdmin https://smbitjournal.com/2015/03/explaining-the-lack-of-large-scale-studies-in-it/comment-page-1/#comment-22358 Thu, 09 Apr 2015 20:12:35 +0000 http://www.smbitjournal.com/?p=736#comment-22358 The latest Backblaze reliability report is based on more than 40,000 drives, and I do not have the slightest doubt that with these numbers the results are reliable enough to draw clear conclusions.

So I no longer expect you to prove why their sample sizes should be too low.

By: Scott Alan Miller https://smbitjournal.com/2015/03/explaining-the-lack-of-large-scale-studies-in-it/comment-page-1/#comment-22353 Wed, 08 Apr 2015 09:56:03 +0000 http://www.smbitjournal.com/?p=736#comment-22353 I agree, BackBlaze gives us good hints and I am very thankful to them for supplying the data that they do. But their sample sizes are very low and cover only a handful of different drives. Even BB is not large enough to run many, many different types of drives to produce broad comparisons, nor do they have much interest in doing so. They only use a few different drives and some of those are in tiny quantities. To get good numbers we would need at least as many drives as they have in their largest pools, but for several different drive models and vendors. BB has the best data on the market, but it is poor as statistical data goes. That’s my point: even the best isn’t very good when it comes to large IT studies.
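As a rough illustration of how many drives per model a comparison needs, the sketch below uses the standard sample-size approximation for comparing two proportions. It is an assumption-laden simplification, not the method used by BackBlaze or in my own study: the function name `drives_per_model` is hypothetical, the 2% and 3% failure rates are invented, and drive age and workload are ignored.

```python
import math
from statistics import NormalDist


def drives_per_model(p1, p2, alpha=0.05, power=0.8):
    """Drives needed per model to tell failure rates p1 and p2 apart.

    Normal-approximation formula for a two-sided test of two proportions.
    """
    if p1 == p2:
        raise ValueError("failure rates must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


# Hypothetical failure rates: 2% for one model, 3% for another.
print(drives_per_model(0.02, 0.03))
# A smaller gap between the two rates needs far more drives per model.
print(drives_per_model(0.02, 0.025))
```

Under these assumptions, separating a 2% model from a 3% model already takes a few thousand drives of each, and the requirement grows quickly as the gap narrows or as more models are compared.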

I’ve used Green drives too with great results, but my pool is tiny and not statistically significant. Are you using them standalone or in storage pools (like RAID)? If the latter, look to WD Red drives instead. They are WD Green drives with modified firmware to be more reliable in that setting.

By: LinAdmin https://smbitjournal.com/2015/03/explaining-the-lack-of-large-scale-studies-in-it/comment-page-1/#comment-22349 Fri, 03 Apr 2015 14:25:43 +0000 http://www.smbitjournal.com/?p=736#comment-22349 @Scott Alan Miller April 2:

Could you please explain how you come to that conclusion?

I felt safe enough to base my recent (small-scale) buying decisions on the published Backblaze data.

My own experience had shown unsatisfactory reliability of WD Green drives, but because the number of drives is much too small, that data is not representative.

For all those who do not have data from an installed base of a few thousand drives, imho Backblaze gives valuable hints.

By: Scott Alan Miller https://smbitjournal.com/2015/03/explaining-the-lack-of-large-scale-studies-in-it/comment-page-1/#comment-22348 Thu, 02 Apr 2015 13:31:13 +0000 http://www.smbitjournal.com/?p=736#comment-22348 And even BackBlaze doesn’t have enough data to be truly useful. They have limited failure information on a very small subset of drives. The data is the best that we have, in most cases, but it is far less than you would want for a meaningful ability to compare between vendors, generations, and even products within a vendor.

By: LinAdmin https://smbitjournal.com/2015/03/explaining-the-lack-of-large-scale-studies-in-it/comment-page-1/#comment-22347 Thu, 02 Apr 2015 12:11:33 +0000 http://www.smbitjournal.com/?p=736#comment-22347 >”lack of incentive to produce and/or share this data with other companies.”

No doubt every sufficiently big IT company nowadays collects this kind of data in order to minimize cost and/or risk.

But Backblaze is still the only company to publish its findings in detail, because it believes that this kind of publicity gives it an advantage over its competition.

Not even Google published data about the manufacturers and models in their failure analysis!

And the vast majority of other companies are convinced that this kind of valuable know-how serves them best if it is not available to their competitors.

This is the short truth, no longer text is needed.
