I have to preface this article with a disclaimer, because people often take these things out of context and react strongly to things that were never said: I think that BackBlaze does a great job, has brilliant people working for them and has done an awesome job of designing and leveraging technology that is absolutely applicable and appropriate for their needs. Nothing, and I truly mean nothing, in this article is ever to be taken out of context and stated as a negative about BackBlaze. If anything in this article appears or feels to state otherwise, please reread and confirm that such was said and, if so, inform me so that I may correct it. This article does not intend to imply, in any way whatsoever, that BackBlaze is not doing what is smart for them, their business and their customers. Now on to the article:
I have found over the past many years that many small and medium business IT professionals have become enamored of what they see as a miracle of low cost, high capacity storage in what is known as the BackBlaze POD design. Essentially the BackBlaze POD is a low cost, high capacity, low performance, nearly whitebox storage server built from a custom chassis and consumer parts to make a disposable storage node used in large storage RAIN arrays leveraging erasure encoding. BackBlaze custom designed the original POD, and released its design to the public, for exclusive use in their large scale hosted backup datacenters, where the PODs function as individual nodes within a massive array of nodes with replication between them. Over the years, BackBlaze has updated its POD design as technology has changed and issues have been addressed. But the fundamental use case has remained the same.
I have to compare this to the SAM-SD approach to storage, which follows a similar tack but does so using enterprise grade, supported hardware. These differences sometimes come off as trivial, but they are anything but trivial; they are key underpinnings of what makes the different solutions appropriate in different situations. The idea behind the SAM-SD is that storage needs to be reliable, designed from the hardware up to be as reliable as possible, and well supported for when things fail. The POD takes the opposite approach, making the individual server unreliable and ephemeral in nature and designed to be disposed of rather than repaired at all. The SAM-SD design assumes that the individual server is important, even critical, anything but disposable.
The SAM-SD concept, which is literally nothing more than an approach to building open storage, is designed with the SMB storage market in mind. The BackBlaze POD is designed with an extremely niche, large scale, special case consumer backup market in mind. The SAM-SD is meant to be run by small businesses, even those without internal IT. The POD is designed to be deployed and managed by a full time, dedicated storage engineering team.
Because the BackBlaze POD is designed by experts, for experts, in the very largest of storage situations, it can be confusing to, and easily misunderstood by, non-storage experts in the small business space. In fact, it is so often misunderstood that objections to it are often met with “I think BackBlaze knows what they are doing” comments, which demonstrates the extreme misunderstanding that exists around the approach. Of course BackBlaze knows what they are doing, but they are not doing what any SMB is doing.
The release of the POD design to the public causes much confusion because it is only one part of a greater whole. The design of the full data center, and the means and mechanisms of redundancy between the PODs, is not public but proprietary. So the POD itself represents only a single node of a cluster (or Vault) and does not reflect the clustering itself, which is where the most important work takes place. In fact the POD design itself is nothing more than the work done by the Sun Thumper and SAM-SD projects of the past decade without the constraints of reliability. The POD should not be seen as a novel design, but as an obvious one, one that has for decades been avoided in the SMB storage space because it is so dramatically non-applicable.
Because the clustering and replication aspects are ignored when talking about the BackBlaze POD, some huge assumptions tend to be made about the capacity of a POD, assuming much lower overhead than BackBlaze themselves get from the POD infrastructure, even at scale. For example, in RAID terms, this would be similar to assuming that the POD is RAID 6 (with only about 5% overhead) because that is the RAID level of an individual component when, in fact, RAID 61 (about 55% overhead) is used! In fact, many SMB IT professionals looking to use a POD design actually consider using only a single POD and simply running RAID 6 on it. The degree to which this does not follow BackBlaze’s model is staggering.
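To make the RAID analogy concrete, here is a minimal sketch of the overhead arithmetic. It assumes a single 45-drive chassis treated as one array; the drive count and the function names are my own illustration, not something BackBlaze publishes in this form. The results land near the rounded figures quoted above, with mirrored RAID 6 costing more than half of the raw capacity.

```python
def raid6_overhead(drives: int) -> float:
    """Fraction of raw capacity lost to parity in one RAID 6 array."""
    if drives < 4:
        raise ValueError("RAID 6 needs at least 4 drives")
    return 2 / drives  # two drives' worth of parity, whatever the array size


def raid61_overhead(drives_per_side: int) -> float:
    """Fraction of raw capacity lost when two RAID 6 arrays mirror each other."""
    if drives_per_side < 4:
        raise ValueError("RAID 6 needs at least 4 drives")
    usable = drives_per_side - 2   # data drives on one side of the mirror
    raw = 2 * drives_per_side      # both sides count toward raw capacity
    return 1 - usable / raw


DRIVES = 45  # assumption: one chassis treated as a single array

print(f"RAID 6:  {raid6_overhead(DRIVES):.1%} overhead")
print(f"RAID 61: {raid61_overhead(DRIVES):.1%} overhead")
```

The point of the comparison is that judging capacity by the inner layer alone ignores the layer that actually provides the protection.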
BackBlaze: “Backblaze Vaults combine twenty physical Storage Pods into one virtual chassis. The Vault software implements our own Reed-Solomon encoding to spread data shards across all twenty pods in the Vault simultaneously, dramatically improving durability.”
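The idea of spreading shards across nodes can be illustrated with a toy. The sketch below is not Reed-Solomon and is not BackBlaze's Vault software, which is proprietary; it swaps in plain XOR parity, the simplest erasure code, which survives the loss of only one shard. All names here are hypothetical. What it shows is that the durability lives in the layer above the node, so any single node can be thrown away.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, data_shards: int) -> List[bytes]:
    """Split data into equal shards and append one XOR parity shard."""
    size = -(-len(data) // data_shards)              # ceiling division
    padded = data.ljust(size * data_shards, b"\0")   # pad the last shard
    shards = [padded[i * size:(i + 1) * size] for i in range(data_shards)]
    return shards + [reduce(xor_bytes, shards)]


def rebuild(shards: List[Optional[bytes]]) -> List[bytes]:
    """Recover at most one missing shard (marked None) from the survivors."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity cannot recover more than one lost shard")
    if missing:
        survivors = [s for s in shards if s is not None]
        shards[missing[0]] = reduce(xor_bytes, survivors)
    return shards


payload = b"backup payload for one customer"
nodes = encode(payload, data_shards=4)   # five hypothetical nodes: 4 data + 1 parity
nodes[2] = None                          # one node is lost outright
restored = rebuild(nodes)
print(b"".join(restored[:4]).rstrip(b"\0") == payload)
```

A real Reed-Solomon layout tolerates several simultaneous losses, but the dependency is the same: take away the other nodes and a single node protects nothing beyond itself.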
To make the POD a consideration for the SMB market, the entire concept of the POD has to be taken completely out of context, both in its intended use case and in its implementation. What makes BackBlaze special is totally removed and only the most trivial, cursory aspects are taken and turned into something that in no way resembles the BackBlaze vision or purpose.
Digging into where the BackBlaze POD differs in design from the standard needs of a normal business, we find these problems:
- The POD is designed to be unreliable, to rely upon a reliability and replication layer at the super-POD level that requires a large number of PODs to be deployed and for data to be redundant between them by means of custom replication or clustering. Without this layer, the POD is completely out of context. The super-POD level is known internally as the BackBlaze Vault.
- The POD is designed to be deployed in an enterprise datacenter with careful vibration dampening, power conditioning and environmental systems. It is less resilient to these issues than standard enterprise hardware.
- The POD is designed to typically be replaced as a complete unit rather than repaired in situ. This is the opposite of standard enterprise hardware with hot swap components designed to be repaired without interruption, let alone without full replacement. We call this a disposable or ephemeral use case.
- The POD is designed to be incredibly low cost for very slow, cold storage needs. While this can exist in an SMB, typically it does not.
- The POD is designed to be a single, high capacity storage node in a pool of insanely large capacity. Few SMBs can leverage even the storage potential of a single POD let alone a pool large enough to justify the POD design.
- The BackBlaze POD is designed to use custom erasure encoding, not RAID. RAID is not effective at this scale even at the single POD level.
- An individual POD is designed for 180TB of capacity and a Vault scale of 3.6PB.
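The scale in the last bullet can be checked with a few lines of arithmetic. This is a rough sketch: the per-POD capacity and the twenty-POD Vault come from the text above, while the 17 data plus 3 parity shard split is an assumption based on BackBlaze's public description of its Vaults and is not stated in this article.

```python
POD_RAW_TB = 180       # per-POD capacity quoted above
PODS_PER_VAULT = 20    # PODs per Vault, from the BackBlaze quote

# Assumption: 17 data shards + 3 parity shards per stripe (not stated in this article).
DATA_SHARDS = 17
PARITY_SHARDS = 3

vault_raw_tb = POD_RAW_TB * PODS_PER_VAULT
vault_usable_tb = vault_raw_tb * DATA_SHARDS / (DATA_SHARDS + PARITY_SHARDS)

print(f"Vault raw:    {vault_raw_tb / 1000:.2f} PB")
print(f"Vault usable: {vault_usable_tb / 1000:.2f} PB")
```

Under those assumptions a single Vault is the smallest unit at which the design's durability actually applies, and it is measured in petabytes, far beyond what most SMBs will ever store.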
Current reference of the BackBlaze POD 5: https://www.backblaze.com/blog/cloud-storage-hardware/
In short, the BackBlaze POD is a brilliant underpinning to a brilliant service that meets a very specific need that is as far removed from the needs of the SMB storage market as one can reasonably be. Respect BackBlaze for their great work, but do not try to emulate it.