The Inverted Pyramid of Doom

The 3-2-1 model of system architecture is extremely common today, and it is almost always the exact opposite of what a business needs, or even wants, once it takes the time to write down its business goals instead of approaching architecture from a technology-first perspective. Designing a solution has to start with business requirements; otherwise we do not merely risk an architecture that is inappropriate for the business, we should expect one.

The name refers to three (this is a soft point; it is often two or more) redundant virtualization host servers connected to two (or potentially more) redundant switches connected to a single storage device, normally a SAN (but DAS or NAS are valid here as well). It’s an inverted pyramid because the part that matters, the virtualization hosts, depends completely on the network which, in turn, depends completely on the single SAN or alternative storage device. Everything rests on a single point of failure, and all of the protection and redundancy is stacked on top of that fragile foundation. Unlike a proper pyramid, with a wide, stable base and a point on top, this one is built with all of the weakness at the bottom. (Often the ‘unicorn farts’ marketing model of “SANs are magic and can’t fail because of dual controllers” comes out here as people try to explain how this isn’t a single point of failure, but it is a single point of failure in every sense.)
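The dependency chain described above can be sketched in a few lines of Python. This is only an illustration: the device names and the two helper functions are hypothetical, and the model assumes that a layer keeps working as long as at least one of its devices is still alive.

```python
# Hypothetical model of a 3-2-1 design. Each layer is a list of
# interchangeable devices, and every layer depends on the one listed after it.
LAYERS_3_2_1 = [
    ("virtualization hosts", ["host-1", "host-2", "host-3"]),
    ("switches", ["switch-1", "switch-2"]),
    ("storage", ["san-1"]),
]


def single_points_of_failure(layers):
    """Return the devices whose loss alone takes down the whole stack."""
    return [devices[0] for _name, devices in layers if len(devices) == 1]


def stack_is_up(layers, failed):
    """The stack is up only if every layer still has a working device."""
    return all(
        any(device not in failed for device in devices)
        for _name, devices in layers
    )


if __name__ == "__main__":
    print(single_points_of_failure(LAYERS_3_2_1))
    # Two hosts and a switch can fail and the stack survives...
    print(stack_is_up(LAYERS_3_2_1, {"host-1", "host-2", "switch-1"}))
    # ...but losing the one storage device stops everything.
    print(stack_is_up(LAYERS_3_2_1, {"san-1"}))
```

In this toy model, no amount of extra hosts or switches changes the answer for the storage layer, which is the point of the pyramid image.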

So the solution, often called a 3-2-1 design, can also be called the “Inverted Pyramid of Doom”: it is an upside-down pyramid that is too fragile to run and extremely expensive for what it delivers. Unlike many other fragile models, it is also very costly, not very flexible, and not as reliable as doing nothing beyond running a single quality server.
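A rough way to see the comparison with a single server is the textbook series and parallel availability arithmetic. The sketch below is a simplification under stated assumptions: the availability figures are made up for illustration, failures are treated as independent, and the SAN is assumed to be exactly as available as one quality server. It shows only that the chain can never be more available than its single storage device; it does not capture added complexity, human error, or other real-world factors.

```python
def parallel(availability, count):
    """Availability of `count` redundant devices where any one suffices."""
    return 1 - (1 - availability) ** count


def series(*availabilities):
    """Availability of a chain in which every link must be working."""
    product = 1.0
    for value in availabilities:
        product *= value
    return product


# Assumed, illustrative figures only (not measured data).
HOST = 0.999     # one quality server
SWITCH = 0.9999  # one switch
SAN = 0.999      # the single storage device

# Three hosts and two switches are redundant; the SAN is not.
pyramid = series(parallel(HOST, 3), parallel(SWITCH, 2), SAN)
single_server = HOST

if __name__ == "__main__":
    print(f"3-2-1 stack:   {pyramid:.9f}")
    print(f"single server: {single_server:.9f}")
```

Under these assumptions the redundant layers contribute almost nothing, because the result is capped by the SAN term and then reduced slightly further by the layers above it.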

There are times that a 3-2-1 makes sense, but mostly these are extreme edge cases where a fragile environment is desired and high levels of shared storage with massive processing capabilities are needed – not things you would see in the SMB world and very rarely elsewhere.

The inverted pyramid looks great to people who are not aware of the entire architecture, such as managers and business people. There are a lot of boxes and a lot of wires, and there are typically software components labeled “HA” which, to the outside observer, make it sound like the entire solution must be highly reliable. Inverted Pyramids are popular because they offer “HA” from a marketing perspective, making everything sound wonderful, and they keep the overall cost within reason, so the result seems almost like a miracle: High Availability promises without the traditional costs. The additional “redundancy” of some of the components is great for marketing. As reliability is difficult to measure, business people and technical people alike often resort to speaking of redundancy instead of reliability, because redundancy is easy to see. The inverted pyramid speaks well to these people as it provides redundancy without reliability; the redundancy is not where it matters most. It is absolutely critical to remember that redundancy is neither a check box nor a goal. It is a tool used to obtain reliability improvements, and improper redundancy has no value. What good is a car with a redundant steering wheel in the trunk? What good is a redundant aircraft if you die when the first one crashes? What good is a redundant server if your business is down and data lost when the single SAN went up in smoke?

The inverted pyramid is one of the most obvious and ubiquitous examples of “The Emperor’s New Clothes” in technology sales. It meets the needs of resellers and vendors by promoting high margin sales and minimizing low margin ones, so nearly every vendor promotes it. It is just complicated and technical enough that widespread repudiation does not occur, and under the market pressure of the vast array of vendors benefiting from it, the architecture has become the status quo; few people stop to question whether it has any merit at all. On top of that, all systems today are highly reliable compared to systems of just a decade ago, so failures are uncommon enough that few notice they are more common than they should be, and statistical failure rates are not shared between SMBs. As a result, the architecture thrives and has become the de facto solution set for most SMBs.

The bottom line is that the Inverted Pyramid approach makes no sense: it is far less reliable than simpler solutions, even just a single server standing on its own, while costing many times more. If cost is a key driver, it should be ruled out completely. If reliability is a key driver, it should be ruled out completely. Only if cost and reliability take very far back seats to flexibility should it even be put on the table, and even then it is rare that a lower cost, more reliable solution fails to match it in flexibility within the scope the business actually anticipates. It is best avoided altogether.

Originally published on Spiceworks in abridged form: http://community.spiceworks.com/topic/312493-the-inverted-pyramid-of-doom

5 thoughts on “The Inverted Pyramid of Doom”

JR says:
August 7, 2013 at 1:17 pm
Great article! But one thing I feel is missing is recommendations to counteract this. If a company has the “inverted pyramid” in place, effectively limited by the SAN being a SPOF (even if it has dual controllers, and the storage array is RAID 5 or 6), then what is the solution? A redundant SAN?
Marko E says:
September 4, 2013 at 4:39 am
I am using a redundant SAN and have no problems even if something burns down. The SANs are in separate locations.
JShoe says:
September 9, 2013 at 10:07 am
@JR,
I believe the author mentioned that servers with local storage would be more reliable.
Francois says:
March 17, 2014 at 9:46 am
I am currently running the 3-1 inverted pyramid (I removed the switch layer because it wasn’t needed). Yes, my SAN (mostly redundant) could go down, but at least I am redundant everywhere else. Running on a local server is not redundant at all and not fail safe.
The setup allows me to have any number of VMs with single Microsoft licensing; this is less expensive than buying a server for every role in my organisation and buying licenses for each of them.
Also, using vMotion, this setup allows me to do maintenance by removing my server from the cluster with no downtime. There are a lot of advantages to running this setup.
I would really like to know what you propose as a good architecture for the SMB?