
You Can’t Virtualize That!

We hear this all the time in IT: a vendor tells us that a system cannot be virtualized.  The reasons given are numerous.  On the IT side, we are always shocked that a vendor would make such an outrageous claim, and often just as shocked that a customer (or manager) believes them.  Vendors have worked hard to perfect this sales pitch over the years, and I think that it is important to dissect it.

The root cause of these problems is that vendors are almost always seeking ways to lower their own costs while increasing profits from customers.  This drives a lot of what would otherwise be seen as odd behaviour.

One thing that many, many vendors attempt to do is limit the scenarios under which their product will be supported.  By doing this, they position themselves to simply not provide support – support is expensive to deliver and difficult to do well.  This is a common strategy.  In some cases, this is so aggressive that no acceptable, production-ready deployment scenario exists at all.

A very common means of doing this is to support only operating systems that are themselves no longer supported, de facto deprecating the vendor’s own software (today, for example, this would mean supporting only Windows XP and earlier.)  Another example is supporting only products that are not licensed for the use case (such as requiring that a desktop product like Windows 10 be used as a server.)  And one of the most common cases is forbidding virtualization.

These scenarios put customers into difficult positions: on one hand they have industry best practices, standard deployment guidelines, and in-house tooling and policies to adhere to; on the other, they have vendors forbidding proper system design, planning and management.  These needs are at odds with one another.

Of course, no one expects every vendor to support every potential scenario.  Limits must be applied.  But there is a giant chasm between supporting reasonable, well deployed systems and actively requiring unacceptably bad deployments.  We hope that our vendors will behave as business partners and share a common interest in our success or, at the very least, in the success of their product, not directly seek to undermine both.  We would hope that, at a very minimum, best effort support would be provided for any reasonable deployment scenario and that guaranteed support would likely be offered for properly engineered, best practice scenarios.

Imagine a world where driving the speed limit and wearing a seatbelt would violate your car warranty and that you would only get support if you drove recklessly and unprotected!

Some important things need to be understood about virtualization.  The first is that virtualization is a long-standing industry best practice and is expected to be used in any production deployment scenario for services.  Virtualization is in no way new; even in the small business market it has been in the best practice category for well over a decade, and for many decades in the enterprise space.  We are long past the point where running systems non-virtualized is considered acceptable, and that includes legacy deployments that have been in place for a long time.

There are, of course, always rare exceptions to nearly any rule.  Some systems need access to very special case hardware and virtualization may not be possible, although with modern hardware passthrough this is almost unheard of today.  And some super low latency systems cannot be virtualized, but these are normally limited to the biggest international investment banks and most aggressive hedge funds, and even the majority of those traditional use cases have been eliminated by improvements in virtualization, making such situations rare.  The bottom line is: if you cannot virtualize, you should be sad that you cannot, and you will know clearly why it is impossible in your situation.  In all other cases, your server needs to be virtual.

Is It Not Important?

If a vendor does not allow you to follow standard best practices for healthy deployments, what does this say about the vendor’s opinion of their own product?  If we were talking about any other deployment, we would immediately question why we were deploying a system so poorly if we plan to depend on it.  If our vendor forces us to behave this way, we should react in the same manner – if the vendor does not take the product as seriously as we take the least of our own IT services, why should we?

This is an “impedance mismatch”, as we say in engineering circles, between our needs (production systems) and how the vendor making that system appears to treat them (hobby or entertainment systems.)  If we need to depend on this product for our businesses, we need a vendor that is on board and understands business needs – one with a production mindset.  If the product is not business targeted or business ready, we need to be aware of that.  We need to question why we feel we should be using a service in production, on which we depend and require support, that is not intended to be used in that manner.

Is It Supported?  Is It Being Tested?

Something that is often overlooked from the customer’s perspective is whether or not the necessary support resources for a product are in place.  It is not uncommon for the team that supports a product to become lean, or even disappear, while the company keeps selling the product in the hopes of milking it for as much as it can, banking on either muddling through any problem or simply returning customer funds should the vendor be caught in a situation where it is simply unable to provide support.

Most software contracts state that the maximum damages that can be extracted from the vendor are the cost of the product, or the amount spent to purchase it.  In a case such as this, the vendor takes on no risk by offering a product that it cannot support – even while charging a premium for support.  If the customer manages to use the product, great – the vendor gets paid.  If the customer cannot, and the vendor cannot support it, the vendor only loses money that it would never have gotten otherwise.  The customer takes on all the risk, not the vendor.

This suggests, of course, that there is little or no continuing testing of the product either, and this should be of additional concern.  Just because the product runs does not mean that it will continue to run.  Getting up and running with an unsupported, or worse unsupportable, product means that you are depending more and more on a product with a likely decreasing level of potential support, slowly getting worse even as the need for support and the dependency on the software would be expected to increase.

If a proprietary product is deployed in production, and the decision is made to forgo best practice deployments in order to accommodate support demands, how can this fit into a decision matrix?  Should this imply that proper support does not exist?  Again, as before, this implies a mismatch with our needs.

 

Is It Still Being Developed?

If the deployment needs of the software follow old, out-of-date practices, or require software or designs that are not reasonably current, then we have to question the likelihood that the product is currently being developed.  In some cases we can determine this by watching the software release cycle for some time, but not in all cases.  There is a reasonable fear that the product may be dead, with no remaining development team working on it.  The code may simply be old technical debt that is being sold in the hopes of making a last few dollars off of an old code base that has been abandoned.  This process is actually far more common than is often believed.

Smaller software shops often manage to develop an initial software package and get it on the market and available for sale, but cannot afford to retain or restaff their development team after the initial release(s).  This is, in fact, a very common scenario.  It leaves customers with a product that is expected to become less and less viable over time, with deployment scenarios becoming increasingly risky and data increasingly hard to extricate.

 

How Can It Be Supported If the Platform Is Not Supported?

A common paradox of some more extreme situations is software that, in order to qualify as “supported”, requires other software that is either out of support or was never supported for the intended use case.  Common examples are requiring that a server system be run on top of a desktop operating system, or requiring versions of operating systems, databases or other components that are no longer supported at all.  This last scenario is scarily common.  In a situation like this, one has to ask if there can ever be a deployment where the software can be considered “supported”.  If part of the stack is always out of support, then the whole stack is unsupported; there would always be a reason that support could be denied, no matter what.  The very reasoning that would demand we avoid best practices would equally rule out choosing the software itself in the first place.

Are Industry Skills and Knowledge Lacking?

Perhaps the issue that we face with software support problems of this nature is that the team(s) creating the software simply do not know how good software is made and/or how good systems are deployed.  This is among the most reasonable and valid explanations for what would drive us to this situation.  But, like the other hypothesized reasons, it leaves us concerned about the quality of the software and whether support is truly available.  If we cannot trust the vendor to properly handle the most visible parts of the system, why would we turn to them as our experts for the parts that we cannot verify?

The Big Problem

The big, overarching problem with software that demands questionable deployment and maintenance practices in exchange for unlocking otherwise withheld support is not, as we typically assume, a question of overall software quality, but one of viable support and development practices.  That these issues suggest a significant concern for long term support should make us strongly question why we are choosing these packages in the first place while expecting strong support from them when, from the onset, we have very visible and very serious concerns.

There are, of course, cases where no other software product exists to fill a need, or none of any reasonable viability.  This situation should be extremely rare, and where it exists it should be seen as a major market opportunity for a vendor looking to enter that particular space.

From a business perspective, it is imperative that technical infrastructure best practices not be completely ignored in exchange for blind, or nearly blind, following of vendor requirements that, in any other instance, would be considered reckless or unprofessional.  Why do we so often neglect to require excellence from core products on which our businesses depend in this way?  It puts our businesses at risk, not just from the action itself, but vastly more so from the risks that are implied by the existence of such a requirement.

Finding A Job, or Finding THE Job

Nearly everyone overlooks this incredibly basic question, and yet nearly everyone has to face it when thinking about their career decision making and their future.  This applies to middle school students, those preparing for university, university grads and even mid-career professionals making key decisions about life goals.  Is our goal in our career and career preparation to land a job, meaning any job more or less (at least within our field); or is our goal to push our careers higher and higher, looking for “the” job, the one that pays great, satisfies us, challenges us and fulfills us?  Everyone has to answer this question and nearly everyone does, even if they fail to admit it to themselves or anyone else.

Our answer to this question plays a part in effectively every decision that we make around our careers, and by extension in our lives. It affects what careers we choose to pursue, how we pursue them, what education we get, when we get it, which job offers we accept, to which jobs we submit our resume, when we start hunting for the next promotion or change, lateral shift or external opportunity, when we relocate, when we buy a home, if we take a consulting position or standard employment, what certificates we get, what books we read, what communities we participate in, when or if we decide to get married, when or if we decide to have children and how we interact with our colleagues among many, many other things.  And yet, with all of these things not just being influenced by this decision, but often being almost solely governed by it, few people really sit down and take the time to evaluate their personal career goals to determine how the decisions that they make and planning that they do will determine what kind of jobs they are likely to be able to pursue.  One of the most critical and defining choices of our lives is often given little thought and is treated as being practically a casual, trivial background decision.

People rarely want to talk about questions like this because the harsh reality is that most people, in fact nearly all people, cannot realistically achieve “the” job.  Their dream job or a top career position is likely out of their reach – at least while maintaining any kind of work/life balance, having a family, rearing children or whatever.  No one wants to admit that they are in the “majority” and are really just looking for “a” job, and even fewer want someone to point out that this is the case for them.  But it is something that we should do (for ourselves, not pointing at others.)  We have to determine what matters for us, where our own priorities lie.

To our ears, going after any old job sounds horrible while seeking the pinnacle of the field sounds like a perfect goal, a natural one.  This is, to some non-trivial degree, an extension of that problem that we have all been talking about for a generation – the glorification of the trivial, rewarding everyone as if average life events were something special (like having graduation parties for moving from second to third grade, or awards for attendance because “just showing up” is worth an award.)

Life is not that simple, though, for several reasons.  First is statistics.  Realistically, amazing jobs make up only something like 0.1% of all available jobs in the world.  That means that 99.9% of all workers have to go after less-than-apex jobs.  Even if we broaden the scope to say that “great” jobs represent a full 2% of available jobs and 98% of people have to go after more mundane jobs, we still have the same situation: the chance that you are in the 0.1–2% is quite low.  Almost certainly, statistically speaking, you are in the 98%.  The numbers are not as terribly bad as they may seem because awesome jobs are not necessarily apex jobs; that is just one possibility.  The perfect job for you might be based on location, flexibility, benefit to humanity, ability to do rewarding work or compensation.  There are many possible factors; the idea of “the” job is not that it is purely about title or salary, but those are reasonable aspects to consider.

The second part is the other prices that need to be paid.  Going after “the” job generally relies on a lot of things such as being a good self-starter, thinking outside of the box (career-wise), relocating, working longer hours, studying more, challenging others, self promotion, putting in long hours away from the office to improve faster than others, starting your career sooner, being more aggressive, and so on.  None of these factors is strictly required, but commonly these and many others will play an important role.  Going after the dream job or apex role means taking more risks, pushing harder and setting ourselves apart.  It requires, on average, far more work and has a much less defined path from start to finish, making it scarier, more ambiguous and riskier.  High school guidance counselors cannot tell you how to get from point A to point B when talking about “the” job; they lack the knowledge, exposure and resources to help you with that.  When going after “the” job you are almost certainly forging your own path.  Everyone is unique, everyone’s perfect job is unique, and often no one knows what that perfect job is exactly until they arrive at it, often after many years of hard work and seeking.

These two mindsets change everything that we do.  One: we design our careers around optimum performance while accepting high chance of failure.  And two: we design our careers around risk mitigation and we hedge our bets sacrificing the potential for big payoffs (salary, position, benefits, whatever) in exchange for a more well defined job and career path with better stability and less chance of finding ourselves floundering or worse, out of work completely and maybe even unemployable.

If you spend a lot of time talking to people about their career goals you will often see these two mindsets at work, under the surface, but essentially no one will verbalize them directly.  If you listen, though, you can hear them being mulled over from time to time.  People will talk about priorities such as being able to live in the same house, town or region, and their willingness to give up career options in exchange for this.  This is an important life decision, and a common one, where most people will choose to control where they live over where and how they work.  Another place you hear it, in the undertone of conversation, is when people are contemplating their next career move – do they focus on the potential for opportunity, or do they focus on the risks caused by instability and the unknown?

A major area in which these kinds of thoughts are often expressed, in one way or another, is around education and certification.  In IT especially we see people often approach their educational choices from a position of risk mitigation, rather than seized opportunity.  Very few people look to their education as “the path to this one, specific dream position” but instead generally speak about their education’s “ability to get them more interviews and job offers at more companies.”  It’s about a volume of offers, which is all about risk mitigation, rather than about getting the one offer that really matters to them.  Each person only needs one job, or at least one job at a time, so increasing the volume of potential jobs is not, realistically, a chance for greater achievement but rather simply a means of decreasing the risk around job loss and unemployment.

This is especially true when people discuss the necessity of certain educational credentials for low-paying, more entry-level jobs – even people focused on getting “a” job may be shocked at how often people target rather significant education levels for the express purpose of landing very low-paying, low-mobility, low-reward jobs that are perceived as being more stable (often those in the public sector.)  This is mirrored in many certification processes.  Certifications are an extension of education in this way, and many people go after common certifications, often in many different areas of study, in order to hedge against job loss in the future or to prepare for a change of direction at their current job or similar.  Education and certification are generally seen not as tools for success, but as attempts to hedge against failure.

You may recognize this behavior expressed when people talk about creating a resume or CV designed to “get past HR filters.”  This makes total sense, as a huge percentage (whether this is 5% or 80% does not matter) of jobs in the marketplace are gate-kept by non-technical human resources staff who may eliminate people based on their own prejudices or misunderstandings before qualified technical resources ever get a chance to evaluate the candidates.  So by targeting factors that help us successfully pass the HR filter, we get many more opportunities for a technical hiring manager to review our candidacy.

Of course, nearly everyone recognizes that an HR filtering process like this is horrific and will eliminate incredibly competent people, possibly the best people, right off of the bat.  There is no question that this is not even remotely useful for hiring the best potential employees.  And yet most everyone still attempts to get past these HR departments in the hopes of being hired by firms that have no interest, even at the most basic level, of hiring great people, but rather are looking mostly to eliminate the worst people.  Why do we do this so reliably?  Because the goal here is not to get the best possible job, but rather to have as many opportunities as possible to get, more or less, “a” job.

If we were seeking the best possible jobs we would actually be challenged in the opposite direction.  Rather than hoping to get past the HR filters, we might be more interested in being intentionally caught and removed by them.  When looking for the “perfect” career opportunity we care more about eliminating the “noise” of the interviewing process than about increasing the “hits”.  It is a completely different thought process.  In the “any job” case, we want as many opportunities as we can get so that we have one to take.  But in the “the job” case, we want less rewarding jobs (however this is defined for the individual) to filter themselves out of the picture, as they would otherwise potentially waste our time or, worse, appear like a great opportunity that we might accidentally accept when we would not have done so had we known more about them up front.

When going after “a” job we expect people to accept jobs quickly and give them up reluctantly.  Those in the opposite position generally do exactly the opposite, giving a lot of thought and time to choosing the next career move but having little concern as to remaining at their last “stepping stone” position.

Somewhat counter-intuitively, we may find that those willing to take job offers more quickly actually end up with fewer useful career opportunities in the long run.  The appearance of stability is not always what it seems and the market pressures are not always highly visible.  There are a couple of factors at play here.  One is that the path to the most common jobs is well trodden and the competition for those jobs can be fierce.  So even though perhaps 90% of all jobs fall into this category, perhaps 95% of all people are attempting to get those jobs.  The approach taken to get “a” job generally results in a lack of market differentiation for the potential worker (and for the job as well), making it difficult to stand out in a field so full of competition.

On the other hand, those who have worked hard to pursue their goals and have taken unique paths may be presented with technically fewer options, but the options they are presented with are usually far better and have a drastically smaller pool of competition vying for them.  This can mean that actually getting “the” job might be more likely than it would otherwise seem, even to the point of being potentially easier than getting “a” job, at least through traditional means and approaches.  By taking the path less traveled, the candidate working extremely hard to reach a dream position may find ways to bypass otherwise stringent job requirements, or may simply leverage favorable statistical situations.

Also working in the favor of those seeking “the” job is that they tend to advance in their careers and develop powerful repertoires much more quickly.  This alone can be a major factor in mitigating the risk of going this route.  Powerful resumes, broad experience and deep skill sets will often allow them to command higher salaries and get into jobs in a variety of categories across more fields.  This flexibility from a capability and experience perspective can heavily offset the inherent risks that this path can appear to present.

At the end of the day, we have to evaluate our own needs on a personal level and determine what makes sense for us or for our families.  And this is something that everyone, even middle school students, should begin to think about and prepare for.  It requires much self reflection and a strong evaluation of our goals and priorities to determine what makes sense for us.  Because factors like high school classes and high school age interning and projects, university decisions, and more happen so early in life and are so heavily dependent on this realization of intention we can all benefit greatly by promoting this self evaluation as early on as possible.

And this information, this self evaluation, should be seen as a critical factor in any and all job and career discussions.  Understanding what matters to us individually will make our own decisions, and the advice from others, so much more meaningful and useful.  We so often depend on assumptions, often wrong, about whether we are looking for the chance to climb the ladder to a dream job or whether we are looking for a lifetime of safety and security, and few, if any, are willing to outright state what factors are driving their assumptions and how those assumptions drive decisions.

How about you?  Are you looking at every career decision as “how does this get me to the best, most amazing position possible” or are you thinking “how will this put me at risk in the future?”  What are your priorities?  Are you looking for a job, or are you looking for the job?

The Emperor’s New Storage

We all know the story of the Emperor’s New Clothes.  In Hans Christian Andersen’s telling of the classic tale, we have some unscrupulous cloth vendors who convince the emperor that they have clothes made from a fabric with the magical property of only being visible to people who are fit for their positions.  The emperor, not being able to see the clothes, decides to buy them because he fears people finding out that he cannot see them.  Everyone in the kingdom pretends to see them as well – all sharing the same fear.  It is a brilliant sales tactic because it puts everyone on the same team: the cloth sellers, the emperor, the people in the street all share a common goal that requires them to maintain the same lie.  Only when a little boy who cares naught about his status in society, but only about the truth, points out that the emperor is naked is everyone free to admit that they do not see the clothes either.

And this brings us to the storage market today.  Today we have storage vendors desperate to sell solutions of dubious value and buyers who often lack the confidence in their own storage knowledge to dare to question the vendors in front of management, or who simply have turned to vendors to make their IT decisions on their behalf.  This has created a scenario where vendor confidence and industry uncertainty have engendered market momentum, causing the entire situation to snowball.  The effect is that using big, monolithic and expensive storage systems is so accepted today that systems are often purchased without any thought at all.  They are essentially a foregone conclusion!

It is time for someone to point at the storage buying process and declare that the emperor is, in fact, naked.

Don’t get me wrong.  I certainly do not mean to imply that modern storage solutions do not have value.  Most certainly they do.  Large SAN and NAS shared storage systems have driven much technological development and have excellent use cases.  They were not designed without value, but they do not apply to every scenario.

The idea of the inverted pyramid design, the overuse of SANs where they do not apply, came about because they are high profit margin approaches.  Manufacturers have a huge incentive to push these products and designs because they do much to generate profits.  SANs are one of the most profit-bearing products on the market.  This, in turn, incentivizes resellers to push SANs as well, both to generate profits directly through their sales and to keep their vendors happy.  This creates a large amount of market pressure by which everyone on the “sales” side of the buyer / seller equation has massive pressure to convince you, the buyer, that a SAN is absolutely necessary.  This is so strong a pressure, the incentives so large, that even losing the majority of potential customers in the process is worth it, because the margins on the one customer that goes with the approach are generally worth losing many others.

Resellers are not the only “in between” players with incentive to see large, complex storage architectures get deployed.  Even non-reseller consultants have an incentive to promote this approach because it is big, complex and requires, on average, far more consulting and support than do simpler system designs.  This is unlikely to be a trivial number.  Instead of a ten hour engagement, they may win a hundred hours, for example, and for consultants those hours are bread and butter.

Of course, the media has incentive to promote this, too.  The vendors provide the financial support for most media in the industry and much of the content.  Media outlets want to promote the design because it promotes their sponsors, and they also want to talk about the things that people are interested in – and simple designs do not generate a lot of readership.  It is the same problem that exists with sensationalist news: the most important or relevant news is often skipped so that news that will gather viewership can be shown instead.

This combination of factors is very forceful.  Companies that look to consultants, resellers and VARs, and vendors for guidance will get a unanimous push for expensive, complex and high margin storage systems.  Everyone, even the consultants who are supposed to be representing the client, has a pretty big incentive to let these complex designs get approved because there is just so much money potentially sitting on the table.  You might get paid one hour of consulting time to recommend against overspending, but might be paid hundreds of hours for implementing and supporting the final system.  That is likely tens of thousands of dollars of difference – a lot of incentive, even for the smallest deployments.

This unification of the sales channel and even the front line of “protection” has an extreme effect.  Our only real hope, the only significant one, for someone who is not incentivized to participate in this system is the internal IT staff themselves.  And yet we find that internal staff very rarely stand up to the vendors on these recommendations, or even produce alternative recommendations themselves.

There are many reasons why well intentioned internal IT staff (and even external ones) may fail to properly assess needs such as these.  There are a great many factors involved and I will highlight some of them.

  • Little information in the market.  Because no company makes money by selling you less, there is almost no market literature, discussions or material to assist in evaluating decisions.  Without direct access to another business that has made the same decision or to any consultants or vendors promoting an alternative approach, IT professionals are often left all alone.  This lack of supporting experience is enough to cause adequate doubt to squash dissenting voices.
  • Management often prefers flashy advertising and the word of sales people over the opinions of internal staff.  This is a hard fact, but one that is often true.  IT professionals often face the fact that management may make buying decisions without any technical input whatsoever.
  • Any bid process immediately short circuits good design.  A bid would have to include “storage” and SAN vendors can easily bid on supplying storage while there is no meaningful way for “nothing” to bid on it.  Because there is no vendor for good design, good design has no voice in a bidding or quote based approach.
  • Lack of knowledge.  Often dealing with system architecture and storage concerns are one off activities only handled a few times over an entire career.  Making these decisions is not just uncommon, it is often the very first time that it has ever been done.  Even if the knowledge is there, the confidence to buck the trend easily is not.
  • Inexperience in assessing risk and cost profiles.  While these things may seem like bread and butter to IT management, often the person tasked with dealing with system design in these cases will have no training and no experience in determining comparative cost and risk in complex systems such as these.  It is common that risk goes unidentified.
  • Internal staff often see this big and costly purchase as a badge of honour or a means to bragging rights, excited to show off how much they were able to spend and how big their new systems are.  Everyone loves gadgets, and these are often the biggest, most expensive toys that we ever touch in our industry.
  • Internal staff often have no access to work with equipment of this type, especially SANs.  Getting a large storage solution in house may allow them to improve their resume and even leverage the experience into a raise or, more likely, a new job.
  • Turning to other IT professionals who have tackled similar situations often results in the same advice as from sales people.  This is for several reasons.  All of the reasons above would have applied to them, plus one very strong one – self preservation.  Any IT professional who has implemented a very costly system unnecessarily has a lot of incentive to state that the purchase was a good one.  This may be “reverse rationalization” – the human tendency to retroactively apply reason to a decision that lacked it when originally made; it may be fear that their job would be in jeopardy if what they had done were found out; it may be that they never assessed the value of the system after implementation; or it may even be that their factors were not the same as yours and the design was genuinely applicable to their needs.

The bottom line is that basically everyone, no matter what role they play, from vendors to sales people to those who do implementation and support, to your friends in similar job roles, to strangers on Internet forums, has a big incentive to promote costly and risky storage architectures in the small and medium business space.  There is, for all intents and purposes, no one with a clear benefit from providing a counterpoint to this marketing and sales momentum.  And, of course, as the momentum has grown, the situation has become more and more entrenched, with people even dismissing those who question the status quo or ask critical questions as irrational or reckless.

As with any decision in IT, however, we have to ask “does this provide the appropriate value to meet the needs of the organization?”  Storage and system architectural design is one of the most critical and expensive decisions that we will make in a typical IT shop.  Of all of the things that we do, treating this decision as a knee-jerk, foregone conclusion without doing due diligence and without looking to address our company’s specific goals could be one of the most damaging mistakes that we make.

Bad decisions in this area are not readily apparent.  The same factors that lead to the initial bad decision will, much of the time, also hide the fact that a bad decision was made.  If the issue is that the solution carries too much risk, there is no better means to determine that after implementation than before – such is the nature of risk.  If the system never fails, we do not know whether that is normal or whether we got lucky.  If it fails, we do not know whether that is common or whether we were one in a million.  So observation of risk from within a single implementation, or even hundreds of implementations, gives us no statistically meaningful insight.  Likewise, wasteful expenditure is no easier to identify after the purchase than it was before.  So a business is left without any ability to do a post mortem on its decision, nor is there an incentive to do one, as no one involved in the process would want to risk exposing a bad decision making process.  Even companies that want to know if they have done well will almost never have a good way of determining this.
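To make that statistical point concrete, here is a quick back-of-the-envelope sketch.  The failure rates below are invented purely for illustration, not taken from any real deployment:

```python
def p_no_failure(annual_failure_rate: float, years: int) -> float:
    """Probability of observing zero failures over a number of years."""
    return (1 - annual_failure_rate) ** years

# Assumed rates: a "risky" design failing in 10% of years vs a "safe" one at 1%.
risky_quiet = p_no_failure(0.10, 5)   # ~0.59: most risky deployments stay quiet
safe_quiet = p_no_failure(0.01, 5)    # ~0.95

print(f"Risky design, 5 quiet years: {risky_quiet:.2f}")
print(f"Safe design, 5 quiet years:  {safe_quiet:.2f}")
```

Since a majority of even the risky deployments would see no failure at all over five years, a single shop’s uneventful experience cannot distinguish a risky design from a safe one.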

What makes this determination even harder is that the same architectures that are foolish and reckless for one company may be completely sensible for another.  The use of a SAN based storage system and a large number of attached hosts is a common and sensible approach to controlling costs of storage in extremely large environments.  Nearly every enterprise will utilize this design and it normally makes sense, but is used for very different reasons and goals than apply to nearly any small or medium business.  It is also, generally, implemented somewhat differently.  It is not that SANs or similar storage are bad.  What is bad is allowing market pressure, sales people and those with strong incentives to “sell” a costly solution to drive technical decision making instead of evaluating business needs, risk and cost analysis and implementing the right solution for the organization’s specific goals.

It is time that we, as an industry, recognize that the emperor is wearing no clothes.  We need to be the innocent children who point, laugh and ask why no one else has been saying anything when it is so obvious that he is naked.  The storage and architectural solutions so broadly accepted benefit far too many people, and the only ones truly hurt by them (business owners and investors) are not in a position to judge whether those solutions meet their needs.  We need to break past the comfort of socially accepted plausible deniability and our shared culpability for never evaluating.  We must take responsibility for protecting our organizations and provide solutions that address their needs rather than the needs of the sales people.

 

For more information see: When to Consider a SAN and The Inverted Pyramid of Doom

Making the Best of Your Inverted Pyramid of Doom

The 3-2-1 or Inverted Pyramid of Doom architecture has become an IT industry pariah for many reasons. Sadly, many companies only learn about the dangers associated with this design after the components have arrived and the money has left the accounts.

Some companies are lucky and catch this mistake early enough to be able to return their purchases and start over with a proper design and decision phase prior to the acquisition of new hardware and software. This, however, is an ideal and very rare situation. At best we can normally expect restocking fees and, far more commonly, the equipment cannot be returned at all or the fees are so large as to make it pointless.

What most companies face is a need to “make the best” of the situation moving forward. One of the biggest concerns is that the concerned parties, whether the financial stakeholders who have just spent a lot of money on the new hardware or the technical stakeholders who now look bad for having allowed the equipment to be purchased, will succumb to an emotional reaction and give in to the sunk cost fallacy. It is vital that this emotional, illogical reaction not be allowed to take hold, as it will undermine critical decision making.

It must be understood that the money spent on the inverted pyramid of doom has already been spent and is gone. That the money was wasted, or how much was wasted, is irrelevant to decision making at this point. Whether the system was a gift or cost a billion dollars does not matter; that money is gone and now we have to make do with what we have. A useful “trick” here is to bring in a financial decision maker such as a CFO, explain that there is about to be an emotional reaction to money already spent, and discuss the sunk cost fallacy before talking about the actual problem, so that people are aware and logical and the person trained (we hope) to best handle this kind of situation is ready to head off sunk cost emotions. Careful handling of a potentially emotionally-fueled reaction is important. This is not the time to attempt to cover up either the financial or the technical missteps, which is exactly what the emotional reaction pushes people toward. It is necessary for all parties to communicate and to remain detached and logical in order to address the needs. Some companies handle this well; many do not, and become caught trying to forge forward with bad decisions that were already made, probably in the hope that nothing bad happens and that no one remembers or notices. Fight that reaction. Everyone has it; it is the natural amygdala “fight or flight” response.

Now that we are ready to fight the emotional reactions to the problem we can begin to address “where do we go from here.” The good news is that where we are is generally a position of having “too much” rather than “too little.” So we have an opportunity to be a little creative. Thankfully there are generally good options that can allow us to move in several directions.

One thing that is very important to note is that we are looking at solutions exclusively that are more reliable, not less reliable, than the intended inverted pyramid of doom architecture that we are replacing. An IPOD is a very fragile and dangerous design and we could go to great lengths demonstrating concepts like risk analysis, single points of failure, the fallacies of false redundancy, looking at redundancy instead of reliability, dependency chains, etc. but what is absolutely critical for all parties to understand is that a single server, running with local storage is more reliable than the entire IPOD infrastructure would be. This is so important that it has to be said again: if a single server is “standard availability”, the IPOD is lower than that. More risky. If anyone at this stage fears a “lack of redundancy” or a “lack of complexity” in the resulting solutions we have to come back to this – nothing that we will discuss is as risky as what had already been designed and purchased. If there is any fear of risk going forward, the fear should have been greater before we improved the reliability of the design. This cannot be overstated. IPODs sell because they easily confuse those not trained in risk analysis and look reliable when, in fact, they are anything but.
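The dependency-chain argument above can be sketched with simple series-availability arithmetic. The availability figures here are assumptions chosen only to show the shape of the math, not measurements of any real hardware:

```python
from functools import reduce

def chain_availability(*components: float) -> float:
    """Availability of a series dependency chain: every component must be up."""
    return reduce(lambda acc, a: acc * a, components, 1.0)

server = 0.999    # assumed availability of one standalone server with local disks
switch = 0.9995   # assumed availability of the storage network
san = 0.999       # the shared storage: the single point at the pyramid's tip

# In an IPOD, a workload depends on its host AND the switching AND the SAN.
ipod = chain_availability(server, switch, san)

print(f"Single server: {server:.4%}")
print(f"IPOD chain:    {ipod:.4%}")  # always lower than the weakest single link
```

No matter how generous the assumed numbers, the product of a chain is always lower than any single link in it, which is why the IPOD can never be more reliable than the lone server it contains.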

Understanding the above and using a technique called “reading back” the accepted IPOD architecture tells us that the company in question was accepting of not having high availability (or even standard availability) at the time of purchasing the IPOD. Perhaps they believed that they were getting that, but the architecture could not provide it and so moving forward we have the option of “making do” with nothing more than a single server, running on its own local storage. This is simple and easy and improves on nearly every aspect of the intended IPOD design. It costs less to run and maintain, is often faster and is much less complex while being slightly more reliable.

But simply dropping down to a single server and hoping to find uses for the rest of the purchased equipment “elsewhere” is likely not our best option. In situations where the IPOD had been meant for only a single workload or set of workloads, and other areas of the business need equipment as well, it can be very beneficial to take the “single server” approach for the intended IPOD workload and utilize the remaining equipment elsewhere in the business.

The most common approach to repurposing an IPOD stack is to reconfigure the two (or more) compute nodes as full stack nodes containing their own storage. Depending on what storage has already been purchased, this step may require no purchases at all, a movement of drives between systems, or, often, the relatively small purchase of additional hard drives.

These nodes can then be configured into one of two high availability models. In the past a common design choice, for cost reasons, was to use an asynchronous replication model (often known as the Veeam approach) that will replicate virtual machines between the nodes and allow VMs to be powered up very rapidly allowing for a downtime from the moment of compute node failure until recovery of as little as just a few minutes.

Today fully synchronous fault tolerance is so commonly available for free that it has effectively replaced the asynchronous model in nearly all cases. In this model storage is replicated between the compute nodes in real time, allowing failover to happen instantly rather than after a delay of a few minutes, and with zero data loss instead of a small data loss window (i.e. an RPO of zero.)
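The recovery point difference between the two models can be sketched as follows; the fifteen-minute interval is an assumed example, not a property of any particular product:

```python
def worst_case_rpo_minutes(model: str, replication_interval_min: float = 15.0) -> float:
    """Worst-case window of lost committed writes if a node fails mid-cycle."""
    if model == "async":   # interval-based VM replication between the nodes
        return replication_interval_min
    if model == "sync":    # every write acknowledged on both nodes before commit
        return 0.0
    raise ValueError(f"unknown replication model: {model}")

print(worst_case_rpo_minutes("async"))  # up to a full interval of writes lost
print(worst_case_rpo_minutes("sync"))   # RPO of zero: no committed writes lost
```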

At this point it is common for people to react to replication with a fear of the storage capacity lost to the replication. That capacity cost is real, but it must be understood that it is this replication, missing from the original IPOD design, that provides the firm foundation for high reliability. If this replication is skipped, high availability is an unobtainable dream and individual compute nodes using local storage in a “stand alone” mode are the most reliable option available. High availability solutions rely on replication and redundancy to build the reliability necessary to qualify as highly available.
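The capacity trade-off is easy to quantify. A rough sketch, using assumed drive totals rather than any specific configuration:

```python
def usable_capacity_tb(per_node_raw_tb: float) -> float:
    """Fully synchronous mirroring keeps a complete replica on every node,
    so usable capacity equals one node's capacity, not the cluster total."""
    return per_node_raw_tb

per_node = 10.0             # assumed: each node holds 10 TB of local disk
raw_total = per_node * 2    # 20 TB raw across the two-node cluster
usable = usable_capacity_tb(per_node)

print(f"Raw capacity across both nodes: {raw_total} TB")
print(f"Usable replicated capacity:     {usable} TB")  # half of the raw total
```

Half of the raw capacity is “lost” to the mirror, but that second copy is precisely the redundancy the original IPOD lacked.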

This solves the question of what to do with our compute nodes but leaves us with what we can do with our external shared storage device, the single point of failure or the “point” of the inverted pyramid design. To answer this question we should start by looking at what this storage might be.

There are three common types of storage devices that would be used in an inverted pyramid design: DAS, SAN and NAS. We can lump DAS and SAN together as both are forms of block storage and can be used essentially interchangeably in this discussion – they are differentiated only by the presence of switching, which can be added or removed as needed in our designs. NAS differs by being file storage rather than block storage.

In either case, block (DAS or SAN) or file (NAS), one of the most common uses for this now superfluous device is as a backup target for our new virtualization infrastructure. In many cases the device may be overkill for this task, generally with more performance and many more features than a simple backup target needs, but good backup storage is important for any critical business infrastructure and erring on the side of overkill is not necessarily a bad thing. Businesses often attempt to skimp on their backup infrastructure, and this is an opportunity to invest heavily in it without spending any extra money.

In the same vein as backup storage, the external storage device could be repurposed as archival storage or another “lower tier” of storage where high availability is not warranted. This is a less common approach, generally because every business needs a good backup system but only some have a way to leverage an archival storage tier.

Beyond these two common and universal storage models, another use case for external storage devices, especially if the device is a NAS, is to leverage it in its native role as a file server separate from the virtualization infrastructure. For many businesses file serving is not as uptime critical as the core virtualization infrastructure, and its backups are far easier to maintain and manage. Offloading file serving to an already purchased NAS reduces the demands on the virtualization infrastructure in two ways: it reduces the number of VMs that need to run there, and it moves what is typically one of the largest consumers of storage to a separate device, lowering both the performance and the capacity requirements of the virtualization infrastructure. This can also reduce the cost of obtaining the additional hard drives needed for local storage on the compute nodes, as mentioned earlier, making this a very popular approach for many companies.

Every company is unique and there are potentially many places where spare storage equipment could be effectively used, from labs to archives to tiered storage. A little creativity and out-of-the-box thinking can match your unique set of available equipment to your business’ unique needs and find the best place to use this equipment where it is decoupled from the core, critical virtualization infrastructure but can still bring value to the organization. By avoiding the inverted pyramid of doom we can obtain the maximum value from the equipment that we have already invested in, rather than implementing fresh technical debt that we then have to work to overcome unnecessarily.