Category Archives: Business of IT

Hello, 1998 Calling….

Something magic seems to have happened in the Information Technology profession somewhere around 1998.  I know, from my own memory, that the late 90s were a special time to be working in IT.  Much of the architecture and technology that we have today stems from this era.  Microsoft moved from their old DOS products to modern, Windows NT-based operating systems.  Linux became mature enough to begin appearing in business.  Hardware RAID became common, riding on the coattails of Intel’s IA32 processors as they finally became powerful enough for many businesses to use seriously in servers.  The LAN became the business standard and all other models effectively faded away.  The Windows desktop became the one and only standard for regular computing and Windows servers were rapidly overtaking Novell as the principal player in LAN-based computing.

What I have come to realize over the last few years is that a large chunk of the communal wisdom of the industry appears to have been adopted during these formative and influential years of the IT profession and has since passed into myth.  It is much like the teachings of Aristotle, who was for millennia considered the greatest thinker of all time and not to be questioned – a view that stymied scientific thought and provided a cornerstone for the dark ages.  A foundation of “rules of thumb” used in IT has passed from mentor to intern, from professor to student, from author to reader over the past fifteen or twenty years, many of them learned by rote and treated as infallible truths of computing without any thought going into the reasoning and logic behind the initial decisions.  In many cases so much time has come and gone that the factors behind the original decisions are lost or misunderstood, as those hoping to understand them today lack firsthand knowledge of computing from that era.

The codification of IT in the late nineties happened on an unprecedented scale, driven primarily by Microsoft’s sudden lurch from lowly desktop maker to server and LAN ecosystem powerhouse.  When Microsoft made this leap with Windows NT 4 they reinvented the industry – a changing of the guard – with an entirely new generation of SMB IT Pros being born and coming into the industry right as this shift occurred.  These were the years leading up to the Y2K bubble, with the IT industry swelling its ranks as rapidly as it could find moderately skilled, computer-interested bodies.  This meant that everything had to be scripted (steps written on paper, that is) and best practices had to be codified to allow those with less technical backgrounds and training to work.  It was a perfect environment for Microsoft and the never-before-seen friendliness of their NT server product.  All at once the industry was full of newcomers without historical perspective, without training and experience, and with easy-to-use servers with graphical interfaces making them accessible to anyone.

Microsoft leapt at the opportunity and created a tidal wave of documentation, best practices and procedures to allow anyone to get basic systems up and running quickly, easily and, more or less, reliably.  To do this they needed broad guidelines that were applicable in nearly all common scenarios, they needed them written in clear, published form and they needed to guarantee that the knowledge was being assimilated.  Microsoft Press stepped in with the official publications of the Microsoft guidelines and right on its heels Microsoft’s MCSE program came into the spotlight, totally changing the next decade of the profession.  There had been other industry certifications before the MCSE, but the Windows NT 4 era and the MCP / MCSE certification systems were the game-changing events of the era.  Soon everyone was getting boot-camped through certification, quickly memorizing Microsoft best practices and recommendations, learning them by rote and getting certified.

In the short term, the move did wonders for providing Microsoft an army of minimally skilled, but skilled nonetheless, supporters who had their own academic interests aligned with Microsoft’s corporate interest, forming a symbiotic relationship that completely defined the era.  Microsoft was popular because nearly every IT professional was trained on it, and nearly every IT professional encouraged the adoption of Microsoft technologies because they had been trained and certified on it.

The rote guidelines of the era touched many aspects of computing; many probably remain unidentified to this day, so strong was the pressure that Microsoft (and others) put on the industry at the time.  Most of today’s concepts of storage and disk arrays, filesystems, system security, networking, system architecture, application design, memory, swap space tuning and countless others all arose during this era and passed, rather quickly, into lore.  At the time we were aware that these were simply rules of thumb, subject to change just as they always had been based on changes in the industry.  Microsoft, and others, tried hard to make it clear what underlying principles created the rules of thumb.  It was not their intention to create a generation having learned by rote, but it happened.

That generation went on to be the effective founding fathers of modern LAN management.  In the small and medium business space the late 1990s represented the end of the central computer and remote terminals design, the Internet became ubiquitous (providing the underpinnings for the extensive propagation of the guidelines of the day), Microsoft washed away the memory of Novell and LANtastic, Ethernet over twisted pair completely abolished all competing technologies in LAN networking, TCP/IP beat out all layer three networking competitors and more.  Intel’s IA32 processor architecture began to steal the thunder from the big RISC processors of the previous era and from the obscure sixteen- and thirty-two-bit processors that had been attempting to unseat Intel for generations.  The era was defining to a degree that few who have come since will ever understand.  Dial-up networking gave way to always-on connections.  Disparate networks that could not communicate with each other lost to the Internet and a single, global networking standard.  Vampire taps and hermaphrodite connectors gave in as RJ45 connectors took to the field.  The LAN of 1992 looked nothing like the LAN of 1995.  But today, what we use, while faster and better polished, is effectively identical to the computing landscape as it was by around 1996.

All of this momentum, whether intentional or accidental, created an unstoppable force of myth driving the industry.  Careers were built on this industry wisdom taught around the campfire at night.  One generation clung to its established beliefs, no longer knowing why it trusted those guidelines or whether they still applied, while another was taught them with little way to know that they were distilled rules of thumb meant to be passed on with coinciding background knowledge and understanding – rules designed not only for a very specific era, roughly the band from 1996 to 1999, but also, in a great many cases, for very specific implementations or products, generally Windows 95 and Windows NT 4 desktops and Windows NT 4 servers.

Today this knowledge is everywhere.  Ask enough questions and even young professionals still at university or doing a first internship are likely to have heard at least a few of the more common nuggets of conventional IT industry wisdom.  Sometimes the recommendations, applied today, are nearly benign, representing little more than inefficiency or performance waste.  In other cases they may represent pretty extreme degrees of bad practice, carrying significant risk.

It will be interesting to see just how long the late 1990s continue to so vastly influence our industry.  Will the next generation of IT professionals finally issue a broad call to deep understanding and question the rote learning of past eras?  Will misunderstood recommendations still be commonplace in the 2020s?  At the current pace of change, it seems unlikely that the thinking of the industry will shift significantly prior to 2030.  IT has been attempting to move from its wild west – everyone distilling raw knowledge into practical terms on their own – to the large-scale codification of similar fields such as civil or electrical engineering, but the rate of change, while tremendously slowed since the rampant pace of the 70s and 80s, still remains so high that the knowledge of one generation is nearly useless to the next and only broad patterns, approaches and thought processes have great value to be taught mentor to student.  We may easily face another twenty years of the wild west before things begin to really settle down.

The Smallest IT Department

Working with small businesses means working with small IT shops.  It is very common to find the “one man” shows and I am often in discussions about how to handle environments so small.  There is no easy answer.  Unlike most company departments or job roles, IT is almost always an “around the clock” job that services the fundamental “plumbing” of the business – the infrastructure on which everything else depends.  Normal departments like finance, human resources, legal, management or marketing tend to knock off at the end of the day, leave an hour early on Fridays, go completely offline during the weekend, take normal vacations with little or no office contact, require little ongoing education or training once they are established and almost never have to worry about being expected to spend their nights or weekends doing their work to avoid interrupting others while they work – but this is exactly how IT departments need to function.  IT staffs don’t reminisce about that “one time” that things were so bad at work that they had to work through the whole weekend or a full overnight and still work the next day, or had to give up their family vacation because the company made no allowance for it operationally – that is simply day to day life for many people in IT.  What other departments often feel is completely unacceptable is just normal practice in IT.  But that doesn’t mean that it works well; IT departments are often driven into the ground and little consideration is given to their long term viability or success.

With rare exception, IT departments have needs that are different from normal departments – based primarily on what businesses demand from them: high reliability, continuous availability, deep business knowledge of all departments, ability to train others, knowledge of broad and disparate technologies, business skills, financial skills, procurement skills, travel, experience across technologies and industries, efficiency and experience on the latest technologies, trends, architectures and techniques, and knowledge of the latest threats and products arriving daily.  And IT is expected not only to use all of that skill and experience in support roles but also to be a productive engineer and customer service representative, and to present and defend recommendations to management that often pushes back or provides erratic or emotional support of infrastructural needs.  Quite literally, no single person can possibly fill those shoes, and one who could would demand a salary higher than the revenue of most small businesses.

How do larger businesses handle this daunting task?  They do so with large IT departments filled with people who specialize in specific tasks, generalists who glue specialists together, dedicated support people who don’t need to do engineering, engineers who don’t get support interruptions, tiered support roles to filter tasks by difficulty, mentors to train newcomers, career pipelines, on-call schedules or follow-the-sun support desks and internal education systems.  The number of challenges presented to a lone IT professional or very small IT department is nearly insurmountable, forcing corners to be cut nearly everywhere, often dangerously.  There is neither the time nor the resources for tiny IT departments to handle the scope of the job thrown at them.  Even if the job is whittled down to a very specific role, SMB IT professionals are often faced with decision making for which they cannot be prepared.  For example, a simple server failure might be seen as just another “hardware upgrade” task because the overworked, under-scoped IT professional isn’t being given the necessary latitude to flag management about an arising opportunity for strategic roadmap execution – maybe a complete departure from previous plans due to a late-breaking technology change, a chance to consolidate systems for cost savings, or a tactical upgrade or change of platform that might deliver unrealized features.

Having worked both in the trenches and in management I believe that there are two thresholds that need to be considered.  One is the minimum functional IT department size.  That is, the minimal size that an internal IT department can be and still complete basic job functions using internal staff.  To clarify, “internal staff” can be a rather ambiguous term.  I use “internal” here to mean dedicated or effectively dedicated staff.  These people can be employees or contractors.  But at a minimum, with the exception of very rare companies that don’t operate during full business hours or other niche scenarios, it takes at least three IT professionals on an IT team to functionally operate as an IT department.

With three people there is an opportunity for peer review – very critical in a technical field that is complex at the best of times and a swirling quagmire of unknown requirements, continuous change and insurmountable complexity at the worst of times.  Like any technical field, IT professionals need peers to talk to, to oversee their work, to check their ideas against and to keep them from entering the SMB IT Bubble.  Three is an important number.  Two people will have a natural tendency to become adversarial, with one carrying the weight of recommendation to management and one living in their shadow – typically the one with the greater soft skills or business skills gaining the ear of management while the one with the greater technical acumen loses their voice if management isn’t careful to intentionally include them.  As with maritime chronometers, it is critical that you have three because three can form a quorum.  Two simply have an argument.

IT is an “around the clock” endeavor.  During the day there are continuous needs from IT end users and the continuous potential for an outage or other disaster, plus meetings, design sessions, planning and documentation.  In the evenings and on weekends there is all of the system maintenance that cannot, or at least should not, be done while the business is up and running.  This is often an extensive level of work – not an occasional missed happy hour but a regular workload eliminating dinner and family time.  Then come the emergency calls and outages that happen any time, day or night.  And there is the watching of email – even if nothing is wrong, it is commonplace for IT to be involved in company business twelve to sixteen hours a day, and weekends too, even in very small companies.  Even the most dedicated IT professional will face rapid burnout in an environment such as this without a service rotation to facilitate necessary rest and work/life balance.

This comes before considering unforeseeable sick days, emergency leave or even just holidays and vacation.  If there are not enough people left behind to cover the business-as-usual tasks plus the unforeseeables, then vacations or even sick days become nearly, if not totally, impossible.  Skipping vacations for a year or two is theoretically possible, but it is not healthy and does not make for a sustainable department.

Then there is training and education.  IT is a demanding field.  Running your own IT department suggests a desire to control the level of skill and availability granted to the company.  To maintain truly useful IT staff, time and resources for continuous education are critical.  IT pros at any stage in their career need time to engage in discussions and forums, attend classes and training, participate in user groups, go to conferences and even just sit down and read books and web sites on the latest products, techniques and technologies.  If IT professionals are not given the chance not just to maintain but to grow their skills, they will stagnate, gradually become useless technically and be likely to fall into depression.  A one or two man shop, even in the smallest of organizations, cannot support the necessary free time for serious educational opportunities.

Lastly, and far more critical than it seems at first, is the need to handle request queues.  If issues arise within a business at a continuous, average rate of just enough per day to require eight hours per day to service them, it may seem like only one person would be necessary to handle the queue that this workload would generate.  In an ideal world, perhaps that is true.  In the real world, requests come in at varying degrees of priority and often at very inopportune moments, so that even a business that has taken on the expense of having dedicated, internal IT cannot have the “instant response time” that it often hopes for, because its IT professional is busy on an existing task.  The idea of instant response is based on the assumption that the IT resource is sitting idle, watching the ticket queue or waiting by the phone at all times.  That is not realistic.

In large enterprises, to handle the response time concerns of critical environments, surplus IT resources are maintained so that only in the direst of emergencies would all of them be called upon at once to deal with high-criticality issues.  There is always someone left behind to deal with another pressing issue should one arise.  This not only allows for low latency response to any important customer need but also provides spare time for projects, learning and the necessary mental downtime needed for the abstract processing of troubleshooting, without which IT professionals in a support role will lose efficiency even if other work does not force them to multitask.

In small shops there is little to be done.  There is a lack of scale to allow for excess IT resource capacity to be sitting in the wings just waiting for issues to arise.  Having three people is, in my opinion, an absolute minimum to allow for the handling of most cases of this nature if the business is small enough.  By having three people there is, we hope, some chance of avoiding continuous re-prioritization of requests, inefficient multi-tasking and context switching.
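To put a rough number on this queueing effect, here is a minimal sketch – my own toy model, not anything from a vendor or study – that simulates a request queue served by one, two or three responders.  The one-request-per-hour arrival rate and the forty-eight-minute average service time are invented assumptions, chosen so that a single person is busy about eighty percent of the day.

```python
import random

# Toy simulation of an IT request queue.  Arrivals and service times are
# drawn from exponential distributions; both rates are invented assumptions.
def average_wait(num_staff, num_requests=100_000, seed=42):
    random.seed(seed)
    arrival = 0.0
    free_at = [0.0] * num_staff             # when each staffer next becomes free
    total_wait = 0.0
    for _ in range(num_requests):
        arrival += random.expovariate(1.0)      # ~1 request per hour on average
        service = random.expovariate(1 / 0.8)   # ~48 minutes of work per request
        idx = min(range(num_staff), key=free_at.__getitem__)
        start = max(arrival, free_at[idx])      # request waits if everyone is busy
        total_wait += start - arrival
        free_at[idx] = start + service
    return total_wait / num_requests

for staff in (1, 2, 3):
    print(f"{staff} responder(s): average wait ~ {average_wait(staff):.2f} hours")
```

Under these assumptions the lone responder, though busy only about eighty percent of the time, leaves the average request waiting roughly three hours – and a request arriving during a burst waits far longer – while the second and third responders collapse the queue to nearly nothing.  The exact figures change with the assumptions, but the shape of the result does not: a single, well-utilized person cannot deliver instant response.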

In larger organizations there is also a separation of duties between administration or support job roles and engineering job roles.  One job is event driven, sitting “idle” waiting for a customer request and then reacting as quickly as possible.  The other is focused on projects and working towards overall efficiency.  These are two very different aspects of IT that are nearly impossible for a single person to tackle simultaneously.  With a three person shop these roles can exist in many cases, even if the roles are temporarily assigned as needed and not permanent aspects of title or function.

With only three people an IT department still lacks the size and scale necessary to provide a healthy professional growth and training environment internally.  There are not enough rungs on the ladder for IT employees to move up, and only turnover, unlikely to happen in the top slot, allows for any upward mobility.  This forces good candidates to leave rapidly for the sake of their careers, leaving good shops with continuous turnover and training, and lesser shops with dramatically inferior staff.  There is no simple solution for small organizations.  IT is a broad field with a great many steps on the ladder from helpdesk to CIO.  Top IT organizations have thousands or, in the most extreme cases, hundreds of thousands of IT professionals in a single organization.  These environments naturally have a great degree of both upward and lateral mobility, peer interaction and review, vendor resources, mentoring, lead oversight, career guidance and development, and opportunities to explore new ideas and paths that often don’t exist in SMBs of any size.

To maintain a truly healthy IT department takes a much larger pool of resources.  Likely one hundred or more IT professionals would be required to provide adequate internal peerage, growth and opportunity to begin to provide for career needs rather than “job needs.”  Realistically, the SMB market cannot bear this at an individual business scale and must accept that the nature of SMB IT is to have high turnover of the best resources and to work with other businesses, typically ones that are not directly competitive, to share or exchange resources.  In the enterprise space, even in the largest businesses, this is often very common – friendly exchanges of IT staff to allow for career advancement, often with no penalties for returning later for different positions at the original company.

Given this bleak picture of SMB IT staff scaling needs, what is the answer?  The reality is that there is no easy one.  SMB IT sits at a serious disadvantage to its enterprise counterparts and at some point, especially falling below three dedicated IT staff members, the scale becomes too low to allow for a sustainable work environment in all but the most extreme cases.

In smaller organizations, one answer is turning to consulting, outsourcing and/or managed service providers who are willing and able to work either in the role of internal staff or as a hybrid with existing internal staff, to provide an effectively larger IT organization shared between many businesses.  Another is simply investing more heavily in IT resources or using other departments as part-time IT to handle helpdesk or other high demand roles, but this tends to be very ineffective as IT duties tend to overwhelm any other job role.  A more theoretical approach is to form a partnership with another one or two businesses to share in-house IT in a closed environment.  This last approach is very difficult and problematic and generally works only when technology is heavily shared between the businesses in question, as is geographic location.

More important than providing a simple answer is the realization that IT professionals need a team on which to work in order to thrive, and will perform far better on a healthy team than they will alone.  How this is accomplished depends on the unique needs of any given business.  But the efficacy and viability of the one or two “man” IT shop, for even the smallest businesses, is questionable.  Some businesses are lucky enough to find themselves in a situation where this can work for a few years, but they often live day to day at a high degree of risk and almost always face their entire IT department – a key underpinning of the workings of their entire business – leaving at once, without the benefits of the staggered turnover that a three person or larger shop at least has an opportunity to provide.  With a single person shop there is no handover of knowledge from predecessors, no training and often no opportunity to seek an adequate replacement before the original IT professional is gone, leaving at best an abrupt handover and at worst a long period of time with no IT support at all and no in-house skills necessary to interview and locate a successor.

Keeping IT in Context

Information Technology doesn’t exist in a bubble; it exists to serve a business or organization (for profit, non-profit, government, etc.).  The entity which we, as IT professionals, serve provides the context for IT.  Without this context IT changes – it becomes just “technology.”

One of the biggest mistakes that I see when dealing with companies of all sizes is the proclivity for IT professionals to forget the context in which they are working and start behaving in one of two ways.  The first is forgetting context completely and leaving IT for “hobbyist land,” looking at the technologies and equipment that we use purely as toys for the enjoyment and fulfillment of the IT department itself, without consideration for the business.  The second is treating the business as generic instead of respecting that every business has unique needs and IT must adapt to the environment which it is in.

The first problem, the hobbyist problem, is the natural extension of the route through which most IT Professionals arrive in IT – they love working on computers and would do so on their own, at home, whether they were paid to do so or not.  This often brings a lifetime of “tech for the sake of tech” feeling to an IT Pro and is nearly universal in the field.  Few other professionals find themselves so universally drawn to what they do that they would do it paid or not.  But this shared experience creates a culture that often forgets that the IT department exists within the context of a specific corporate entity or business unit and that its mandate exists only within that context.

The second problem stems, most likely, from broad IT and business training that focuses heavily on rules of thumb and best practices which, generally, assume “common scenarios,” as these are easy to teach by rote and leave out the difficult pieces of problem analysis and system design.  Custom tailoring not only solutions but IT thinking to the context of a specific business with specific needs is difficult; it requires learning a lot about the business itself and putting a lot of thought into placing IT in the context of that business specifically.

The fault does not necessarily lie with IT alone.  Businesses often treat their IT departments as nothing but hobbyists, focus far too heavily on technical rather than business skills and often keep IT at arm’s length, forgetting that IT has some of the most important business insight as it tends to cross all company boundaries.  IT needs deep access to business processes, workflows, planning and goals to be able to provide good advisement to the business, but is often treated as if this information is not needed.  Businesses, especially smaller ones, tend to think of IT as a magic box with a set budget – money goes in and network plumbing comes out.  Print and radio ads promote this thinking.  IT as a product is poor business thinking.

In the defense of the business, IT operates in a way that few businesses are really prepared to handle.  IT is a cost center in that there is a base cost needed to keep any company functioning.  But beyond this, IT can be an opportunity center in most businesses – though this requires both IT and the business to work together to create these opportunities and even more so to leverage them.

IT is often put in the inappropriate position of being forced to justify its own existence.  This is nonsensical, as human resources, accounting, legal, management, janitorial, sales, marketing and production departments are never asked to demonstrate their financial viability.  Needing to do so puts an unfair strain on the IT department, requiring non-business people to present business cases, wasting resources and hampering thinking in a vain attempt at producing pointless metrics.  This is a flaw in business thinking often caused by a rift between management and the people that they’ve hired to support them.  The relationship is often cold, cursory or even adversarial when it should be close and involved.  IT should be sitting at the decision table; it brings insight and it needs insight.

One of the biggest challenges that IT faces is that it is often in a position of needing to convince the business to do what is in the business’ own best interest.  This is, for the most part, a flaw in business thinking.  The business should not stand in a position of doing the wrong thing by default, willing to do the right thing only if it can be “sold” on it.  This is a fundamental flaw in approach.  It should be a process of good decision making, not one that starts from bad decision making unless convinced otherwise.  Other departments are not presented with a similar challenge.  What other department regularly has to mount a campaign to request necessary resources?

Due to this challenge of constantly fighting for management attention and resources, IT needs to develop internal business skills in order to cope.  This is a reality of most IT departments today.  Critical is the ability not only to keep the business in context and to make IT decisions based on that context, but also to act as marketing and sales people, taking these decisions and delivering them to the business in a manner similar to how outside vendors and salespeople would.  Outside vendors are sending skilled sales people and negotiators to the business in an attempt to do an end run around IT; IT needs the same skills (with the advantage of insider knowledge and the obvious advantage of having the best interest of the business at heart) in order to demonstrate to the business why its solutions, opportunities and needs are important for consideration.

Having good interpersonal, writing and presentation skills is not enough, of course.  Knowing business context and leveraging it efficiently includes understanding factors such as risk, opportunity, loss and profit, and being able to apply these to the relationship between the business’s IT investments and the bottom line.  Often IT Pros will be frustrated when the business is unwilling to invest in a solution that they present, but forget that the business is considering (we hope) the total cost of ownership and the impact on the company’s bottom line.  When asked how the solution will save money or generate revenue, even indirectly, often, at best, the answers are vague and lack metrics.  Before going to the business with solutions, IT departments need to vet recommendations internally and ask tough questions like:

How does this solution save money today?  Or how does it make us more money?
How much money is it expected to save or earn?
What business problem are we attempting to solve? (What itch are we trying to scratch?)
What risks do we take on or reduce?

Or similar lines of thinking.  Instead of bringing technology to the business, bring solutions.  Identify problems or opportunities and present a case.  Role play and imagine yourself as a business owner disinterested in a solution.  Would you feel that the investment requested is a good one?  Too often we in IT like a solution because it is advanced, complex, “the right way to do it,” because another company is doing it, or because it is the hot trend in IT.  Often we have very good reasons for wanting to bring these techniques or technologies into our workplace, but we forget that they may not apply, or apply well, to the business as it is, its financial capabilities or its roadmap.

When I speak to IT professionals looking for advice on a system design or approach, my first question is pretty universally: “What business need are you attempting to solve?”  Often this question is met with silence.  The business had not been considered in the selection of the solution being presented.  Regularly bringing requests or solutions to the business that do not take into consideration the context of the IT department within the business will rapidly train business decision makers to distrust the advice coming from the IT department.  Not that they would feel that the advice is intentionally skewed, but they will suspect, often rightfully so, that the decisions are being brought forward from a technical basis alone, isolated from the concerns of the business.  Once this distrust is in place it is difficult to return to a healthier relationship.

Making the IT department continuously act within the context of the business that it serves, encouraging IT to pursue business skills and to approach the business for information and insight, and making the business see IT as a partner and supporter with whom information must be shared and from whom insight should be gathered can be a tall order.  The business is not likely to take the first step in improving the interaction.  It is often up to IT to demonstrate that it is considering the needs of the business, often more so than the business itself, and considering the potential financial impact or benefit of its decisions and recommendations.  There is much to be gained from this process, but it is not an easy one.

It is important to remember that the need for IT to keep business context is crucial, to some degree, for all members of the IT team, especially those making recommendations.  But the ability to judge business need, understand high level workflows, understand financial ramifications and seek opportunity falls to a combination of IT management (CIO, Dir. of IT, etc.) and the IT department as a whole.  Many non-managerial technical members need not panic and feel that their lack of holistic business vision and acumen will keep them from adequately performing their role within the business context, but it does limit their ability to provide meaningful guidance to the business outside of extremely limited scopes.  Even common job roles, such as deskside support, need to have some understanding of the fiscal responsibilities of the IT department, however – such as recognizing when the cost of fixing a low cost component may far exceed the cost of replacing the component with one that is new and, potentially, better.

Virtual Eggs and Baskets

In speaking with small business IT professionals, one of the key factors for hesitancy around deploying virtualization arises from the old adage: “don’t put all your eggs in one basket.”

I can see where this concern arises.  Virtualization allows for many guest operating systems to be contained in a single physical system which, in the event of a hardware failure, causes all guest systems residing on it to fail together, all at once.  This sounds bad, but perhaps it is not as bad as we would first presume.

The idea of the eggs and baskets idiom is that we should not put all of our resources at risk at the same time.  This is generally applied to investing, encouraging investors to diversify and invest in many different companies and types of securities like bonds, stocks, funds and commodities.  In the case of eggs (or money) we are talking about an interchangeable commodity.  One egg is as good as another.  A set of eggs is naturally redundant.

If we have a dozen eggs and we break six, we can still make an omelette, maybe a smaller one, but we can still eat.  Eating a smaller omelette is likely to be nearly as satisfying as a larger one – we are not going hungry in any case.  Putting our already redundant eggs into multiple baskets allows us to hedge our bets.  Yes, carrying two baskets means that we have less time to pay attention to either one, so it increases the risk of losing some of the eggs, but it reduces the chances of losing all of them.  In the case of eggs, a wise proposition indeed.  Likewise, a smart way to prepare for your retirement.
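A quick toy calculation, with completely made-up probabilities, shows why the idiom works for eggs.  Suppose a lone carried basket has a ten percent chance of being dropped, while juggling two baskets raises each basket’s chance to fifteen percent:

```python
# Toy numbers, invented purely for illustration of the hedging trade-off.
p_one = 0.10    # chance of dropping the lone basket of 12 eggs
p_each = 0.15   # chance of dropping each of two baskets of 6 (less attention each)

print("one basket:  P(lose all) =", p_one, "| expected eggs lost =", 12 * p_one)
print("two baskets: P(lose all) =", round(p_each ** 2, 4),
      "| expected eggs lost =", 2 * 6 * p_each)
```

Splitting the eggs raises the expected loss (1.8 eggs versus 1.2) but cuts the chance of losing everything from ten percent to about two percent.  For interchangeable eggs, where a partial loss is tolerable, that is a good trade – and it is exactly the assumption that breaks down for servers, as we will see next.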

This theory, because it is repeated as an idiom without careful analysis or proper understanding, is then applied to unrelated areas such as server virtualization.  Servers, however, are not like eggs.  Servers, especially in smaller businesses, are rarely interchangeable commodities where having six working, instead of the usual twelve, is good enough.  Typically servers each play a unique role and all are relatively critical to the functioning of the business.  If a server is not critical then it is unlikely to justify the cost of acquiring and maintaining it in the first place, and so would probably not exist.  When servers are interchangeable, such as in a large, stateless web farm or compute cluster, they are configured that way as a means of expanding capacity beyond the confines of a single, physical box and so fall outside the scope of this discussion.

IT services in a business are usually, at least to some degree, a “chain dependency.”  That is, they are interdependent and the loss of a single service may impact other services either because they are technically interdependent (such as a line of business application being dependent on a database) or because they are workflow interdependent (such as an office worker needing the file server working in order to provide a file which he needs to edit with information from an email while discussing the changes over the phone or instant messenger.)  In these cases, the loss of a single key service such as email, network authentication or file services may create a disproportionate loss of working ability.  If there are ten key services and one goes down, company productivity from an IT services perspective likely drops by far more than ten percent, possibly nearing one hundred percent in extreme cases.  This is not always true; in some unique cases workers are able to “work around” a lost service effectively, but this is very uncommon.  Even if people can remain working, they are likely far less productive than usual.

When dealing with physical servers, each server represents its own point of failure.  So if we have ten servers, we have ten times the likelihood of outage than if we had only one of those same servers.  Each server that we add brings with it its own risk.  Say each server fails, on average, once per decade, and each failure has an outage factor of 2.5 – that is, losing one of the ten servers financially impacts the business at twenty five percent of revenue (rather than its naive ten percent share) for, say, one day.  Then our total average impact over a decade is the equivalent of two and a half total site outages.  I use the concept of factors and averages here to make this easy; determining the length or impact of an average outage is not necessary, as we only need to determine relative impact in this case to compare the scenarios.  It’s just a means of comparing the cumulative outage financial impact of one event type to another without needing specific figures – this doesn’t help you determine what your spend should be, just relative reliability.

With virtualization we have the obvious ability to consolidate.  In this example we will assume that we can collapse all ten of these existing servers down into a single server.  When we do this we often trigger the “all our eggs in one basket” response.  But if we run some risk analysis we will see that this is usually just fear and uncertainty, not a mathematically supported risk.  If we assume the same risks as in the example above, our single server will, on average, incur just a single total site outage, once per decade.

Compare this to the first example, which did the damage equivalent to two and a half total site outages – the risk of the virtualized, consolidated solution is only forty percent that of the traditional solution.
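The arithmetic behind that comparison is simple enough to sketch out.  The once-per-decade failure rate and the quarter-of-a-site-outage impact per failure are the illustrative assumptions from above, not measured figures:

```python
# Relative outage impact, using the illustrative assumptions from the text.
failures_per_server_per_decade = 1     # assumed average failure rate per server
impact_per_failure = 0.25              # one lost server ~ 25% of a full site outage

traditional = 10 * failures_per_server_per_decade * impact_per_failure
consolidated = 1 * failures_per_server_per_decade * 1.0   # one box, full impact

print(f"traditional (10 servers): {traditional:.1f} site-outage equivalents/decade")
print(f"consolidated (1 server):  {consolidated:.1f} site-outage equivalents/decade")
print(f"relative risk of consolidation: {consolidated / traditional:.0%}")
```

Note where the break-even point sits: an impact factor of exactly 0.10 per failure, each server’s proportional share.  Whenever losing one of ten interdependent servers hurts more than ten percent of a full outage, consolidation wins on this measure.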

Now keep in mind that this is based on the assumption that losing some services means a financial loss greater than the strict value of the service that was lost, which is almost always the case.  Even if the loss is no more than the value of the individual service, we are only at break even and need not worry.  In rare cases the impact from losing a single system can be less than its “slice of the pie,” normally because people are flexible and can work around the failed system – as when instant messaging fails and people simply switch to using email until instant messaging is restored – but these cases are rare and are normally isolated to a few systems out of many, with the majority of systems, say ERP, CRM and email, having disproportionately large impacts in the event of an outage.

So what we see here is that under normal circumstances moving ten services from ten servers to ten services on one server will generally lower our risk, not increase it – in direct contrast to the “eggs in a basket” theory.  And this is purely from a hardware failure perspective.  Consolidation offers several other important reliability factors, though, that can have a significant impact to our case study.

With consolidation we reduce the amount of hardware that needs to be monitored and managed by the IT department.  Fewer servers means that more time and attention can be paid to those that remain.  More attention means a better chance of catching issues early and more opportunity to keep parts on hand.  Better monitoring and maintenance leads to better reliability.

Possibly the most important factor, however, with consolidation is that there are significant cost savings and these, if approached correctly, can provide opportunities for improved reliability.  With the dramatic reduction in total cost for servers it can be tempting to continue to keep budgets tight and attempt to leverage the cost savings directly.  That is understandable, and for some businesses it may be the correct approach.  But it is not the approach that I would recommend when struggling against the notion of eggs and baskets.

Instead, by applying a more moderate approach – keeping significant cost savings but still spending more, relatively speaking, on a single server – you can acquire a higher end (read: more reliable) server, use better parts, have on-site spares, etc.  The cost savings of virtualization can often be turned directly into increased reliability, further shifting the equation in favor of the single server approach.

As I stated in another article, one brick house is more likely to survive a wind storm than either one or two straw houses.  Having more of something doesn’t necessarily make it the more reliable choice.

These benefits come purely from the consolidation aspect of virtualization and not from the virtualization itself.  Virtualization provides extended risk mitigation features separately as well.  System imaging and rapid restores, as well as restores to different hardware, are major advantages of most any virtualization platform.  This can play an important role in a disaster recovery strategy.

Of course, all of these concepts are purely to demonstrate that single box virtualization and consolidation can beat the legacy “one app to one server” approach and still save money – showing that the example of eggs and baskets is misleading and does not apply in this scenario.    There should be little trepidation in moving from a traditional environment directly to a virtualized one based on these factors.

It should be noted that virtualization can then extend the reliability of traditional commodity hardware by providing mainframe-like failover features that are above and beyond what non-virtualized platforms are able to provide.  This moves commodity hardware more firmly into line with the larger, more expensive RISC platforms.  These features can bring an extreme level of protection but are often above and beyond what is appropriate for IT shops initially migrating from a non-failover, legacy hardware server environment.  High availability is a great feature but is often costly and very often unnecessary, especially as companies move, as we have seen, from relatively unreliable environments in the past to more reliable environments today.  Given that we have already increased reliability over what was considered necessary in the past, there is a very good chance that an extreme jump in reliability is not needed now; but due to the large drop in the cost of high availability, it is quite possible that it will be cost justified where previously it could not be.

In the same vein, virtualization is often feared because it is seen as a new, unproven technology.  This is certainly untrue, but there is an impression of this in the small business and commodity server space.  In reality, virtualization was first introduced by IBM in the 1960s and ever since has been a mainstay of high end mainframe and RISC servers – those systems demanding the best reliability.  In the commodity server space virtualization was a larger technical challenge and took a very long time before it could be implemented efficiently enough to be effective in the real world.  But even in the commodity server space virtualization has been available since the late 1990s and so is approximately fifteen years old today – very far past the point of being a nascent technology; in the world of IT it is positively venerable.  Commodity platform virtualization is a mature field with several highly respected, extremely advanced vendors and products.  The use of virtualization as a standard for all or nearly all server applications is a long established and accepted “enterprise pattern,” one that now can easily be adopted by companies of any and every size.

Virtualization, perhaps counter-intuitively, is actually a very critical component of a reliability strategy.  Instead of adding risk, virtualization can almost be approached as a risk mitigation platform – a toolkit for increasing the reliability of your computing platforms through many avenues.