Category Archives: Business of IT

The Social Contract of Sales

In IT we tend to deal with more sales scenarios than most business positions do.  An accountant, for example, is rarely in a position to buy equipment, software or products for the business.  Positions that do buy things regularly, such as the housekeeping department, tend to buy small ticket items like bleach, window cleaner and garbage bags.  IT, however, tends to buy large cost, high margin items with great regularity, which gives it a need to understand the world of sales and marketing far better than nearly any other department.

Because of this, understanding concepts like the Social Contract of Sales is far more critical for IT workers than for nearly anyone else outside of the business management tiers, even though this is just a general social contract that everyone in society is expected to understand intuitively as common sense.  But because the danger of misconstruing this social contract in an IT context is so high, and because IT workers are often hired with this specific area of competence ignored but then expected to work heavily within it, we need to discuss it in this context.

The social contract is this: “Sales people represent a product or vendor, and are compensated to, and to some degree obligated to, push their product.  They cannot lie, but their intent is to convince.”

This should be ridiculously obvious, and yet there is an incredibly common belief that sales people will act against either their own self-interest or the interest of their employer (which would be unethical) in order to act as a friend, adviser or possibly even engineer for customers.  This makes no sense.  Not only are they not paid to do this, they are specifically paid not to do this.  And there is the obvious social contract that tells everyone involved that they are sales people and that no one should be surprised when they attempt to convince you to purchase whatever it is that they sell.

We have social or natural contracts like this all over the place and we need them to operate intelligently.  If you are walking in the woods and you meet a bear, you have a natural contract with it that says if you try to touch it, it will try to eat you.  No one expects a bear to act differently and it is silly and pointless to hope that your interaction with a wild bear will be different.  But you are free to test that contract.

The social contract of sales, or of anything else, does not make it ethical for a sales person to lie.  That would be an impossible situation.  But it is also considered part of the social contract that all sales, promotions and marketing only deal with the concept of “truth” when it comes to quantifiable factors, never qualitative ones.

For example, a car salesman is always free to claim that their car is the nicest, prettiest or most comfortable, regardless of whether anyone believes that to be true.  But they are not free to lie about how many seats it has or what gas mileage it gets.

Likewise, IT professionals, both in house staff and paid advisers, have a social contract to represent their employers and to protect them from sales that do not make sense.  Our profession has a responsibility in how we handle sales people.  We are the gatekeepers.  No one else in the business is expected to have the ability to know when services or products are sensible or cost effective for our needs.  No one else is in a position where any contact with sales would make sense.

If we, as the IT gatekeepers, become confused as to the nature of the social contract and think that sales people are “on our side” looking out for our interests instead of their own or their employers’, or if we forget that only quantifiable facts are meaningful, we can be easily misled, often by ourselves.  It is all too tempting to feel that sales people are there on our behalf, instead of their own.

A common sales tactic, and one that is incredibly effective against IT buyers, is the offer of free work.  IT decision making can be hard and, of course, sales people will happily take decision making off of our plates.  This is handy for them, as they can then make decisions that involve buying their services or products.  Allowing a sales person to do our job for us makes buying their products a foregone conclusion.  No one who allows a sales person to do this can reasonably claim that they had not, at that point, already made the decision to go with that vendor’s products.

Doing this would, of course, violate our own social contract with our employers.  We are paid to do the IT work, to make the decisions, to make sure that sales people do not take advantage of the organization.  Handing our role over to the “enemy” that we are paid to protect against is exactly what our job role exists to prevent.  If our employers wanted sales people to simply sell the company whatever they wanted, they would eliminate the IT role and just talk to the sales people directly.  IT’s purpose instantly evaporates in that scenario.

Also within the social contract is the understanding that anyone who works on behalf of a vendor or a vendor representative (like a reseller) is a salesperson as well or, at the very least, partakes in the shared social contract.  They are employed to promote their products and have an obligation to do so even if their role is primarily technical, account management or something else.  It is common for vendors to create employee positions with names like “presales engineer,” or for resellers to brand themselves “MSPs,” to make it sound like they are purely technical (with the implication of being “above” the sales world) or are customer representatives, but neither is logically true.  When an organization sells products, everyone who works there is a representative of those products.  Titles do not alter that social contract.

As IT Pros, it is our responsibility to understand and recognize the social contract of sales and to identify people who work for organizations that place them under that contract.  An ethical sales person cannot directly lie to us, but they will almost always happily allow us to lie to ourselves, and that is one of the most powerful tools that they have.  We want them to be our friends, we want to be able to take it easy and let them do our job for us… and they will let us believe all of that as much as we want.  But what we have to remember is that part of the assumption of that social contract is that we know this is how they are tasked with behaving, and that it is our responsibility and no one else’s to ensure that we treat them like vendor agents and never confuse them with being our advisers.

When to Consider High Availability?

“High Availability isn’t something you buy, it’s something that you do.”  – John Nicholson

Few things are more universally desired in IT than High Availability (HA) solutions.  I mean really, say those words and any IT Pro will instantly say that they want that.  HA for their servers, their apps, their storage and, of course, even their desktops.  If there was a checkbox next to any system that simply said “HA”, why wouldn’t we check it?  We would, of course.  No one voluntarily wants a system that fails a lot.  Failure bad, success good.

First, though, we must define HA.  HA can mean many things.  At a minimum, HA must mean that the availability of the system in question is higher than “normal.”  What is normal?  That alone is hard enough to define.  HA is a loose term, at best.  In the context of its most common usage, though, which is common applications running on normal enterprise hardware, I would offer this starting point for HA discussions:

Normal or Standard Availability (SA) would be defined as the availability of a common mainline server running a common enterprise operating system and a common enterprise application in a best practices environment with enterprise support.  Good examples might include Exchange on Windows Server running on the HP ProLiant DL380 (the most common mainline commodity server), or BIND (the DNS server) on Red Hat Enterprise Linux running on the Dell PowerEdge R730.  These are just examples used to establish a rough baseline.  There is no great way to measure this, but with a good support contract and rapid repair or replacement in the real world, a system of this nature is believed to achieve between four and five nines of reliability (99.99% uptime or higher) when human failure is not included.

High Availability (HA) should be commonly defined as having an availability significantly higher than that of Standard Availability, with “significantly higher” meaning a minimum of one order of magnitude increase.  So at least five nines of reliability, and more likely six nines (99.9999% uptime).

Low Availability (LA) would be commonly defined as having an availability significantly lower than that of Standard Availability, with “significantly,” again, meaning at least one order of magnitude.  So LA would typically be assumed to be around 99% to 99.9% availability or lower.
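As a rough reference, the relationship between these “nines” and expected downtime can be worked out directly.  Below is a minimal sketch, assuming a simple 24×7 duty cycle with no planned maintenance windows; the percentages are the definitions above, not measurements:

# Minimal sketch: convert an availability percentage into expected downtime,
# assuming a simple 24x7 duty cycle with no planned maintenance windows.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_minutes_per_year(availability_percent):
    """Expected downtime, in minutes per year, for a given availability."""
    return (1 - availability_percent / 100.0) * MINUTES_PER_YEAR

for label, pct in [("LA (99%)", 99.0),
                   ("SA (99.99%)", 99.99),
                   ("HA (99.9999%)", 99.9999)]:
    print(f"{label:15} ~{downtime_minutes_per_year(pct):8.1f} minutes/year")

# 99%      -> ~5,256 minutes/year (roughly 3.7 days)
# 99.99%   -> ~53 minutes/year    (under an hour)
# 99.9999% -> ~0.5 minutes/year   (about half a minute)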

Measurement here is very difficult as human, environmental and other factors play a massive role in determining the uptime of different configurations.  The same gear used in one role might achieve five nines while in another fail to achieve even one.  The quality of the datacenter, the skill of the support staff, the rapidity of parts replacement, the granularity of monitoring and a multitude of other factors will affect the overall reliability significantly.  This is not necessarily a problem for us, however.  In most cases we can evaluate the portions of a system design that we control in such a way that relative reliability can be determined; at the very least we can show that one approach will be superior to another, allowing well informed decision making even if accurate failure rate models cannot easily be built.

It is important to note that, other than providing a sample baseline set of examples from which to work, there is nothing in the definitions of high availability or low availability that talks about how these levels should be achieved – that is not what the terms mean.  The terms describe resultant levels of reliability in relation to the baseline and nothing else.  There are many ways to achieve high availability without using commonly assumed approaches, and practically unlimited ways to achieve low availability.

Of course HA can be defined at every layer.  We can have an HA platform or OS but fragile applications on top.  So it is very important to understand at what level we are speaking at any given time.  At the end of the day, a business will only care about the highly available delivery of services, regardless of how or where that is achieved.  The end result is what matters, not the “under the hood” details of how it was accomplished or, as always, the ends justify the means.

It is extremely common today for IT departments to become distracted by new and flashy HA tools at the platform layer and forget to look for HA higher and lower in the stack to ensure that we provide highly available services to the business; looking only at the one layer can leave the business just as vulnerable as ever, or more so.

In the real world, though, HA is not always an option and, when it is, it comes at a cost.  That cost is almost always monetary and generally comes with extra complexity as well.  And as we well know, any complexity also carries additional risk, and that risk could, if we are not careful, cause an attempt to achieve HA to actually fail and might even leave us with Low Availability.

Once we understand this necessary language for describing what we mean, we can begin to talk about when high availability, standard availability and even low availability may be right for us.  We use this high level of granularity because it is so difficult to measure system reliability that getting too detailed becomes valueless.

Conceptually, all systems come with a risk of downtime and nothing can be up all of the time; that is impossible.  Reliability generally costs money, all other things being equal.  So to determine what level of availability is most appropriate for a workload we must determine the cost of risk mitigation (the amount of money that it takes to change the average amount of downtime) and compare that against the cost of the downtime itself.

This gets tricky and complicated because determining the cost of downtime is difficult enough, and determining the risk of downtime is even more difficult.  In many cases the cost of downtime is not a flat number, though it might be.  It could be expressed as $5/minute or $20K/day or similar.  But an even better tool is to create a “loss impact curve” that shows how money is lost over time (within a reasonable interval).

For example, a company might easily face no loss at all for the first five minutes, with slowly increasing, but small, losses until about the four hour mark when work stops because falling back to paper or other workarounds no longer suffices, and then losses jump from almost zero to quite large.  Other companies might take a huge loss the moment that the systems go down but see the losses slowly dissipate over time.  Loss might only be impactful at certain times of day; maybe outages at night or during lunch are trivial but mid morning or mid afternoon ones are major.  Every company’s impact, risk and ability to mitigate that risk are different, often dramatically so.
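To make the “loss impact curve” idea concrete, here is a minimal sketch using purely hypothetical numbers; the five minute grace period, the four hour cliff and the dollar rates are illustrative assumptions, not data from any real business:

# Hypothetical loss impact curve: no loss for the first five minutes, a small
# bleed of ~$2/minute until the four hour mark, then ~$50/minute once work
# stops entirely and paper workarounds no longer suffice.
def loss_for_outage(minutes_down):
    """Estimated loss, in dollars, for an outage of the given length."""
    if minutes_down <= 5:
        return 0.0
    if minutes_down <= 240:
        return (minutes_down - 5) * 2.0
    return (240 - 5) * 2.0 + (minutes_down - 240) * 50.0

for m in (5, 60, 240, 480):
    print(f"{m:4} minutes down -> ${loss_for_outage(m):,.2f}")
# 5 -> $0, 60 -> $110, 240 -> $470, 480 -> $12,470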

Sometimes it comes down to the people working at the company.  Will they all happily take needed bathroom, coffee, snack or even lunch breaks at the time that a system fails so that they can return to work when it is fixed?  Will people go home early and come in early tomorrow to make up for a major outage?  Is there machinery that is going to sit idle?  Will the ability to respond to customers be impacted?  Will life support systems fail?  There are countless potential impacts and countless potential ways of mitigating different types of failures.  All of this has to be considered.  The cost of downtime might be a fraction of corporate revenues on a minute by minute basis or downtime might cause a loss of customers or faith that is more impactful than the minute by minute revenue generated.

Once we have some rough loss numbers to deal with, we at least have a starting point.  Even if we only know that revenue is ~$10/minute and losses are expected to be around ~$5/minute, that is something to work from.  If we have a full curve or a study done with more detailed numbers, all the better.  Now we need to figure out roughly what our baseline is going to be.  A well maintained server, running on premises, with a good support contract and good backup and restore procedures can pretty easily achieve four nines of reliability.  That means that we would experience about five hours of downtime every five years.  This is actually less than the generally expected downtime of SA in most environments and potentially far less than the levels expected in excellent environments like high quality datacenters with nearby parts and service.

So, based on our baseline example of about five hours every five years, we can figure out our potential risk.  If we lose about ~$5/minute and we expect roughly 300 minutes of downtime every five years, we are looking at a potential loss of $1,500 every half decade.
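For anyone who wants to check that arithmetic, here is a quick sketch using the same assumed figures of four nines and ~$5/minute:

# Baseline risk using the assumed figures above: four nines of availability
# and an estimated loss of ~$5 per minute of downtime.
MINUTES_PER_YEAR = 365 * 24 * 60
availability = 0.9999            # four nines
loss_per_minute = 5.0            # assumed ~$5/minute

downtime_5yr = (1 - availability) * MINUTES_PER_YEAR * 5
expected_loss_5yr = downtime_5yr * loss_per_minute

print(f"Expected downtime over 5 years: {downtime_5yr:.0f} minutes")
print(f"Expected loss over 5 years:     ${expected_loss_5yr:,.0f}")
# -> about 263 minutes and roughly $1,300; rounding the downtime up to the
#    ~300 minutes used above gives the $1,500 figure carried forward.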

That means that, at the most extreme, we could never justify spending even $1,500 to mitigate that risk; that would be financially absurd.  This is so for several reasons.  One of the biggest is that this is only a risk: spending $1,500 to protect against possibly losing $1,500 makes little sense, but it is a very common mistake when people do not analyze these numbers carefully.

The biggest factor is that no mitigation technique is completely effective.  If we manage to move our four nines system to a five nines system we would remove only 90% of the average downtime, moving us from $1,500 of expected loss to $150.  If we spent $1,500 for that reduction, the total “loss” would still be $1,650 (the cost of protection is a form of financial loss).  The cost of the risk mitigation combined with the anticipated remaining impact must, taken together, still be lower than the anticipated cost of the risk without mitigation, or else the mitigation is pointless or actively damaging.
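A minimal sketch of that comparison, using the same illustrative numbers (the $1,500 mitigation cost and the 90% effectiveness figure are the assumptions from above):

# Compare doing nothing against buying mitigation that removes ~90% of the
# expected downtime (moving from four nines to five nines).
unmitigated_loss = 1500.0                      # expected 5 year loss at four nines
mitigation_cost  = 1500.0                      # assumed up front cost of the HA investment
residual_loss    = unmitigated_loss * 0.10     # five nines leaves ~10% of the downtime

total_with_mitigation = mitigation_cost + residual_loss   # $1,500 + $150 = $1,650

print(f"Do nothing:      expected loss   ${unmitigated_loss:,.0f}")
print(f"Buy mitigation:  cost + residual ${total_with_mitigation:,.0f}")
# The mitigated option is worse here; mitigation only pays off when
# mitigation_cost + residual_loss is comfortably below the unmitigated loss.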

Many may question why the total cost of risk mitigation must be lower and not simply equal; surely that would put us at a “risk break even” point?  This seems true on the surface, but because we are dealing with risk it is not the case.  Risk mitigation is a certain cost: financial damage that we take up front in the hope of reducing losses tomorrow.  But the risk for tomorrow is a guess, hopefully a well educated one, but only a guess.  The cost today is certain.  Taking on certain damage today in the hope of reducing possible damage tomorrow only makes sense when the damage today is small, the expected or possible damage tomorrow is very large and the effectiveness of the mitigation is significant.

Included in the idea of a “certain cost up front” to reduce a “possible cost tomorrow” is the time value of money.  Even if an outage were of a known size and time, we would not spend the same money today to mitigate it tomorrow, because our money is more valuable today.
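As a sketch of the time value of money point, assuming, purely for illustration, a 5% annual discount rate:

# Discount a loss avoided five years from now back to today's dollars.
discount_rate = 0.05            # assumed annual discount rate
years = 5
future_loss_avoided = 1500.0    # the loss we hope to prevent

present_value = future_loss_avoided / (1 + discount_rate) ** years
print(f"Present value of ${future_loss_avoided:,.0f} avoided in {years} years: ${present_value:,.2f}")
# -> about $1,175; spending $1,500 today to avoid it would be a losing trade
#    even if the outage were a certainty.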

In the most dramatic cases, we sometimes see IT departments demanding tens or hundreds of thousands of dollars be spent up front to avoid losing a few thousand dollars, maybe, at some point possibly many years in the future.  A strategy that we can refer to as “shooting ourselves in the face today to avoid maybe getting a headache tomorrow.”

It is included in the concept of evaluating the risk mitigation, but it should be mentioned specifically that in the case of IT equipment there are many examples of attempted risk mitigation that may not be as effective as they are believed to be.  For example, having two servers that sit in the same rack will potentially be very effective for mitigating the risk of host hardware failure, but will not mitigate against natural disasters, site loss, fire, most cases of electrical shock, fire suppression activation, network interruptions, most application failures, ransomware attacks or other reasonably possible disasters.

It is common for storage devices to be equipped with “dual controllers,” which gives a strong impression of high reliability, but generally these controllers sit inside a single chassis with shared components, and even when the components are not shared, the firmware often is and communications between components are complex; this often leads to failures where the failure of one component triggers the failure of another, quite frequently making these LA devices rather than the SA or HA that people expected when purchasing them.  So it is very critical to consider which risks a mitigation strategy actually addresses and whether the technique is likely to be effective.  No technique is completely effective, there is always a chance for failure, but some strategies and techniques are more broadly effective than others and some are simply misleading or actively counterproductive.  If we are not careful, we may implement costly products or techniques that actively undermine our goals.

Some techniques and products used in the pursuit of high availability are rather expensive, which might include buying redundant hardware, leasing another building, installing expensive generators or licensing special software.  There are low cost techniques and software as well, but in most cases any movement towards high availability will require a correspondingly large outlay of investment capital to achieve it.  It is absolutely critical to keep in mind that high availability is a process; there is no way to simply buy high availability.  Achieving HA requires good documentation, procedures, planning, support, equipment, engineering and more.  In the systems world, HA is normally approached first from an environmental perspective, with failover power generators, redundant HVAC systems, power conditioning, air filtration, fire suppression systems and more to ensure that the environment for the availability is there.  This alone can often make further investment unnecessary, as it can deliver incredible results.  Then comes HA system design, ensuring that not just one layer of a technology stack is highly available but that the entire stack is, allowing the critical applications, data or services to remain functional for as much time as possible.  Then comes site to site redundancy to be able to withstand floods, hurricanes, blizzards and the like.  Of course there are completely different techniques as well, such as utilizing cloud computing services hosted remotely on our behalf.  What matters is that high availability requires broad thinking and planning, cannot simply be purchased as a line item and is judged by whether it delivers an uptime, or likelihood of uptime, much higher than a “standard” system design would deliver.

What is often surprising, almost shocking, to many businesses and especially to IT professionals, who rarely undertake financial risk analysis and who are constantly told that HA is a necessity for any business and that buying the latest HA products is unquestionably how their budgets should be spent, is that when the numbers are crunched and the real costs and effectiveness of risk mitigation strategies are considered, high availability has very little place in most organizations, especially those that are small or have highly disparate workloads.  In the small and medium business market it is almost universal to find that the cost and complexity (which in turn brings risk, mostly in the form of a lack of experience around the techniques and the risk assessment) of high availability approaches is far too great to ever offset the potential damage of the outage from which the mitigation is hoped to protect.  There are exceptions, of course, and there are many businesses for which high availability solutions are absolutely sensible, but these are the exception and very far from being the norm.

It is also sensible to think of the need for high availability on a per workload basis rather than department, company or technology wide.  In a small business it is common for all workloads to share a common platform, and the need of a single workload for high availability may sweep other, less critical, workloads along with it.  This is perfectly fine and a great way to offset the cost of the risk mitigation of the critical workload through ancillary benefit to the less critical workloads.  In a larger organization, where a plethora of platform approaches is used for differing workloads, it is common for high availability to be applied only to workloads that are both highly critical (in terms of risk from downtime impact) and practically mitigatable (the ability to mitigate risk can vary dramatically between different types of workloads), with other workloads left to standard techniques.

Examples of workloads that may be critical and can be effectively addressed with high availability might include an online ordering system where the latency created by multi-regional replication has little impact on the overall system but losing orders could be very financially impactful should a system fail.  An example of a workload where high availability might be easy to implement but ineffectual would be an internal intranet site serving commonly asked HR questions; it would simply not be cost effective to avoid small amounts of occasional downtime for a system like this.  An example of a system where risk is high but the cost or effectiveness of risk mitigation makes it impractical or even impossible might be a financial “tick” database requiring massive amounts of low latency data to be ingested, where maintaining a replica would not only be incredibly costly but could introduce latency that would undermine the ability of the system to perform adequately.  Every business and workload is unique and should be evaluated carefully.

Of course high availability techniques can be actioned in stages; it is not an all or nothing endeavor.  It might be practical to mitigate the risk of system level failure by having application layer fault tolerance to protect against failure of system hardware, the virtualization platform or storage, while for the same workload it might not be valuable to protect against the loss of a single site.  If a workload only services a particular site, or is simply not valuable enough for the large investment needed to make it fail over between sites, it could easily fall “in the middle.”  It is very common for workloads to implement only partial high availability solutions, often because an IT department may only be responsible for a portion of the stack and have no say over things like power support and HVAC, but probably most commonly because some high availability techniques are seen as high visibility and easy to sell to management while others, such as high quality power and air conditioning, are not, even though they may easily provide a better bang for the buck.  There are good reasons why certain techniques may be chosen and not others, as they address different risk components and some risks have a differing impact on an individual business or workload.

High availability requires careful thought as to whether it is worth considering and even more careful thought as to implementation.  Building true HA systems requires a significant amount of effort and expertise and generally substantial cost.  Understanding which components of HA are valuable and which are not requires not just extensive technical expertise but financial and managerial skills as well.  Departments must work together extensively to truly understand how HA will impact an organization and when it will be worth the investment.  It is critical that it be remembered that the need for high availability in an organization or for a workload is anything but a foregone conclusion and it should not be surprising in the least to find that extensive high availability or even casual high availability practices turn out to be economically impractical.

In many ways this is because standard availability has reached such a state that there is continuously less and less risk to mitigate.  Technology components used in a business infrastructure, most notably servers, networking gear and storage, have become so reliable that the amount of downtime that we must protect against is quite low.  Most of the belief in the need for knee jerk high availability comes from a different era when reliable hardware was unaffordable and even the most expensive equipment was rather unreliable by modern standards.  This feeling of impending doom that any device might fail at any moment comes from an older era, not the current one.  Modern equipment, while obviously still carrying risks, is amazingly reliable.

In addition to other risks, over-investing in high availability solutions carries financial and business risks that can be substantial.  It increases technical debt in the face of business uncertainty.  What if the business suddenly grows, or worse, what if it suddenly contracts, changes direction, gets purchased or goes out of business completely?  The investment in the high availability is already spent even if the need for its protection disappears.  What if technology or location change?  Some or all of a high availability investment might be lost before it would have been at its expected end of life.

As IT practitioners, evaluating the benefits, risks and costs of technology solutions is at the core of what we do.  Like everything else in business infrastructure, determining the type of risk mitigation, the value of protection and how much is financially proper is our key responsibility and cannot be glossed over or ignored.  We can never simply assume that high availability is needed, nor that it can simply be skipped.  It is in analysis of this nature that IT brings some of its greatest value to organizations.  It is here that we have the potential to shine the most.

Rethinking Long Term Support Releases

Traditionally, Long Term Support operating system releases have been the bulwark of enterprise deployments.  This is the model used by IBM, Oracle, Microsoft, SUSE and Red Hat and has been the conventional thinking around operating systems since the beginning of support offerings many decades ago.

It has been common in the past for both server and desktop operating system releases to follow this model, but in the Linux space specifically we began to see this shaken up as less formal products were free to experiment with more rapid, unsupported or simply unstructured releases.  In the primary product space, openSUSE, Fedora and Ubuntu all provided short term support offerings or rapid release offerings.  Instead of release cycles measured in years and support cycles closing in on a decade, they shortened release cycles to months and support to just months or a few years at most.

In the desktop space, getting new features and applications sooner, instead of focusing primarily on stability as was common on servers, often made sense and brought the added benefit that new technologies or approaches could be tested on faster release cycle products before being integrated into long term support server products.  Fedora, for example, is a proving ground for technologies that will, after proving themselves, make their way into Red Hat Enterprise Linux releases.  By using Fedora, end users get features sooner, get to learn about RHEL technologies earlier and Red Hat gets to test the products on a large scale before deploying to critical servers.

Over time the stability of short term releases has improved dramatically and increasingly these systems are seen as viable options for server systems.  These systems get newer enhancements, features and upgrades sooner which is often seen as beneficial.

A major benefit of any operating system is its support ecosystem, including the packages and libraries that are supported and provided as part of the base operating system.  With long term releases we often see critical packages aging dramatically throughout the life of the release, which can cause problems with performance, compatibility and even, in extreme cases, security.  This forces users of long term release operating systems to choose between continuing to live with the limitations of the older components or integrating new components themselves, which often breaks the fundamental value of the long term release product.

Because the goal of a long term release is stability and integration testing, replacing components within the product to “work around” the limitations of an LTS means that those components are no longer being treated in an LTS manner and that integration testing from the vendor is most likely no longer happening or, if it is, not to the same degree.  In effect, the result is a self-built short term release product, but with legacy core components and less oversight.

In reality, in most respects, doing this is worse than going directly to a short term release product.  Using a short term or rapid release product allows the vendor to maintain the assumed testing and integration, just with a faster release and support cycle, so that the general value of the long term release concept is maintained and with all components of the operating system, rather than just a few, being updated.  This allows for more standardization, industry testing and shared knowledge and integration than with a partial LTS model.

Maybe the time has come to rethink the value of long term support for operating systems.  For too long, it seems, the value of this approach was simply assumed and followed, and certainly it had and has merits; but the operating system world has changed since this approach was first introduced.  The need for updates has increased while the change rates of things like kernels and libraries have slowed dramatically.  More powerful servers have moved compatibility higher up the stack and instead of software being written to an OS it is often written for a specific version of a language or run time or other abstraction layer.

Shorter release cycles mean that systems get features, top to bottom, more often.  Updates between “major” releases are smaller and less impactful.  Changes from updates are more incremental, providing a more organic learning and adaptation curve.  And, most importantly, the need to replace carefully tested and integrated system components with third party provided versions becomes, effectively, unheard of.

Stability for software vendors remains a value of long term releases and will create a need for their use for a long time to come.  But for the system administrator, the value of this approach seems to be decreasing and, I feel personally, has reached an inflection point in recent years.  It used to seem expected and normal to wait two or three years for packages to be updated, but today this feels unnecessarily cumbersome.  It seems increasingly common for higher level components to be built with a requirement of newer underlying components; an expectation that operating systems will either be more current or that portions of the OS will be updated separately from the rest.

A heavy reliance on containerization technologies may reverse this trend in some ways, but only in ways that reduce the value of long term releases at the same time.  Containerization reduces the need for extensive capabilities in the base operating system, making it easier and more effective to update the base more frequently for improved kernel, filesystem, driver and container support while leaving libraries and other dependencies in the containers.  Applications that need long term support dependencies can have them met in that way, while applications that benefit from newer components can be addressed in that manner as well.

Of course virtualization has played a role in reducing the value of long term support models by making rapid recovery and duplication of systems trivial.  Stability that we’ve needed long term support releases to address is partially addressed by the virtualization layer; hardware abstraction improves driver stability in very important ways.  In the same vein, devops style support models also reduce the need for long term support and make server ecosystems more agile and flexible.  Trends in system administration paradigms are tending to favour more modern operating systems.

Time will tell if trends continue in the direction that they are headed.  For myself, this past year has been an eye opening one that has seen me move my own workloads from a decade of staunch support for very long term support products to rapid release ones and I must say, I am very happy with the change.

All IT is External

In IT we often talk about internal and external IT, but this perspective is always that of the IT department itself rather than that of the business, and I feel that this is very misleading.  Different departments within a company generally see and feel as if they are external to one another, often every bit as much as an external company feels.  For example, an IT department will often see management, operations or human resources as “foreign” departments at best and adversaries at worst.  It is common to feel, and possibly rightfully so, that different departments fail to even share common overarching goals.  IT tends to be acutely aware of this and expresses it often.

What we need to appreciate is that to the business management or owners, the IT department generally appears like an external agency regardless of whether the people working in it are staff or actually from a service provider.  There are exceptions to this, of course, but they are rare.  IT is generally held behind a barrier of sorts and is its own entity.  IT commonly displays this in how it talks to or about management.  IT often thinks of system resources or the network as “belonging to IT,” clearly not thinking in terms of IT being just part of the company.  Both sides are commonly guilty of thinking of IT as a separate entity from the company itself.

This happens, of course, for any number of reasons.  Many IT workers choose IT because they are passionate about IT specifically, not about the company or market that they are working in; their loyalty is to their IT career, not the business in question, and they would generally switch companies to advance their IT career rather than stay to advance an internal non-IT career.  IT professionals often struggle with interpersonal skills and so have a higher than average tendency to hide away, avoiding unnecessary contact with other departments.  IT tends to be busy and overworked, making socializing problematic.  IT work demands focus and availability, again making it difficult to socialize and interface with other departments.  IT is often kept isolated for security reasons, and IT is often seen as the naysayer of the organization, commonly delivering bad news or hindering projects.  IT typically has extremely high turnover rates and almost no IT staff, especially in smaller businesses, are expected to be around for the long haul.  IT is often a conduit to outside vendors and is seen as connected to or associated with them in many ways.  IT often sits behind a “blame barrier” where the rest of the organization seeks to blame IT for business decisions, creating a stronger “us and them” mentality.  IT exacerbates this with attitudes towards users and decision makers that are often distancing.  It is also extremely common for IT workers to be staffed via an agency in such a way that there are contract obligations, restrictions or payroll differences between IT and normal staff.

This creates a rather difficult situation for discussions involving the advantages of internal IT versus external IT.  Internal IT staff commonly believe that having IT internally provides many benefits to the organization due to loyalty, closeness or the ties of payroll.  But is this really the case?

To the business, internal IT is already, in most cases, external to the organization.  The fears that are often stated about external IT service providers – that they may not work in the business’ interests, may suddenly close up shop and disappear, might be overworked and not have enough available resources, may charge for work when idle, may not have the needed expertise, may see the network and resources as their own and not act in the interests of the business, may fail to document the systems or might even hold critical access hostage for some reason – are all fears that businesses have about their own IT departments exactly as they have them about external IT service providers.

In fact, external service providers often provide a business with more legal recourse than employees do.  For example, internal IT employees can quit with zero notice and only suffer from having acted “unprofessionally” in their lack of notice, or can give just two weeks notice and not even have to worry about that.  Yet replacing internal IT staff of any caliber will easily take months, and that is just to get someone hired, let alone trained, indoctrinated and brought up to useful speed.  It is not uncommon, even in the enterprise, for the job search, hiring process and internal processes for access and so forth to take up to a year from the time the decision to begin interviewing is made until someone is a useful staff member.  But an external IT service provider may be obligated to provide resources for coverage regardless of whether staff come and go.  There are far more possibilities for mitigating the staff turnover risks that employed IT staff present to a business.

Due to these factors, it is very common for a business to perceive both internal and external IT resources as roughly equal and primarily such that both are very much outsiders to the key organization.  Of course, in an ideal world, both would be treated very much as insiders and worked with as critical partners for planning, decision making, triage and so forth.  IT is critical to business thinking and the business is critical to IT thinking; neither is really functional without the other.

This context of the organizational management view of IT can be important for understanding how the business will react to IT as well as how IT should behave with management.  And it offers an opportunity for both to work on coming together, whether IT is ultimately internal or external, to behave more like a singular organization with a unified goal.