All posts by Scott Alan Miller

Started in software development with Eastman Kodak in 1989 as an intern in database development (making database platforms themselves.) Began transitioning to IT in 1994 with my first mixed role in system administration.

Making the Best of Your Inverted Pyramid of Doom

The 3-2-1 or Inverted Pyramid of Doom architecture has become an IT industry pariah for many reasons. Sadly for many companies, they only learn about the dangers associated with this design after the components have arrived and the money has left the accounts.

Some companies are lucky and catch this mistake early enough to be able to return their purchases and start over with a proper design and decision phase prior to the acquisition of new hardware and software. This, however, is an ideal and very rare situation. At best we can normally expect restocking fees and, far more commonly, the equipment cannot be returned at all or the fees are so large as to make it pointless.

What most companies face is a need to “make the best” of the situation moving forward. One of the biggest risks is that concerned parties, whether the financial stakeholders who have just spent a lot of money on the new hardware or the technical stakeholders who now look bad for having allowed the equipment to be purchased, will succumb to an emotional reaction and give in to the sunk cost fallacy. It is vital that this emotional, illogical reaction not be allowed to take hold, as it will undermine critical decision making.

It must be understood that the money spent on the inverted pyramid of doom has already been spent and is gone. Whether the money was wasted, or how much was wasted, is irrelevant to decision making at this point. Whether the system was a gift or cost a billion dollars does not matter; that money is gone and now we have to make do with what we have.

A useful “trick” here is to bring in a financial decision maker, such as the CFO, explain that there is about to be an emotional reaction to money already spent, and discuss the sunk cost fallacy before talking about the actual problem. That way everyone is aware and logical, and the person trained (we hope) to best handle this kind of situation is present and ready to head off sunk cost emotions. Careful handling of a potentially emotionally-fueled reaction is important. This is not the time to attempt to cover up either the financial or the technical missteps, which is exactly what the emotional reaction pushes people to do. It is necessary for all parties to communicate and remain detached and logical in order to address the needs. Some companies handle this well; many do not and get caught trying to forge forward with bad decisions that were already made, probably in the hope that nothing bad happens and that no one remembers or notices. Fight that reaction. Everyone has it; it is the natural amygdala “fight or flight” emotional response.

Now that we are ready to fight the emotional reactions to the problem we can begin to address “where do we go from here.” The good news is that where we are is generally a position of having “too much” rather than “too little.” So we have an opportunity to be a little creative. Thankfully there are generally good options that can allow us to move in several directions.

One thing that is very important to note is that we are looking exclusively at solutions that are more reliable, not less reliable, than the intended inverted pyramid of doom architecture that we are replacing. An IPOD is a very fragile and dangerous design and we could go to great lengths demonstrating concepts like risk analysis, single points of failure, the fallacies of false redundancy, mistaking redundancy for reliability, dependency chains and so on, but what is absolutely critical for all parties to understand is that a single server running on local storage is more reliable than the entire IPOD infrastructure would be. This is so important that it has to be said again: if a single server is “standard availability”, the IPOD is lower than that. More risky. If anyone at this stage fears a “lack of redundancy” or a “lack of complexity” in the resulting solutions we have to come back to this: nothing that we will discuss is as risky as what had already been designed and purchased. If there is any fear of risk going forward, the fear should have been greater before we improved the reliability of the design. This cannot be overstated. IPODs sell because they easily confuse those not trained in risk analysis and look reliable when, in fact, they are anything but.
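The dependency chain point can be made concrete. Components that all must be up for a workload to run multiply their availabilities together, so every layer added in series lowers the total. A minimal sketch, using availability figures that are invented purely for illustration:

```python
def chain_availability(*components):
    """Availability of a serial dependency chain: every component in the
    chain must be up for the whole to be up, so availabilities multiply."""
    total = 1.0
    for availability in components:
        total *= availability
    return total

# Hypothetical figures, for illustration only:
single_server = 0.999                     # one server on local storage
ipod = chain_availability(0.999, 0.999)   # e.g. switching layer * shared SAN

print(single_server)  # 0.999
print(ipod)           # 0.998001 -- the "redundant" design is LESS available
```

Adding more serial components only drives the product lower, which is why the single server, with the shortest dependency chain, comes out ahead of the IPOD no matter how redundant the compute layer looks.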

Understanding the above, and using a technique called “reading back,” the accepted IPOD architecture tells us that the company in question was accepting of not having high availability (or even standard availability) at the time of purchasing the IPOD. Perhaps they believed that they were getting high availability, but the architecture could not provide it, and so moving forward we have the option of “making do” with nothing more than a single server running on its own local storage. This is simple and easy and improves on nearly every aspect of the intended IPOD design. It costs less to run and maintain, is often faster and is much less complex, while being slightly more reliable.

Simply dropping down to a single server and hoping to find uses for the rest of the purchased equipment “elsewhere” is likely not our best option, however. In situations where the IPOD had been meant for only a single workload or set of workloads, and other areas of the business have a need for equipment as well, it can be very beneficial to take the “single server” approach for the intended IPOD workload and utilize the remaining equipment elsewhere in the business.

The most common approach to repurposing an IPOD stack is to reconfigure the two (or more) compute nodes as full stack nodes containing their own storage. Depending on what storage has already been purchased, this step may require no purchases at all, a movement of drives between systems or, often, the relatively small purchase of additional hard drives.

These nodes can then be configured into one of two high availability models. In the past a common design choice, for cost reasons, was an asynchronous replication model (often known as the Veeam approach) that replicates virtual machines between the nodes and allows VMs to be powered up very rapidly, keeping the downtime from the moment of compute node failure until recovery to as little as a few minutes.

Today fully synchronous fault tolerance is available for free so commonly that it has effectively replaced the asynchronous model in nearly all cases. In this model storage is replicated in real time between the compute nodes, allowing failover to happen instantly rather than with a few minutes’ delay, and with zero data loss instead of a small data loss window (i.e. an RPO of zero.)
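The difference between the two models can be sketched numerically. Below is a toy Python model, not any vendor's actual behavior: writes arrive once per second, the asynchronous model ships changes every five minutes, and the synchronous model mirrors every write before acknowledging it:

```python
def async_writes_lost(seconds_before_failure, cycle_seconds):
    """Writes lost under asynchronous replication: everything written
    since the last completed replication cycle is missing from the replica."""
    last_completed_cycle = (seconds_before_failure // cycle_seconds) * cycle_seconds
    return seconds_before_failure - last_completed_cycle

def sync_writes_lost(seconds_before_failure):
    """Synchronous replication commits each write on both nodes before
    acknowledging it, so no acknowledged write can ever be lost."""
    return 0

# Primary node fails 310 seconds in, with one write per second:
print(async_writes_lost(310, 300))  # 10 writes lost (the RPO window)
print(sync_writes_lost(310))        # 0 writes lost (an RPO of zero)
```

The asynchronous loss window is bounded by the replication interval; the synchronous model removes the window entirely, which is exactly the RPO-of-zero claim.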

At this point it is common for people to react to replication with a fear of the storage capacity that it consumes, and that fear is well founded: replication does cost capacity. What must be understood is that it is this replication, missing from the original IPOD design, that provides the firm foundation for high reliability. If this replication is skipped, high availability is an unobtainable dream and individual compute nodes using local storage in a “stand alone” mode are the most reliable available option. High availability solutions rely on replication and redundancy to build the reliability necessary to qualify as highly available.
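The capacity cost can be stated up front rather than feared. With two full-stack nodes mirroring each other, usable capacity is one node's post-RAID capacity, since the second node holds a duplicate copy. A quick sketch, with drive counts and sizes invented for illustration:

```python
def usable_tb(drives_per_node, drive_tb, local_raid_factor):
    """Usable capacity of a two-node synchronous mirror.

    local_raid_factor is the fraction of raw capacity left after local
    RAID (0.5 for RAID 10, higher for parity RAID). The node-to-node
    mirror then means total usable capacity equals one node's usable
    capacity, because the other node is a full duplicate.
    """
    return drives_per_node * drive_tb * local_raid_factor

# Two nodes of eight 2 TB drives each (32 TB raw total), RAID 10 locally:
print(usable_tb(8, 2, 0.5))  # 8.0 TB usable
```

A quarter of raw capacity surviving sounds painful, but that duplication is precisely what the IPOD design lacked and what makes genuine high availability possible.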

That answers the question of what to do with our compute nodes, but leaves the question of what to do with our external shared storage device, the single point of failure or the “point” of the inverted pyramid design. To answer it we should start by looking at what this storage might be.

There are three common types of storage devices that would be used in an inverted pyramid design: DAS, SAN and NAS. We can lump DAS and SAN together as they are two aspects of block storage and can be used essentially interchangeably in our discussion; they are differentiated only by the existence of switching, which can be added or removed as needed in our designs. NAS differs by being file storage rather than block storage.

In both cases, block (DAS or SAN) or file (NAS), one of the most common uses for the now superfluous device is as a backup target for our new virtualization infrastructure. In many cases the device may be overkill for this task, generally with more performance and many more features than a simple backup target needs, but good backup storage is important for any critical business infrastructure and erring on the side of overkill is not necessarily a bad thing. Businesses often attempt to skimp on their backup infrastructures and this is an opportunity to invest heavily in one without spending any extra money.

In the same vein as backup storage, the external storage device could be repurposed as archival storage or another “lower tier” of storage where high availability is not warranted. This is a less common approach, generally because every business needs a good backup system but only some have a way to leverage an archival storage tier.

Beyond these two common and universal storage models, a frequent use case for external storage devices, especially NAS devices, is to leverage them in their native role as file servers separate from the virtualization infrastructure. For many businesses file serving is not as uptime critical as the core virtualization infrastructure, and its backups are far easier to maintain and manage. Offloading file serving to an already purchased NAS device reduces the demands on the virtualization infrastructure in two ways: it reduces the number of VMs that need to run there, and it moves what is typically one of the largest consumers of storage to a separate device, lowering both the performance and the capacity requirements of the virtualization infrastructure. This can also reduce the cost of the additional hard drives needed for local storage on the compute nodes, as mentioned earlier, making this a very popular repurposing method for many companies.

Every company is unique and there are potentially many places where spare storage equipment could be used effectively, from labs to archives to tiered storage. A little creativity and out-of-the-box thinking can match your unique set of available equipment to your business’ unique set of needs and find the best place to use this equipment: decoupled from the core, critical virtualization infrastructure but still bringing value to the organization. By avoiding the inverted pyramid of doom we can obtain the maximum value from the equipment that we have already invested in rather than implementing fresh technical debt that we then have to work, unnecessarily, to overcome.

Why We Avoid Contract to Hire

Information Technology workers are bombarded with “Contract to Hire” positions, often daily. There are reasons why this method of hiring and working is fundamentally flawed, and while workers immediately identify these positions as bad choices, few really take the time to move beyond the emotional reaction and understand why this working method is so broken. More importantly, few companies take the time to explore why tactics such as this undermine their staffing goals.

To begin we must understand that there are two basic types of technology workers: consultants (also called contractors) and permanent employees (commonly known as FTEs.) Nearly all IT workers desire to be in one of these two categories. Neither is better or worse; they are simply two different approaches to employment engagements and represent differences in personality, career goals, life situations and so forth. Workers do not always get to work the way that they desire, but essentially all IT workers seek to be in one camp or the other.

Understanding the desires and motivations of IT workers seeking to be full time employees is generally very easy to do. Employees, in theory, have good salaries, stable work situations, comfort, continuity, benefits, vacations, protection and so forth. At least this is how it seems; whether these aspects are real or merely illusory can be debated elsewhere. What is important is that most people understand why people want to be employees, but the opposite is rarely true. Many people lack empathy for those seeking not to be employees.

Understanding professional or intentional consultants can be difficult.  Consultants live a less settled life but generally earn higher salaries and advance in their careers faster, see more diverse environments, get a better chance to learn and grow, are pushed harder and have more flexibility.  There are many factors which can make consulting or contracting intentionally a sensible decision.  Intentional contracting is very often favored by younger professionals looking to grow quickly and gain experience that they otherwise could not obtain.

What makes this matter more confusing is that the majority of workers in IT wish to work as full time employees, but a great many end up settling for contract positions to hold them over until a desired full time position can be acquired. The commonality of this has created a situation wherein a great many people, both inside and outside of the industry and on both sides of the interview table, may mistakenly believe that all cases are this way and that consulting is a lower form of employment. This is completely wrong. In many cases consulting is highly desired and contractors can benefit greatly from their choice of engagement methodology. I, myself, spent most of my early career, around fifteen years, seeking only to work as a contractor and had little desire to land a permanent post. I wanted rapid advancement, opportunities to learn, chances to travel and variety.

It is not uncommon at all for the desired mode of employment to change over time. It is most common for contractors to seek to move to full employment at some point in their careers; contracting is often exhausting and harder to sustain over a long career. But certainly full time employees sometimes choose to move into a more mobile and adventurous contracting mode as well. And many choose to work only one style or the other for the entirety of their careers.

Understanding these two models is key. What does not fit into either model is the concept of a Contract to Hire. This hiring methodology starts by hiring someone willing to work a contract position and then, sometimes after a set period of time and sometimes after an indefinite one, promises a second determination as to whether said team member should be “converted” into an employee or let go. This does not match up well against the two types of workers; neither type wants to start as one thing and then become another. Possibly somewhere there is an IT worker who would like to work as a contractor for four months and then become an employee, getting benefits only after a four month delay, but I am not aware of such a person, and it is reasonable to assume that if such a person exists he is unique, has already been through the process and would not want to repeat it.

This leaves us with two resulting models to match to this situation. The first, and more common, is the IT worker seeking permanent employment who is offered a Contract to Hire position. For this worker the situation is far from ideal: the first four months represent a jarring, complex and scary situation that lacks the benefits and stability that are needed, and the second decision point, whether a conversion will be offered at all, is scarier still. The worker must behave and plan as if there will be no conversion and must actively seek other opportunities during the contract period, opportunities that are pure employment from the beginning. If there were any certainty of the position becoming a full employment one, there would be no contract period at all. The risk to the worker that no conversion will be offered is exceptionally high; in fact, conversions are almost unheard of in the industry.

It must be noted that, for most IT professionals, the idea that a Contract to Hire will truly offer a conversion at the end of the contract duration is so unlikely that the enticement of the conversion is generally assumed to be purely fake, with no possibility of it happening at all. And for reasons we will discover here it is obvious why companies would not honestly expect to attempt this process. The term Contract to Hire spells almost certain unemployment for IT workers going down that path. The “to Hire” portion is almost universally nothing more than a marketing ploy, and a very dishonest one.

The other model that we must consider is that of the contract-desiring worker accepting a Contract to Hire position. In this model we have the better outcome for both parties. The worker is happy with the contract arrangement and the company is able to employ someone who is happy to be there and is not seeking something that they likely will be unable to get. In cases where the company was less than forthcoming about the fact that the “to Hire” conversion would never be considered this might actually work out well, but it is far less likely to do so long term, and in repeat engagements, than if both parties were up front and honest about their intentions from the start. Even for professional contractors, seeing the “to Hire” addendum is a red flag that something is amiss.

The result for a company obtaining an intentional contractor via a Contract to Hire posting, however, is risky. For one, contractors are highly mobile and are skilled and practiced at finding other positions. They are generally well prepared to leave a position the moment that the original contract is done.

One reason that the term Contract to Hire is used is that companies can easily “string along” someone desiring a conversion to a full time position by dangling the conversion like a carrot and prolonging contract situations indefinitely. Intentional contractors will see no carrot in this situation and will normally be prepared to leave immediately upon completion of their contract time; they can leave without any notice, as they simply need not renew their contract, leaving the company in a lurch of its own making.

Even in scenarios where an intentional contractor is offered a conversion at the end of a contract period there is the very real possibility that they will simply turn down the conversion.  Just as the company maintains the right to not offer the conversion, the IT worker maintains an equal right to not agree to offered terms.  The conversion process is completely optional by both parties.  This, too, can leave the company in a tight position if they were banking on the assumption that all IT workers were highly desirous of permanent employment positions.

This may be the better situation, however. Potentially even worse is an intentional contractor accepting a permanent employment position when they did not actually desire an arrangement of that type. They are likely to find the position to be something that they do not enjoy, or else they would have been seeking such an arrangement already, and will be easily tempted to leave for greener pastures very soon, once again defeating the company’s purpose in hiring an employee.

The idea behind the Contract to Hire movement is the mistaken belief that companies hold all of the cards and that IT workers are all desperate for work and thankful for any job that they can find. This, combined with the incorrect assumption that nearly all IT workers truly want stable, traditional employment as full time employees, makes for a very bad hiring situation.

Based on this, a great many companies attempt to leverage the Contract to Hire term to lure more and better IT workers to apply, based on false promises or a poor matching of employment values. It is seen as a means of lowering costs, testing out potential employees, hedging bets against future headcount needs and so on.

In a market where there is a massive oversupply of IT workers such a tactic might actually pay off. In the real world, however, IT workers are in very short supply and everyone is aware of the game that companies play and what this term truly means.

It might be assumed that IT workers would still consider taking Contract to Hire positions because they are willing to take on some risk and hope to convince the employer that conversion, in their case, would be worthwhile. Certainly some companies do run this process, and for some people it has worked out well. It should be noted, however, that any contract position offers the potential of a conversion offer, and in positions where the term “Contract to Hire” is not used, conversions, or at least offers of conversion, are actually quite common. It is specifically when a potential future conversion is dangled like a carrot that conversions become exceptionally rare. There is no need for an honest company and a quality workplace to mention “to Hire” when bringing on contractors.

What happens, however, is more complex and requires study. In general the best workers in any field are those that are already employed. It goes without saying that the better you are, the more likely you are to be employed. This does not mean that great people never change jobs or find themselves unemployed, but the better you are, the less time you will spend, on average, seeking employment from a position of unemployment, and the worse you are, the more likely you are to be unemployed involuntarily. That may seem obvious, but when you combine it with the other information that we have, something is amiss. A Contract to Hire position can never, effectively, entice currently working people in any way. A great offer of true, full time employment with better pay and benefits might entice someone to give up an existing position for a better one; that happens every day. But good people generally have good jobs and are not going to give up the positions, safety and stability that they have in order to join an unknown situation that offers only a short term contract with an almost certainly empty conversion carrot. It just is not going to happen.

Likewise, when good IT workers are unemployed they are not very likely to be in a position of desperation, and even then they are very unlikely to even look at a position listed as Contract to Hire (or contract at all), as most people want full time employment and good IT people will generally be far too busy turning down offers to waste time on Contract to Hire positions. Good IT workers are flooded with employment opportunities and being able to quickly filter out those that are not serious is a necessity. The words “Contract to Hire” are one of the best low hanging fruits of this filtering process. You do not need to see what company it is, what region it is in, what the position is or what experience they expect. The position is not what you are looking for; move along, nothing to see here.

The idea that employers seem to have is the belief that everyone, employed and unemployed IT workers alike, is desperate and thankful for any possible job opening. This is completely flawed. Most of the industry is doing very well and there is no way to fill all of the existing job openings that we have today; IT workers are in demand. Certainly there is always a certain segment of the IT worker population that is desperate for work for one reason or another: personal situations, geographic ties, an overstaffed technology specialization or, most commonly, not being very competitive.

What Contract to Hire positions do is filter out the best people. They effectively filter out every currently employed IT worker completely. Those in in-demand skill groups (like Linux, storage, cloud and virtualization) will be sorted out too; they are too able to find work anywhere to consider poor offerings. Highly skilled individuals, even when out of work, will self filter as they are looking for something good, not just anything that comes along.

At the end of the day, the only people seriously considering Contract to Hire positions in any number, often even to the point of being the only ones willing to respond to postings, are the truly desperate: either those with so little experience that they do not realize how foolish the concept is or, far more commonly, those that are long out of work, have few prospects and feel that the incredible risks and low quality of work associated with Contract to Hire are acceptable.

This hiring problem begins a vicious loop of low quality, if one did not already exist; most likely, quality issues already existed before the company considered a Contract to Hire tactic. Once good people begin to avoid a company, and this will happen even if only some positions are Contract to Hire because the quality of the hiring process is exposed, the quality of those who can be hired begins to decline. The worse it gets, the harder it is to turn the ship around. Good people attract good people. Good IT workers want to work with great IT workers who will mentor them, train them and provide places where they can advance by doing a good job. Good people do not seek to work in a shop staffed by the desperate, both because working only with desperate people is depressing and the quality of work is very poor, and because once a shop gains a poor reputation it is very hard to shake and good people will be wary of having their own reputations tarnished by having worked in such a place.

Contract to Hire tactics signal desperation and a willingness to admit defeat on the part of an employer. Once a company sinks to this level with their hiring they are no longer focusing on building great teams, acquiring amazing talent or providing a wonderful work environment. Contract to Hire is not always something that every IT professional can avoid all of the time; all of us have times when we have to accept something less than ideal. But it is important for all parties involved to understand their options and just what it means when a company moves into this mode. Contract to Hire is not a tactic for vetting potential hires; it simply does not work that way. Contract to Hire causes companies to be vetted and filtered out of consideration by the bulk of potential candidates, without those metrics ever being made available to hiring firms. Potential candidates simply ignore them and write them off, sometimes noting who is hiring this way and avoiding them even when other options come along in the future.

As a company, if you desire to have a great IT department and hire good people, do not allow Contract to Hire to ever be associated with your firm. Hire full time employees and hire intentional contractors, but do not play games with dangling false carrots in the hope that contractors will change their personalities or that full time employees will take huge personal risks for no reason; that is simply not how the real world works.

Ferraris and Tractor Trailers

Working in the SMB world, it is actually pretty rare that we need to talk about latency.  The SMB world is almost universally focused on system throughput and generally unaware of latency as a need.  But there are times where latency becomes important and when it does it is critical that we understand the interplay of throughput and latency and just what “speed” means to us.  Once we start moving into the enterprise space, latency is more often going to be viewed as a concern, but even there throughput nearly always reigns supreme, to the point that concepts of speed almost universally revolve around throughput and concepts of latency are often ignored or forgotten.

Understanding the role of latency in a system can be complicated, even though latency itself is relatively simple to understand.

A great comparison between latency and throughput that I like to use is the idea of a Ferrari and a tractor trailer.  Ferraris are “fast” in the traditional sense, they have a high “miles per hour.”  One might say that they are designed for speed.  But are they?

We generally consider tractor trailers to be slow.  They are big and lumbering beasts that have a low top end speed.  But they haul a lot of stuff at once.

In computer terms we normally think of speed like hauling capacity: we think in terms of “items” per second. In those terms a Ferrari going two hundred miles per hour is great, but it can haul maybe one box at a time. A tractor trailer can only go one hundred miles per hour but can haul closer to one thousand boxes at a time. When we talk about throughput or speed on a computer this is more what we think about. In network terms we think of gigabits per second and are rarely concerned with the speed of an individual packet, as a single packet is rarely important. In computational terms we think about ideas like floating point operations per second, a similar concept. No one really cares how long a single FLOP (floating point operation) takes, only how many we can get done in one or ten seconds.

So when looking at a Ferrari we could say that it has a useful speed of two hundred box-miles per hour.  That is for every hour of operations, a Ferrari can move one box up to two hundred miles.  A tractor trailer has a useful speed of one hundred thousand box-miles per hour.  In terms of moving packages around, the throughput of the tractor trailer is easily five hundred times “faster” than that of the Ferrari.

So in terms of how we normally think of computers and networks a tractor trailer would be “fast” and a Ferrari would be “slow.”

But there is also latency to consider.  Assuming that our payload is tiny, say a letter or a small box, a Ferrari can move that one box over a thousand miles in just five hours!  A tractor trailer would take ten hours to make this same journey (but could have a LOT of letters all arriving at once.)  If what we need is to get a message or a small parcel from one place to another very quickly the Ferrari is the better choice because it has half the latency (delay) from the time we initiate the delivery until the first package is delivered than the tractor trailer does.
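The numbers in the analogy are easy to verify with a few lines of code (the speeds and box counts are, of course, the made-up figures from above):

```python
def throughput_box_miles(speed_mph, boxes_carried):
    """Throughput: box-miles of hauling accomplished per hour of driving."""
    return speed_mph * boxes_carried

def latency_hours(distance_miles, speed_mph):
    """Latency: how long until the first delivery arrives."""
    return distance_miles / speed_mph

print(throughput_box_miles(200, 1))     # Ferrari: 200 box-miles per hour
print(throughput_box_miles(100, 1000))  # Truck: 100,000 box-miles per hour
print(latency_hours(1000, 200))         # Ferrari: 5.0 hours to first box
print(latency_hours(1000, 100))         # Truck: 10.0 hours to first box
```

The truck wins on throughput by a factor of five hundred while the Ferrari wins on latency by a factor of two, which is the whole trade-off in miniature.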

As you can imagine, in most cases tractor trailers are vastly more practical because their delivery throughput is so much higher. And, this being the case, we see large trucks on the highways all of the time while the occurrence rate of Ferraris is very low, even though each costs about the same amount to purchase (very roughly.) But in special cases, the Ferrari makes more sense. Just not very often.

This is a general case concept and can apply to numerous applications.  It applies to caching systems, memory, CPU, networking, operating system kernels and schedulers, to cars and more.  Latency and throughput are generally inversely related – we give up latency in order to obtain throughput.  For most operations this makes the best sense.  But sometimes it makes more sense to tune for latency.

Storage is actually an odd duck in computing where nearly all focus on storage performance is around IOPS, which is roughly a proxy measurement for latency, instead of throughput which is measured in “data transferred per second.”  Rarely do we care about this second number as it is almost never the source of storage bottlenecks.  But this is the exception, not the rule.
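The IOPS-as-latency-proxy relationship is simple at its core: with a single operation in flight at a time, IOPS is just the reciprocal of per-operation latency. A sketch, where the device latencies are typical round figures rather than measurements:

```python
def iops_at_queue_depth_one(latency_ms):
    """With one operation in flight, each must complete before the next
    begins, so IOPS is simply 1 / per-operation latency."""
    return 1000.0 / latency_ms

print(iops_at_queue_depth_one(5.0))  # 200.0   -- roughly a 5 ms spinning disk
print(iops_at_queue_depth_one(0.1))  # 10000.0 -- roughly a 0.1 ms flash device
```

This is why a storage spec sheet quoting IOPS is, indirectly, quoting latency, and why the storage world stands apart from the throughput-obsessed norm.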

Latency and throughput can have some surprising interactions in the computing world.  When we talk about networks, for example, we typically measure only throughput (Gb/s) and rarely care much about the latency (normally measured in milliseconds.)  Typically this is because nearly all networking systems have similar latency numbers and most applications are pretty much unconcerned with latency delays.  It is only the rare application, like VoIP over international or satellite links, where latency affects the average person.  Latency can also catch people off guard when they attempt something uncommon, like iSCSI over a long distance WAN connection, and it suddenly pops up as an unforeseen problem.
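A back-of-the-envelope sketch shows why latency "pops up" on long distance iSCSI: if each small write must be acknowledged before the next can be issued, the round trip time caps the operation rate no matter how fat the pipe is.  The RTT figures below are illustrative:

```python
def max_sync_iops(round_trip_ms):
    # A strictly synchronous workload completes one acknowledged
    # operation per network round trip, regardless of bandwidth.
    return 1000 / round_trip_ms

assert max_sync_iops(0.5) == 2000   # LAN-class RTT: perfectly healthy
assert max_sync_iops(80) == 12.5    # intercontinental WAN RTT: crippling
```

The WAN link may have more raw bandwidth than the LAN, yet the synchronous workload collapses, which is exactly the surprise described above.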

One of the places where the interaction of latency and throughput starts to become shocking and interesting is when we move from electrical or optical data networks to physical ones.  A famous quote in the industry, generally attributed to Andrew Tanenbaum, is:

Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.

This is a great demonstration of huge bandwidth with very high latency.  Driving fifty miles across town, a single station wagon or SUV could haul hundreds of petabytes of data, hitting data rates that 10Gb/s fiber could not come close to.  But the time for the first data packet to arrive is about an hour.  We often discount this kind of network because we assume that latency must be bounded at under about 500ms.  But that is not always the case.
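We can put rough numbers on the quote.  The figures below are assumptions for illustration only: a hypothetical load of a thousand roughly 18TB (LTO-9 class) tape cartridges and a one hour drive across town:

```python
TAPES = 1000          # assumption: a modest car load of cartridges
TB_PER_TAPE = 18      # assumption: roughly LTO-9 native capacity
TRIP_HOURS = 1        # assumption: fifty miles across town

payload_bits = TAPES * TB_PER_TAPE * 1e12 * 8        # total bits carried
effective_gbps = payload_bits / (TRIP_HOURS * 3600) / 1e9

fiber_gbps = 10
# The car "link" dwarfs the fiber on throughput...
assert effective_gbps > 1000 * fiber_gbps
# ...but its latency is a full hour rather than microseconds.
```

Even this conservative load works out to tens of terabits per second of effective throughput, thousands of times the fiber, purchased at the cost of an hour of latency.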

Australia recently made the news with a test to see if a pigeon carrying an SD card could, in terms of network throughput, outperform the region’s ISP – and the pigeon ended up being faster!

In terms of computing performance we often ignore latency to the point of not even being aware of it as a context in which to discuss performance.  But in low latency computing circles it is considered very carefully.  System throughput is generally sacrificed heavily: it becomes common to target only ten percent CPU utilization where more traditional systems target closer to ninety percent.  Concepts like real time kernels, CPU affinity, processor pinning and cache hit ratio tuning are all used to focus on obtaining the most immediate response possible from a system rather than attempting to get the most total processing out of it.
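As a small taste of one of these techniques, Linux exposes processor pinning directly to user space.  This sketch (Linux-only, guarded for portability, and only a minimal illustration rather than a production tuning recipe) pins the current process to a single core so latency-sensitive work never migrates between CPUs:

```python
import os

def pin_to_core(core=0):
    """Pin the calling process to one CPU core (Linux only)."""
    if hasattr(os, "sched_setaffinity"):     # not available on macOS/Windows
        os.sched_setaffinity(0, {core})      # pid 0 means the current process
        return os.sched_getaffinity(0)       # confirm the new affinity set
    return None                              # portability guard: no-op elsewhere

affinity = pin_to_core(0)
if affinity is not None:
    assert affinity == {0}
```

Real low latency deployments combine pinning with kernel-level isolation of the chosen cores, which is what keeps utilization targets so low: reserved cores sit mostly idle, waiting to respond instantly.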

Common places where low latency is desired from a computational perspective are critical controller systems (such as manufacturing controllers where even a millisecond of latency can cause problems on the factory floor) and financial trading systems where a few milliseconds of delay can mean that investments have changed in price or that products have already been sold and are no longer available.  Speed, in terms of latency, is often the deciding factor between making and losing money – even a single millisecond can be crippling.

Technically even audio and video processing systems have to be latency sensitive, but most modern computing systems have so much spare processing overhead, and latency is generally low enough, that most systems, even VoIP PBXs and conferencing systems, can function today while only very rarely needing to be aware of latency concerns on the processing side (even networking latency is becoming less common as a concern.)  The average system administrator or engineer might easily go through an entire career without ever working on a system that is latency sensitive, or on one without so much available overhead as to hide any latency sensitivity.

Defining speed, whether that means throughput, latency or some combination of the two, is important in all aspects of IT and in life.  The two generally exist in an inverse relationship: improvements in throughput come at a cost to latency, and vice versa.  Understanding how they affect us in different situations, and learning to balance them as needed to improve the systems that we work on, is very valuable.

Types of IT Service Providers

A big challenge, both to IT Service Providers and to their customers, is in attempting to define exactly what an IT vendor is and how their customers should expect to interact with them.  Many people see IT Service Providers (we will call them ITSPs for short here) as a single type of animal but, in reality, ITSPs come in all shapes and sizes and need to be understood in order to leverage a relationship with them well.  Even if we lack precise or universally accepted terms, the concepts are universal.

Even within the ITSP industry there is little to no standardization of naming conventions, even though there are relatively tried and true company structures which are nearly always followed.  The really important aspect of this discussion is not to carefully define the names of service providers but to explain the competing approaches so that, when engaging a service provider, a meaningful discussion around these models can be had and an appropriate relationship can be reached.

It is also important to note that any given service provider may use a hybrid or combination of models.  Using one model does not preclude the use of another as well.  In fact it is most common for a few approaches to be combined, as multiple approaches make it easier to capture revenue, which is quite critical because IT service provisioning is a relatively low margin business.

Resellers and VARs:  The first, largest and most important category to identify is that of the reseller.  Resellers are the easiest to identify as they, as their name indicates, resell things.  Resellers range from pure resellers, companies that do nothing but purchase from vendors on one side and sell to customers on the other (vendors like NewEgg and Amazon fit into this category while not focusing exclusively on IT products), to the more popular Value Added Resellers who not only resell products but also maintain some degree of skill or knowledge around them.

Value Added Resellers are a key component of the overall IT vendor ecosystem as they supply more than just a product purchasing supply chain but maintain key skills around those products.  Commonly VARs will have skills around product integration, supply chain logistics, supported configurations, common support issues, licensing and other factors.  It is common for customers and even other types of ITSPs to lean on a VAR in order to get details about product specifics or insider information.

Resellers of any type, quite obviously, earn their money through markup and margins on the goods that they resell.  This creates an interesting relationship between customer and vendor, as the vendor is always in a position of needing to make a sale in order to produce revenue.  Resellers are often turned to for advice, but it must be understood that the relationship is one of sales and the reseller only gets compensated when a sale takes place.  This makes the use of a reseller somewhat complicated, as the advice or expertise sought may conflict with the interests of the reseller.  The relationship with a reseller requires careful management to ensure that guidance and direction coming from the reseller is aligned with the customer’s needs, is isolated to areas in which the reseller is an expert, and works in ways that are mutually beneficial to both parties.

Managed Service Providers or MSPs: The MSP is probably the most well known title in this field.  In recent years the term MSP has come to be used so broadly that it often simply denotes any IT service provider, whether or not they provide something that would appropriately be deemed a “managed service.”  To understand what an MSP is truly meant to be we have to understand what a “managed service” is meant to be in the context of IT.

The idea of managed services is generally understood to be related to the concept of “packaging” a service.  That is, producing a carefully designed and designated service, or set of services, that can be sold at a fixed or relatively predictable price.  MSPs typically have very well defined service offerings and can often provide very predictable pricing, taking the time up front to develop those offerings so that customers can plan and budget easily.

This heavy service definition process generally means that an MSP engagement is normally defined very tightly around specific products or processes and nearly always requires customers to conform to the MSP’s standards.  In exchange, MSPs can provide very low cost and predictable pricing in many cases.  Some of the most famous MSP approaches include the concepts of “price per desktop”, “price per user” or “price per server” packages where a customer might pay one hundred dollars per desktop per month and work from a fixed price for whatever they need.  The MSP, in turn, may define what desktops will be used, what operating system is run and what software may be installed on top of it.  MSPs almost universally have a software package, or a set of standard software packages, that they use to manage their customers.  MSPs generally rely on scaling across many customers with shared processes and procedures in order to create a cost effective structure.

MSPs typically focus on internal efficiencies to maximize profits.  The idea is that a set price service offering can be made more and more efficient by adding nearly identical customers and improving processes and tooling in order to reduce the cost of delivering the service.  This can be a great model with a high degree of alignment between the needs of the vendor and the customer: both benefit from an improvement in service delivery, and the MSP is encouraged to make the investments that improve operational efficiency in order to improve profits.  The customer benefits from set pricing and improved services while the vendor benefits from improved margins.  The caveat is the risk that the MSP will seek to skirt responsibilities or lean towards slow response and corner cutting, since the prices are fixed and only the services are flexible.

IT Outsourcers & Consultants: IT Outsourcing may seem like the most obvious form of ITSP but it is actually a rather uncommon approach.  I lump together the ideas of IT Outsourcing and consulting because, in general, they are the same thing simply handled at two different scales.  The behaviours are essentially the same between them.  In contrast with MSPs, we could also think of this group as Unmanaged Service Providers.  IT Outsourcers do not develop heavily defined service packages but instead rely on flexibility and a behaviour much more akin to that of an internal IT department.  IT Outsourcers literally act as an external IT department, or a portion thereof.  An IT Outsourcer will typically have a technological specialty or a range of specialties, but many are also very generalized and will handle nearly any technological need.

This category can act in a number of different ways when interacting with a business.  When brought in for a small project or a single technological issue they are normally thought of as a consultancy – providing expertise and advice around a single issue or set of issues.  Outsourcing can also mean using the provider as a replacement for the entire IT department, allowing a company to exist without any IT staff of their own.  And there is a lot of middle ground where the IT Outsourcer might be brought in only to handle specific roles within the larger IT organization, such as only running and manning the help desk, only doing network engineering or providing continuous management and oversight but not doing hands on technical work.  IT Outsourcers are very hard to define because they are so flexible and can exist in so many different ways.  Each IT Outsourcer is unique as is, in most cases, every client engagement.

IT Outsourcing is far more common, and almost ubiquitous, within the large business and enterprise spaces.  It is a very rare enterprise that does not turn to outsourcing for at least some role within the organization.  Small businesses use IT Outsourcers heavily but are more likely to use the more well defined MSP model than their larger counterparts.  The MSP market is focused primarily on the small and medium business space.

It is imperative, of course, that the concept of outsourcing not be conflated with off-shoring, which is the practice of sending IT jobs overseas.  These two things are completely unrelated.  Outsourcing often means sending work to a company down the street, or at least in the same country or region.  Off-shoring means going to a distant country, presumably across an ocean.  It is off-shoring that has the bad reputation, but sadly people often use the term outsourcing to incorrectly refer to it, which leads to much confusion.  Many companies use internal staff in foreign markets to off-shore while being able to say that no jobs are outsourced.  The misuse of this term has made it easy for companies to hide off-shoring of labor and has given the local use of outsourced experts a bad reputation without cause.

It is common for IT Outsourcing relationships to be based around a cost per hour or per “man day” or on something akin to a time and materials relationship.  These arrangements come in all shapes and sizes, to be sure, but generally the alignment of an IT Outsourcer to a business is the most like the relationship that a business has with its own internal IT department.  Unlike MSPs who generally have a contractual leaning towards pushing for efficiency and cutting corners to add to profits, Outsourcers have a contractual leaning towards doing more work and having more billable hours.  Understanding how each organization makes its money and where it is likely to “pad” or where cost is likely to creep is critical in managing the relationships.

Professional Services: Professional Services firms overlap heavily with the more focused consulting role within IT Outsourcing, which makes both of these roles rather hard to define.  Professional Services firms tend to be much more focused, however, on very specific markets, whether horizontal, vertical or both.  They generally do not offer the full IT department or fully flexible arrangements that the IT Outsourcer does, but neither are they packaged services like the MSP model.  Typically a Professional Services firm might be centered around a small group of products that compete for a specific internal function and invest heavily in the expertise around those functions.  Professional Services tend to be brought in more on a project basis than Outsourcers who, in turn, are more likely to be project based than MSPs.

Professional Services firms tend to bill based on project scope.  This means that the relationship with a PS firm requires careful scope management.  Many IT Outsourcers will do project based work as well, and when billing in this way the same concerns apply equally to them; likewise, some PS firms bill by the hour, in which case the IT Outsourcing relationship model applies.  In a project it is important that everyone be acutely aware of the scope and how it is defined.  A large amount of overhead must go into the scoping by both sides, as it is the scope document that will determine cost for the customer and profit for the firm.  PS firms are by necessity experts at ensuring that scopes are well defined and profitable to them.  It is very easy for a naive IT department to improperly scope a project and be left with one that they feel is incomplete.  If scope management is, and you will excuse the pun, out of scope for your organization, then it is wise to pursue Professional Services arrangements on more flexible terms such as hourly or time and materials.

All of these types of firms have an important role to play in the IT ecosystem.  Rarely can an internal IT department have all of the skills necessary to handle every situation on its own; it requires the careful selection and management of outside firms to round out the needs of the business in the best ways possible.  At a minimum, internal IT must work with vendors and resellers to acquire the gear that it needs for IT to exist.  Rarely does it stop there.  Whether an IT department needs advice on a project, extra hands when things get busy, oversight on something that has not been done before, support during holidays or off hours or just peers off of whom ideas can be bounced, IT departments of all sizes and types turn to IT Service Providers to fill in gaps both big and small.

Any IT role or function can be moved from internal to external staff.  The only role that ultimately can never be moved to an external team is the top level of vendor management.  At some point, someone internal to the business must oversee the relationship with at least one vendor (or else there must be a full internal IT staff fulfilling all roles.)  In many modern companies it may make sense for a single internal person, often a highly trusted senior manager, to be assigned to oversee vendor relationships while allowing a vendor or a group of vendors to actually handle all aspects of IT.  Some vendors specialize in vendor relationship management and may bring experience with, and quality management of, other vendors as part of their skill set.  Often these are MSPs or IT Outsourcers who offer IT management as part of their core skill set.  This can be a very valuable component, as these vendors often work with other vendors a great deal, have a better understanding of performance and cost expectations, and can leverage more scale and reputation than the end customer can.

Just as an internal IT department is filled with variety, so are IT service and product vendors.  Your vendor and support ecosystem is likely to be large and unique and will play a significant role in defining how you function as an IT department.  The key to working well with this ecosystem is understanding what kind of organization it is that you are working with, considering their needs and motivations and working to establish relationships based on mutual business respect coupled with relational guidelines that promote mutual success.

Remember that as the customer you drive the relationship with the vendor; they are stuck in the position of delivering the service requested or declining to do so.  But as the customer, you are in a position to push for a good working relationship that makes everyone able to work together in a healthy way.  Not every relationship is going to work out for the best, but there are ways to encourage good outcomes and to put the best foot forward in starting a new relationship.