Ferraris and Tractor Trailers

Working in the SMB world, it is actually pretty rare that we need to talk about latency.  The SMB world is almost universally focused on system throughput and is generally unaware of latency as a need.  But there are times when latency becomes important, and when it does it is critical that we understand the interplay of throughput and latency and just what “speed” means to us.  Once we start moving into the enterprise space, latency is more often viewed as a concern, but even there throughput nearly always reigns supreme, to the point that concepts of speed almost universally revolve around throughput while latency is often ignored or forgotten.

Understanding the role of latency in a system can be complicated, even though latency itself is relatively simple to understand.

A great comparison between latency and throughput that I like to use is the idea of a Ferrari and a tractor trailer.  Ferraris are “fast” in the traditional sense: they have a high top speed in miles per hour.  One might say that they are designed for speed.  But are they?

We generally consider tractor trailers to be slow.  They are big and lumbering beasts that have a low top end speed.  But they haul a lot of stuff at once.

In computer terms we normally think of speed like hauling capacity – we think in terms of “items” per second.  In the case of a Ferrari, going two hundred miles per hour is great, but it can haul maybe one box at a time.  A tractor trailer can only go one hundred miles per hour but can haul closer to one thousand boxes at a time.  When we talk about throughput or speed on a computer, this is more what we think about.  In network terms we think of gigabits per second and are rarely concerned with the speed of an individual packet, as a single packet is rarely important.  In computational terms we think about ideas like floating point operations per second, a similar concept.  No one really cares how long a single FLOP (floating point operation) takes, only how many we can get done in one or ten seconds.

So when looking at a Ferrari we could say that it has a useful speed of two hundred box-miles per hour.  That is, for every hour of operation, a Ferrari can move one box up to two hundred miles.  A tractor trailer has a useful speed of one hundred thousand box-miles per hour.  In terms of moving packages around, the throughput of the tractor trailer is easily five hundred times “faster” than that of the Ferrari.

So in terms of how we normally think of computers and networks a tractor trailer would be “fast” and a Ferrari would be “slow.”

But there is also latency to consider.  Assuming that our payload is tiny, say a letter or a small box, a Ferrari can move that one box a thousand miles in just five hours!  A tractor trailer would take ten hours to make the same journey (but could have a LOT of letters all arriving at once).  If what we need is to get a message or a small parcel from one place to another very quickly, the Ferrari is the better choice because it has half the latency (delay) of the tractor trailer, measured from the time we initiate the delivery until the first package is delivered.
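
To make the arithmetic above concrete, here is a minimal sketch in Python of the box-miles per hour and latency calculations; the speeds and payloads are the hypothetical figures from this example, not real-world specifications.

```python
# Illustrative only: the speeds and payloads are the hypothetical figures
# used in the Ferrari vs. tractor trailer example above.

def throughput_box_miles_per_hour(speed_mph, boxes_carried):
    """Useful 'speed': how many box-miles of delivery get done per hour."""
    return speed_mph * boxes_carried

def latency_hours(distance_miles, speed_mph):
    """Delay before the first box arrives: distance divided by speed."""
    return distance_miles / speed_mph

vehicles = {
    "Ferrari": {"speed_mph": 200, "boxes": 1},
    "Tractor trailer": {"speed_mph": 100, "boxes": 1000},
}

for name, v in vehicles.items():
    tput = throughput_box_miles_per_hour(v["speed_mph"], v["boxes"])
    delay = latency_hours(1000, v["speed_mph"])
    print(f"{name}: {tput:,} box-miles/hour, first delivery over 1,000 miles in {delay:.0f} hours")

# Ferrari:         200 box-miles/hour,     first delivery in  5 hours (low latency, low throughput)
# Tractor trailer: 100,000 box-miles/hour, first delivery in 10 hours (high latency, high throughput)
```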

As you can imagine, in most cases tractor trailers are vastly more practical because their delivery throughput is so much higher.  And, this being the case, we actually see large trucks on the highways all of the time while the occurrence rate of Ferraris is very low – even though each costs very roughly the same amount to purchase.  But in special cases, the Ferrari makes more sense.  Just not very often.

This is a general concept and applies to numerous areas: caching systems, memory, CPU, networking, operating system kernels and schedulers, cars and more.  Latency and throughput are generally inversely related: we accept higher latency in order to obtain more throughput.  For most operations this makes the best sense.  But sometimes it makes more sense to tune for latency.

Storage is actually an odd duck in computing: nearly all focus on storage performance is around IOPS, which is roughly a proxy measurement for latency, rather than throughput, which is measured in data transferred per second.  Rarely do we care about this second number, as it is almost never the source of storage bottlenecks.  But this is the exception, not the rule.
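
As a rough illustration, with assumed, hypothetical figures for device latency and I/O size, the sketch below shows why IOPS acts as a proxy for latency and why the resulting transfer rate is usually unremarkable:

```python
# Back-of-the-envelope link between latency, IOPS and throughput.
# Assumed figures: a hypothetical device at 0.5 ms per operation, 4 KiB I/Os.

latency_s = 0.0005            # 0.5 ms per I/O operation
block_size_bytes = 4 * 1024   # 4 KiB per operation

# At a queue depth of one, IOPS is simply the inverse of per-operation latency.
iops = 1 / latency_s

# Throughput is just how much data those operations move per second.
throughput_mib_s = iops * block_size_bytes / (1024 ** 2)

print(f"{iops:,.0f} IOPS at 4 KiB ~= {throughput_mib_s:.1f} MiB/s")
# 2,000 IOPS at 4 KiB ~= 7.8 MiB/s: tiny as a transfer rate, which is why storage
# bottlenecks show up as IOPS (latency) limits long before raw throughput limits.
```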

Latency and throughput can have some surprising interactions in the computing world.  When we talk about networks, for example, we typically measure only throughput (Gb/s) and rarely care much about latency (normally measured in milliseconds).  Typically this is because nearly all networking systems have similar latency numbers and most applications are largely unconcerned with latency delays.  It is only the rare application, like VoIP over international links or satellite, where latency affects the average person.  Occasionally it surprises people who attempt something uncommon, like iSCSI over a long distance WAN connection, and latency pops up as an unforeseen problem.

One of the places where the interaction of latency and throughput starts to become shocking and interesting is when we move from electrical or optical data networks to physical ones.  A famous quote in the industry is:

Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.

This is a great demonstration of huge bandwidth with very high latency.  Driving fifty miles across town, a single station wagon or SUV could haul hundreds of petabytes of data, hitting data rates that a 10 Gb/s fiber link could not come close to.  But the time for the first data packet to arrive is about an hour.  We often discount this kind of network because we assume that latency must be bounded at under about 500ms.  But that is not always the case.
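
A quick back-of-the-envelope calculation, using assumed figures of one hundred petabytes of media and a one hour drive, shows just how lopsided this comparison is:

```python
# Assumed figures for illustration: 100 PB of media in the vehicle, one hour drive.

payload_bits = 100 * 10**15 * 8   # 100 petabytes expressed in bits
trip_seconds = 60 * 60            # one hour until the first (and only) delivery

effective_gbps = payload_bits / trip_seconds / 10**9
fiber_gbps = 10                   # a 10 Gb/s fiber link for comparison

print(f"Station wagon: ~{effective_gbps:,.0f} Gb/s effective throughput")
print(f"Roughly {effective_gbps / fiber_gbps:,.0f}x a 10 Gb/s fiber link,")
print("but with about an hour of latency before the first bit arrives.")
```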

Australia recently made the news with a test to see whether a pigeon carrying an SD card could, in terms of network throughput, outperform the region's ISP – and the pigeon ended up being faster than the ISP!

In terms of computing performance we often ignore latency, to the point of not even being aware of it as a context in which to discuss performance.  But in low latency computing circles it is considered very carefully.  System throughput is generally greatly reduced (it becomes common to target systems to hit only ten percent CPU utilization when more traditional systems target closer to ninety percent), with techniques like real time kernels, CPU affinity, processor pinning, attention to cache hit ratios and other low-level measures all being used to focus on obtaining the most immediate response possible from a system rather than attempting to get the most total processing out of a system.
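
As a small illustration of one of these techniques, the sketch below pins the current process to a single CPU core using Python's os.sched_setaffinity (available on Linux); the core number chosen is arbitrary and purely for demonstration.

```python
# Minimal sketch of processor pinning (CPU affinity), one of the low latency
# techniques mentioned above. Linux-only; core 2 is an arbitrary example choice.

import os

def pin_to_core(core: int) -> None:
    """Restrict the current process to a single CPU core."""
    os.sched_setaffinity(0, {core})  # pid 0 means "the calling process"

if __name__ == "__main__":
    pin_to_core(2)
    print("Now restricted to cores:", os.sched_getaffinity(0))
    # In practice this is combined with real time scheduling, isolated cores
    # and deliberately low average CPU utilization to minimize response time.
```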

Common places where low latency is desired from a computational perspective are critical controller systems (such as manufacturing controllers, where even a millisecond of latency can cause problems on the factory floor) and financial trading systems, where a few milliseconds of delay can mean that investments have already changed in price or products have already been sold and are no longer available.  Speed, in terms of latency, is often the deciding factor between making money and losing money – even a single millisecond can be crippling.

Technically even audio and video processing systems have to be latency sensitive, but most modern computing systems have so much spare processing overhead, and latency is generally low enough, that most systems, even VoIP PBXs and conferencing systems, can function today while only very rarely needing to be aware of latency concerns on the processing side (even networking latency is becoming less and less common as a concern).  The average system administrator or engineer might easily go through an entire career without ever needing to work on a system that is latency sensitive, or on one for which there is not so much available overhead as to hide any latency sensitivity.

Defining speed, whether that means throughput, latency, something else or some combination of the two, is very important in all aspects of IT and in life.  It is valuable to understand how they affect us in different situations and how they interact with each other, generally existing in an inverse relationship where improvements in throughput come at a cost to latency or vice versa, and to learn to balance the two as needed to improve the systems that we work on.

Types of IT Service Providers

A big challenge, both to IT Service Providers and to their customers, is in attempting to define exactly what an IT vendor is and how their customers should expect to interact with them.  Many people see IT Service Providers (we will call them ITSPs for short here) as a single type of animal but, in reality, ITSPs come in all shapes and sizes and need to be understood in order to leverage a relationship with them well.  Even if we lack precise or universally accepted terms, the concepts are universal.

Even within the ITSP industry there is little to no standardization of naming conventions, even though there are relatively tried and true company structures which are nearly always followed.  The really important aspect of this discussion is not to carefully define the names of service providers but to explain the competing approaches so that, when engaging a service provider, a meaningful discussion around these models can be had and an appropriate relationship can be reached.

It is also important to note that any given service provider may use a hybrid or combination of models.  Using one model does not preclude the use of another as well.  In fact it is most common for a few approaches to be combined, as multiple approaches make it easier to capture revenue, which is quite critical since IT service provisioning is a relatively low margin business.

Resellers and VARs:  The first, largest and most important category to identify is that of the reseller.  Resellers are the easiest to identify as they, as their name indicates, resell things.  Resellers range from pure resellers, companies that do nothing but purchase from vendors on one side and sell to customers on the other (vendors like NewEgg and Amazon would fit into this category, while not focusing on IT products), to the more popular Value Added Resellers, who not only resell products but maintain some degree of skill or knowledge around those products.

Value Added Resellers are a key component of the overall IT vendor ecosystem as they supply more than just a product purchasing supply chain but maintain key skills around those products.  Commonly VARs will have skills around product integration, supply chain logistics, supported configurations, common support issues, licensing and other factors.  It is common for customers and even other types of ITSPs to lean on a VAR in order to get details about product specifics or insider information.

Resellers of any type, quite obviously, earn their money through markup and margins on the goods that they resell.  This creates an interesting relationship between customers and the vendor, as the vendor is always in a position of needing to make a sale in order to produce revenue.  Resellers are often turned to for advice, but it must be understood that the relationship is one of sales and the reseller only gets compensated when a sale takes place.  This makes the use of a reseller somewhat complicated, as the advice or expertise sought may come into conflict with what is in the interest of the reseller.  The relationship with a reseller requires careful management to ensure that guidance and direction coming from the reseller are aligned with the customer’s needs, are isolated to areas in which the reseller is an expert and are handled in ways that are mutually beneficial to both parties.

Managed Service Providers or MSPs: The MSP is probably the most well known title in this field.  In recent years the term MSP has come to be used so broadly that it is often simply used to denote any IT service provider, whether or not they provide something that would appropriately be deemed a “managed service.”  To understand what an MSP is truly meant to be, we have to understand what a “managed service” is meant to be in the context of IT.

The idea of managed services is generally understood to relate to the concept of “packaging” a service.  That is, producing a carefully designed and designated service or set of services that can be sold with a fixed or relatively predictable price.  MSPs typically have very well defined service offerings and can often provide very predictable pricing.  MSPs take the time up front to develop predictable service offerings, allowing customers to plan and budget easily.

This heavy service definition process generally means that selecting an MSP is normally done very tightly around specific products or processes and nearly always requires customers to conform to the MSPs standards.  In exchange, MSPs can provide very low cost and predictable pricing in many cases.  Some of the most famous approaches from MSPs include the concepts of “price per desktop”, “price per user” or “price per server” packages where a customer might pay one hundred dollars per desktop per month and work from a fixed price for whatever they need.  The MSP, in turn, may define what desktops will be used, what operating system is used and what software may be run on top of it.  MSPs almost universally have a software package or a set of standard software packages that are used to manage their customers.   MSPs generally rely on scaling across many customers with shared processes and procedures in order to create a cost effective structure.

MSPs typically focus on internal efficiencies to maximize profits.  The idea being that a set price service offering can be made to be more and more effective by adding more nearly identical customers and improving processes and tooling in order to reduce the cost of delivering the service.  This can be a great model with a high degree of alignment between the needs of the vendor and the customer as both benefit from an improvement in service delivery and the MSP is encouraged to undertake the investments to improve operational efficiency in order to improve profits.  The customer benefits from set pricing and improved services while the vendor benefits from improved margins.  The caveat here is that there is a risk that the MSP will seek to skirt responsibilities or to lean towards slow response or corner cutting since the prices are fixed and only the services are flexible.

IT Outsourcers & Consultants: IT Outsourcing may seem like the most obvious form of ITSP but it is actually a rather uncommon approach.  I lump together the ideas of IT Outsourcing and consulting because, in general, they are actually the same thing simply handled at two different scales.  The behaviours are essentially the same between them.  In contrast with MSPs, we could also think of this group as Unmanaged Service Providers.  IT Outsourcers do not develop heavily defined service packages but instead rely on flexibility and a behaviour much more akin to that of an internal IT department.  IT Outsourcers literally act like an external IT department or a portion thereof.  An IT Outsourcer will typically have a technological specialty or a range of specialties, but many are also very generalized and will handle nearly any technological need.

This category can act in a number of different ways when interacting with a business.  When brought in for a small project or a single technological issue they are normally thought of as a consultancy – providing expertise and advice around a single issue or set of issues.  Outsourcing can also mean using the provider as a replacement for the entire IT department, allowing a company to exist without any IT staff of their own.  And there is a lot of middle ground where the IT Outsourcer might be brought in only to handle specific roles within the larger IT organization, such as only running and manning the help desk, only doing network engineering or providing continuous management and oversight without doing hands on technical work.  IT Outsourcers are very hard to define because they are so flexible and can exist in so many different ways.  Each IT Outsourcer is unique as is, in most cases, every client engagement.

IT Outsourcing is far more common, and almost ubiquitous, within the large business and enterprise spaces.  It is a very rare enterprise that does not turn to outsourcing for at least some role within the organization.  Small businesses use IT Outsourcers heavily but are more likely to use the more well defined MSP model than their larger counterparts.  The MSP market is focused primarily on the small and medium business space.

It is imperative, of course, that the concept of outsourcing not be conflated with off-shoring, which is the practice of sending IT jobs overseas.  The two things are completely unrelated.  Outsourcing often means sending work to a company down the street, or at least in the same country or region.  Off-shoring means going to a distant country, presumably across the ocean.  It is off-shoring that has the bad reputation, but sadly people often use the term outsourcing to incorrectly refer to it, which leads to much confusion.  Many companies use internal staff in foreign markets to off-shore while being able to say that no jobs are outsourced.  The misuse of this term has made it easy for companies to hide the off-shoring of labor and has given the local use of outsourced experts a bad reputation without cause.

It is common for IT Outsourcing relationships to be based on a cost per hour or per “man day,” or on something akin to a time and materials arrangement.  These arrangements come in all shapes and sizes, to be sure, but generally the alignment of an IT Outsourcer to a business is the closest to the relationship that a business has with its own internal IT department.  Unlike MSPs, who generally have a contractual leaning towards pushing for efficiency and cutting corners to add to profits, Outsourcers have a contractual leaning towards doing more work and having more billable hours.  Understanding how each organization makes its money, and where it is likely to “pad” or where cost is likely to creep, is critical in managing these relationships.

Professional Services: Professional Services firms overlap heavily with the more focused consulting role within IT Outsourcing, and this makes both of these roles rather hard to define.  Professional Services firms tend to be much more focused, however, on very specific markets, whether horizontal, vertical or both.  Professional Services firms generally do not offer full IT department coverage or fully flexible arrangements like the IT Outsourcer does, nor packaged services like the MSP model.  Typically a Professional Services firm might be centered around a small group of products that compete for a specific internal function and will invest heavily in the expertise around those functions.  Professional Services firms tend to be brought in more on a project basis than Outsourcers who, in turn, are more likely to be project based than MSPs.

Professional Services firms tend to bill based on project scope.  This means that the relationship with a PS firm requires careful scope management.  Many IT Outsourcers will do project based work as well, and when billing in this way the same applies to them; likewise, some PS firms will bill by the hour, in which case the IT Outsourcing relationship applies.  In a project it is important that everyone be acutely aware of the scope and how it is defined.  A large amount of overhead must go into the scoping by both sides, as it is the scope document that will define cost and the potential for profit.  PS firms are, by necessity, experts at ensuring that scopes are well defined and profitable to them.  It is very easy for a naive IT department to improperly scope a project and be left with a project that they feel is incomplete.  If scope management is, and you will excuse the pun, out of scope for your organization, then it is wise to pursue Professional Services arrangements via more flexible terms such as hourly or time and materials.

All of these types of firms have an important role to play in the IT ecosystem.  Rarely can an internal IT department have all of the skills necessary to handle every situation on its own; it takes the careful selection and management of outside firms to round out the needs of a business in the best ways possible.  At a minimum, internal IT must work with vendors and resellers to acquire the gear that they need for IT to exist.  Rarely does it stop there.  Whether an IT department needs advice on a project, extra hands when things get busy, oversight on something that has not been done before, support during holidays or off hours or just peers off of whom ideas can be bounced, IT departments of all sizes and types turn to IT Service Providers to fill in gaps both big and small.

Any IT role or function can be moved from internal to external staff.  The only role that ultimately can never be moved to an external team is the top level of vendor management.  At some point, someone internal to the business in question must oversee the relationship with at least one vendor (or else there must be a full internal IT staff fulfilling all roles).  In many modern companies it may make sense for a single person internal to the company, often a highly trusted senior manager, to be assigned to oversee vendor relationships while allowing a vendor or a group of vendors to actually handle all aspects of IT.  Some vendors specialize in vendor relationship management and may bring experience with, and quality management of, other vendors with them as part of their skill set.  Often these are MSPs or IT Outsourcers who bring IT Management as part of their core skill set.  This can be a very valuable component, as these vendors often work with other vendors a great deal, have a better understanding of performance and cost expectations and can leverage more scale and reputation than the end customer can.

Just as an internal IT department is filled with variety, so are IT service and product vendors.  Your vendor and support ecosystem is likely to be large and unique and will play a significant role in defining how you function as an IT department.  The key to working well with this ecosystem is understanding what kind of organization it is that you are working with, considering their needs and motivations and working to establish relationships based on mutual business respect coupled with relational guidelines that promote mutual success.

Remember that as the customer you drive the relationship with the vendor; they are stuck in the position of delivering the service requested or declining to do so.  But as the customer, you are in a position to push for a good working relationship that makes everyone able to work together in a healthy way.  Not every relationship is going to work out for the best, but there are ways to encourage good outcomes and to put the best foot forward in starting a new relationship.

Avoiding Local Service Providers

Inflammatory article titles aside, choosing a technology service provider based, wholly or partially, on the fact that they are in some way located geographically near to where you currently are is almost always a very bad idea.  Knowledge based services are difficult enough to find at all, let alone finding the best potential skills, experience and price while introducing artificial and unnecessary constraints that limit the field of potential candidates.

With the rare exception of major global market cities like New York City and London, it is nearly impossible to find a full range of skills in Information Technology in a single locality, at least not in conjunction with a great degree of experience and breadth.  This is true of nearly all highly technical industries – expertise tends to concentrate around a handful of localities around the world and the remaining skills are scattered in a rather unpredictable manner, often because the people in the highest demand can command the salary and location they desire and live where they want to, not where they have to.

IT, more than nearly any other field, has little value in being geographically near to the business that it is supporting.  Enterprise IT departments, even when located locally to their associated businesses and working in an office on premises, are often kept isolated in different buildings away from both the businesses that they are supporting and the physical systems on which they work.  It is actually very rare for enterprise server admins to ever physically see their servers or for network admins to see their switches and routers.  This becomes even less likely when we start talking about roles like database administrators, software developers and others who have even less association with devices that have any physical component.

Adding in a local limitation when looking for consulting talent (and in many cases even internal IT staff) adds an artificial constraint that eliminates nearly the entire possible field of talented people while encouraging people to work on site even for work for which it makes no sense.  Working on site often causes a large increase in cost and a loss of productivity due to interruptions, lack of resources, poor work environment, travel and similar.  Working with exclusively or predominantly remote resources encourages a healthy investment in efficient working conditions that generally pays off very well.  But it is important to keep in mind that just because a service company is remote does not mean that the work that they do will be remote.  In many cases this will make sense, but in others it will not.

Location agnostic workers have many advantages.  By not being tied to a specific location you gain far more flexibility as to skill level (allowing you to pursue the absolute best people), cost (allowing you to hire people living in low cost areas), the ability to offer flexibility as an incentive, broader skill sets, larger staff and so on.  Choosing purely local services simply limits you in many ways.

Companies that are not based locally are not necessarily unable to provide local resources.  Many companies work with local resources, either local companies or individuals, to allow them to have a local presence.  In many cases this is simply what we call local “hands” and is analogous to how most enterprises work internally with centrally or remotely based IT staff and physical “hands” existing only at locations with physical equipment to be serviced.  In cases where specific expertise needs to be located with physical equipment or people it is common for companies to either staff locally in cases where the resource is needed on a very regular basis or to have specific resources travel to the location when needed.  These techniques are generally far more effective than attempting to hire firms with the needed staff already coincidentally located in the best location.  This can easily be more cost effective than working with a full staff that is already local.

As time marches forward, needs change as well.  Companies that work local only can find themselves facing new challenges when they expand to include other regions or locations.  Do they choose vendors and partners only where they were originally located?  Or where they are moving or expanding to?  Do they choose local vendors for each location separately?  The idea of working with local resources only is nearly exclusive to the smallest of businesses.  Typically, as businesses grow, the concept of local begins to change in interesting ways.

Locality and jurisdiction may represent different things.  In many cases it may be necessary to work with businesses located in the same state or country as your business due to legal or financial logistical reasoning, and this can often make sense.  Small companies especially may not be prepared to tackle the complexities of working with a foreign firm.  Larger companies may find these boundaries worth ignoring as well.  But the idea that location should be ignored should not be taken to mean that jurisdiction, by extension, should also be ignored.  Jurisdiction still plays a significant role – one that some IT service providers or other vendors may be able to navigate on your behalf, allowing you to focus on working with a vendor within your jurisdiction while getting the benefits of support from another.

As with many artificial constraint situations, not only do we generally eliminate the most ideal vendor candidates, but we also risk “informing” the existing vendor candidate pool that we care more about locality than quality of service or other important factors.  This can lead to a situation where the vendor, especially in a smaller market, feels that they have a lock on you as the customer and do not need to perform up to a market standard level or price competitively (as there is no true competition given the constraints), or worse.  A vendor who feels that they have a trapped customer is unlikely to perform as a good vendor long term.

Of course we don’t want to avoid companies simply because they are local to our own businesses, but we should not be giving undue preference to companies for this reason either.  Some work has advantages to being done in person, there is no denying this.  But we must be careful not to extend this to rules and needs that do not have this advantage nor should we confuse the location of a vendor with the location(s) where they do or are willing to do business.

In extreme cases, all IT work can, in theory, be done completely remotely and only bench work (the physical remote hands) aspects of IT need an on premises presence.  This is extreme and of course there are reasons to have IT on site.  Working with a vendor to determine how best service can be provided, whether locally, remotely or a combination of the two can be very beneficial.

In a broader context, the most important concept here is to avoid adding artificial or unnecessary constraints to the vendor selection process.  Assuming that a local vendor will be able or willing to deliver value that a non-local vendor cannot or will not is just one way that we might bring assumption or prejudice to a process such as this.  There is every possibility that the local company will do the best possible job and be the best, most viable vendor long term – but the chances are far higher that you will find the right partner for your business elsewhere.  It’s a big world, and in IT more than nearly any other field it is becoming a large, flat playing field.

The Jurassic Park Effect

“If I may… Um, I’ll tell you the problem with the scientific power that you’re using here, it didn’t require any discipline to attain it. You read what others had done and you took the next step. You didn’t earn the knowledge for yourselves, so you don’t take any responsibility for it. You stood on the shoulders of geniuses to accomplish something as fast as you could, and before you even knew what you had, you patented it, and packaged it, and slapped it on a plastic lunchbox, and now …” – Dr. Ian Malcolm, Jurassic Park

When looking at building a storage server or NAS, there is a common feeling that what is needed is a “NAS operating system.”  This is an odd reaction, I find, since the term NAS means nothing more than a “fileserver with a dedicated storage interface,” or, in other words, just a file server with limited exposed functionality.  The reason that we choose physical NAS appliances is for the integrated support and sometimes for special, proprietary functionality (NetApp being a key example of this, offering extensive SMB and NFS integration and some really unique RAID and filesystem options, or Exablox, offering fully managed scale out file storage and RAIN style protection).  Using a NAS to replace a traditional file server is, for the most part, a fairly recent phenomenon and one that I have found is often driven by misconception or the impression that managing a file server, one of the most basic IT workloads, is special or hard.  File servers are generally considered the most basic form of server: traditionally they are what people meant when using the term server unless additional description was added, and they are the only form commonly integrated into the desktop (every Mac, Windows and Linux desktop can function as a file server, and it is very common for them to do so).

There is, of course, nothing wrong with turning to a NAS instead of a traditional file server to meet your storage needs, especially as some modern NAS options, like Exablox, offer scale out and storage options that are not available in most operating systems.  But it appears that the trend to use a NAS instead of a file server has led to some odd behaviour when IT professionals turn back to considering file servers again.  It is a cascading effect, I suspect, where the reasons why a NAS is sometimes preferred, and the goal level thinking behind them, are lost while the resulting idea of “I should have a NAS” remains, so that when returning to look at file server options there is a drive to “have a NAS” regardless of whether there is a logical reason for feeling that this is necessary.

First we must consider that the general concept of a NAS is a simple one: take a traditional file server, simplify it by removing options and package it with all of the necessary hardware to make a simplified appliance with all of the support included, from the interface down to the spinning drives and everything in between.  Storage can be tricky when users need to determine RAID levels and drive types, monitor effectively, etc.  A NAS addresses this by integrating the hardware into the platform.  This makes things simple but can add risk, as you have fewer support options and less ability to fix or replace things yourself.  A move from a file server to a NAS appliance is almost exclusively about support and is generally a very strong commitment to a singular vendor.  You choose the NAS approach because you want to rely on a vendor for everything.

When we move to a file server we go in the opposite direction.  A file server is a traditional enterprise server like any other.  You buy your server hardware from one vendor (HP, Dell, IBM, etc.) and your operating system from another (Microsoft, Red Hat, Suse, etc.)  You specify the parts and the configuration that you need and you have the most common computing model for all of IT.  With this model you generally are using standard, commodity parts allowing you to easily migrate between hardware vendors and between software vendors. You have “vendor redundancy” options and generally everything is done using open, standard protocols.  You get great flexibility and can manage and monitor your file server just like any other member of your server fleet, including keeping it completely virtualized.  You give up the vertical integration of the NAS in exchange for horizontal flexibility and standardization.

What is odd, therefore, is returning to the commodity model but seeking what is colloquially known as a NAS OS.  Common examples of these include NAS4Free, FreeNAS and OpenFiler.  This category of products is generally nothing more than a standard operating system (often FreeBSD as it has ideal licensing, or Linux because it is well known) with a “storage interface” put onto it and no special or additional functionality that would not exist with the normal operating system.  In theory they are a “single function” operating system that does only one thing.  But this is not reality.  They are general purpose operating systems with an extra GUI management layer added on top.  One could say the same thing about most physical NAS products themselves, but those typically include custom engineering even at the storage level, special features and, most importantly, an integrated support stack and true isolation of the “generalness” of the underlying OS.  A “NAS OS” is not a simpler version of a general purpose OS; it is a more complex, yet less functional, version of it.

What is additionally odd is that general OSes, with rare exception, already come with very simple, extremely well known and fully supported storage interfaces.  Nearly every variety of Windows or Linux server, for example, has included simple graphical interfaces for these functions for a very long time.  These included GUIs are often shunned by system administrators as being too “heavy and unnecessary” for a simple file server.  So it is even more unusual that adding a third party GUI, one that is not patched and tested by the OS team and is not standardly known and supported, would then be desired, as this goes against the common ideals and practices of using a server.

And this is where the Jurassic Park effect comes in.  The OS vendors (Red Hat, Microsoft, Oracle, FreeBSD, Suse, Canonical, et al.) are giants with amazing engineering teams, code review, testing, oversight and enterprise support ecosystems, while the “NAS OS” vendors are generally very small companies, some with just one part time person, who stand on the shoulders of these giants and build something that they knew that they could, but never stopped to ask if they should.  The resulting products are a clear step down from their pure OS counterparts: they do not make systems management easier, nor do they fill a gap in the market’s service offerings.  Solid, reliable, easy to use storage is already available; more vendors are not needed to fill this place in the market.

The logic often applied to looking at a NAS OS is that they are “easy to set up.”   This may or may not be true, as easy, here, must be a relative term.  For there to be any value, a NAS OS has to be easy in comparison to the standard version of the same operating system.  So in the case of FreeNAS, this would mean FreeBSD.  FreeNAS would need to be appreciably easier to set up than FreeBSD for the same, dedicated functions.  And this is easily true; setting up a NAS OS is generally pretty easy.  But this ease is a false comfort, and one of which IT professionals need to be quite aware.  Making something easy to set up is not a priority in IT; making something that is easy to operate and repair when there are problems is what is important.  Easy to set up is nice, but if it comes at the cost of not understanding how the system is configured and makes operational repairs more difficult, it is a very, very bad thing.  NAS OS products routinely make it dangerously easy to get a product into production in a storage role, which is almost always the most critical or nearly the most critical role of any server in an environment, when IT has no experience or likely skill to maintain, operate or, most importantly, fix it when something goes wrong.  We need exactly the opposite: a system that is easy to operate and fix.  That is what matters.  So we have a second case of “standing on the shoulders of giants” and building a system that we knew we could, but did not know if we should.

What exacerbates this problem is that the very people who feel the need to turn to a NAS OS to “make storage easy” are, by the very nature of the NAS OS, the exact people for whom operational support and the repair of the system are most difficult.  System administrators who are comfortable with the underlying OS would naturally not see a NAS OS as a benefit and would, for the most part, avoid it.  It is uniquely the people for whom it is most dangerous to run a not fully understood storage platform who are likely to attempt it.  And, of course, most NAS OS vendors earn their money, as we could predict, on post-installation support calls from customers who deployed the product, got stuck once it was in production and are now at the mercy of the vendor for exorbitant support pricing.  It is in the interest of the vendors to make the products easy to install and hard to fix.  Everything is working against the IT pro here.

If we take a common example and look at FreeNAS, we can see how this is a poor alignment of “difficulties.”  FreeNAS is FreeBSD with an additional interface on top.  Anything that FreeNAS can do, FreeBSD can do.  There is no loss of functionality by going to FreeBSD.  When something fails, in either case, the system administrator must have a good working knowledge of FreeBSD in order to effect repairs.  There is no escaping this.  FreeBSD knowledge is common in the industry and getting outside help is relatively easy.  Using FreeNAS adds several complications, the biggest being that any and all customizations made by the FreeNAS GUI are special knowledge needed for troubleshooting on top of the knowledge already needed to operate FreeBSD.  So this is a larger knowledge set as well as more things to fail.  It is also a relatively uncommon knowledge set, as FreeNAS is a niche storage product from a small vendor and FreeBSD is a major enterprise IT platform (plus all use of FreeNAS is FreeBSD use, but only a tiny percentage of FreeBSD use is FreeNAS).  So we can see that using a NAS OS just adds risk over and over again.

This same issue carries over into the communities that grow up around these products.  If you look to communities around FreeBSD, Linux or Windows for guidance and assistance, you deal with large numbers of IT professionals, skilled system admins and those with business and enterprise experience.  Of course, hobbyists, the uninformed and others participate too, but these are the enterprise IT platforms and all the knowledge of the industry is available to you when implementing these products.  Compare this to the community of a NAS OS.  By its very nature, only people struggling with the administration of a standard operating system and/or storage basics would look at a NAS OS package, so this naturally filters the membership of their communities down to exactly the people from whom we would be best to avoid taking advice.  This creates an isolated culture of misinformation and misunderstandings around storage and storage products.  Myths abound, guidance often becomes reckless and dangerous and industry best practices are ignored as if decades of accumulated experience had never happened.

A NAS OS also commonly introduces lag in patching and updates.  A NAS OS will almost always, and almost necessarily, trail its parent OS on security and stability updates and will very often follow months or years behind on major features.  In one very well known scenario, OpenFiler, the product was built on an upstream non-enterprise base (rPath Linux) which lacked community and vendor support, failed and was abandoned, leaving downstream users, including everyone on OpenFiler, without the ecosystem needed to support them.  Using a NAS OS means trusting not just the large, well known enterprise vendor that makes the base OS but the NAS OS vendor as well.  And the NAS OS vendor is orders of magnitude more likely to fail than the enterprise vendors, even when basing their products on enterprise class base OSes.

Storage is a critical function and should not be treated carelessly or as if its criticality did not exist.  NAS OSes tempt us to install quickly and forget, hoping that nothing ever goes wrong or that we can move on to other roles or companies entirely before bad things happen.  They set us up for failure where failure is most impactful.  When a typical application server fails, we can always copy the files off of its storage and start fresh.  When storage fails, data is lost and systems go down.

“John Hammond: All major theme parks have delays. When they opened Disneyland in 1956, nothing worked!

Dr. Ian Malcolm: Yeah, but, John, if The Pirates of the Caribbean breaks down, the pirates don’t eat the tourists.”

When storage fails, businesses fail.  Taking the easy route to setting up storage, ignoring the long term support needs and seeking advice from communities that have filtered out the experienced storage and systems engineers increases risk dramatically.  Sadly, the nature of a NAS OS is that the very reason that people turn to it (lack of deep technical knowledge to build the systems) is the very reason they must avoid it (even greater need for support).  The people for whom NAS OSes are effectively safe to use, those with very deep and broad storage and systems knowledge, would rarely consider these products because for them the products offer no benefits.

At the end of the day, while the concept of a NAS OS sounds wonderful, it is not a panacea: the value of a NAS does not carry over from the physical appliance world to the installed OS world, and the value of standard OSes is far too great for NAS OSes to add real value on top of them.

“Dr. Alan Grant: Hammond, after some consideration, I’ve decided, not to endorse your park.

John Hammond: So have I.”
