All posts by Scott Alan Miller

Started in software development with Eastman Kodak in 1989 as an intern in database development (making database platforms themselves.) Began transitioning to IT in 1994 with my first mixed role in system administration.

You Are Not Special

It is not my intention for this to sound harsh, but I think that it has to be said: “You are not special.”  And by “you” here, of course, I mean your business.  The organization that you, as an IT practitioner, support.  For decades we have heard complaints about how modern education systems attempt to make every student feel unique and special: when awards are given out, schools attempt to find a way, especially with elementary students, to make sure that every student gets an award of some sort.  Awards for best attendance, posture, being quiet in class or whatever are created to reward completely irrelevant things in order to make every student not only feel like part of the group, but like a special, unique individual who has accomplished something better than anyone else.

This attitude, this belief that everyone is special and that all of those statistics, general rules and best practices apply to “someone else,” has become pervasive in IT as well, manifesting itself in the belief that each business, each company, is so special and unique that IT industry knowledge does not apply to its situation.  IT practitioners with whom I have spoken almost always agree that best practices and accumulated industry knowledge are good and apply in nearly every case – except their own.  All of those rules of thumb, all of those guidelines are great for someone else, but not for them.  The problem is that nearly everyone feels this way, and this cannot be the case.

I have found this problem to be most pronounced and, in fact, almost exclusive to the small business market where, in theory, the likelihood of a company being highly unique is actually much lower than in the large enterprise space of the Fortune 100, where uniqueness is somewhat expected.  But instead of small businesses assuming uniformity and enormous businesses expecting uniqueness, the opposite appears to happen.  Large businesses understand that even at massive scale IT problems are mostly standard patterns and by and large should be solved using tried and true, normal approaches.  And likewise, small businesses, seemingly driven by an emotional need to be “special,” claim a need to avoid industry patterns, often eschewing valuable knowledge to a ludicrous degree, and often while conforming to the most textbook example of the use case for the pattern.  It almost seems, from my experience, that the more “textbook” a small business is, the more likely its IT department will be to avoid solutions designed exactly for it and attempt to reinvent the wheel at any cost.

Common solutions and practices apply to the majority of businesses and workloads, easily in excess of 99.9% of them.  Even in larger companies where there is opportunity for uniqueness we expect to see only rare workloads that fall into a unique category.  Even in the world’s largest businesses the average workload is, well, average.  Large enterprises with tens of thousands of servers and workloads often find themselves with a handful of very unique situations for which there is no industry standard to rely on.  But even so, they have many thousands of very standard workloads that are not special in any way.  The smaller the business, the less opportunity there is for a unique workload on a per-workload basis, and with so many fewer workloads overall, the less chance of one occurring at all.

One of the reasons that small businesses, even ones very unique as small businesses go, are rarely actually unique is that when a small business has an extreme need for, say, performance, capacity, scale or security, it almost never means that it needs that thing in excess of existing standards for larger businesses.  The standards of how to deal with large data sets or extreme security, for example, are already well established in the industry at large and small businesses need only leverage the knowledge and practices developed for larger players.

What is surprising is when a small business with relatively trivial revenue believes that its data requires a level of secrecy and security in excess of the security standards of the world’s top financial institutions, military organizations, governments, hospitals or nuclear power facilities.  What makes the situation more absurd is that in pursuing these extremes of security, small businesses almost always end up with very low security standards.  They often cite a need for “extreme security” to justify insecure or, as we often say, “tin foil hat” procedures.

Security is one area where this behavior is very pronounced.  Often it is small business owners or small business IT “managers” who create this feeling of distrusting industry standards, not IT practitioners themselves, although the feeling that a business is unique often trickles down and is seen there as well.

Similar to security, the need for unlimited uptime and highly available systems, rarely needed even for high end enterprise workloads, seems an almost ubiquitous goal in small businesses.  Small businesses often spend orders of magnitude more money, relative to revenue, on procuring high availability systems than their larger business counterparts do.  Often this is done with the mistaken belief that large businesses always use high availability and that small businesses must do so to compete, that if they do not they are not viable businesses, or that any downtime equates to business collapse.  None of these is true.  Enterprises have a far lower cost of reliability compared to revenue and still do considerable cost analysis to see what reliability expenditures are justified by risk.  Small businesses rarely do that best practice analysis and jump, almost universally, to the very unlikely belief that their workloads are dramatically more valuable than even the largest enterprises’ and that they have no means of mitigating downtime.  Eschewing business best practices (doing careful cost and risk analysis before investing in risk mitigation), financial best practices (erring on the side of up front cost savings) and technology best practices (high availability only when needed and justified) leaves many businesses operating from the belief that they are “special” and none of the normal rules apply to them.

By approaching all technology needs from the assumption of being special, businesses that do this are unable to leverage the vast, accumulated knowledge of the industry.  This means that they are continuously reinventing the wheel and attempting to forge new paths where well trodden, safe paths already exist.  Not only can this result in an extreme degree of overspending in some cases and in dangerous risk in others, but it effectively guarantees that the cost of any project is unnecessarily high.  Small businesses, especially, have the extreme advantage of being able to leverage the research and experience of larger businesses, allowing them to be more agile and lean.  This is a key component of making small businesses compete against the advantages of scale inherent to large businesses.  When small businesses ignore this advantage they are left with neither the scale of big business nor the advantages of being small.

There is no simple solution here – small business IT practitioners and small business managers need to step down from their pedestals and take a long, hard look at their companies and ask if they really are unique and special or if they are normal businesses with normal needs.  I guarantee you are not the first to face the problems that you have.  If there isn’t a standard solution approach available already then perhaps the approach to the problem itself is wrong.  Take a step back and evaluate with an eye to understanding that many businesses share common problems and can tackle them effectively using standard patterns, approaches and often best practices.  If your immediate reaction to best practices, patterns and industry knowledge is “yes, but that doesn’t apply here,” you need to stop and reevaluate – because yes, it certainly does apply to you.  It is almost certainly true that you have misunderstood the uniqueness of your business or misunderstood how the guidance is applied, resulting in the feeling that those guidelines are not applicable.  Even those rare businesses with very unique workloads only have them for a small number of their workloads and not the majority; the most extremely unique businesses and organizations still have many common workloads.

Patterns and best practices are our friends and allies, our trusted partners in IT.  IT, and business in general, is challenging and complex.  To excel as IT practitioners we can seek to stand on the shoulders of giants, walk the paths that have been mapped and trodden for us and leverage the work of others to make our solutions as stable, predictable and supportable as possible.  This allows us to provide maximum value to the businesses that we support.

Explaining the Lack of Large Scale Studies in IT

IT practitioners ask for these every day and yet none exist – large scale risk and performance studies for IT hardware and software.  This covers a wide array of possibilities, but common examples are failure rates between different server models, hard drives, operating systems, RAID array types, desktops, laptops, you name it.  And yet, regardless of the high demand for such data, there is none available.  How can this be?

Not all cases are the same, of course, but by and large there are three really significant factors that come into play keeping this type of data from entering the field.  These are the high cost of conducting a study, the long time scale necessary for a study and a lack of incentive to produce and/or share this data with other companies.

Cost is by far the largest factor.  If the cost of large scale studies could be overcome, all other factors could have solutions found for them.  But sadly the nature of a large scale study is that it will be costly.  As an example we can look at server reliability rates.

In order to determine failure rates on a server we need a large number of servers from which to collect this data.  This may seem like an extreme example, but server failure rates are one of the most commonly requested large scale study figures and so the example is an important one.  We would need perhaps a few hundred servers for a very small study, but to get statistically significant data we would likely need thousands of servers.  If we assume that a single server is five thousand dollars, which would be a relatively entry level server, we are looking at easily twenty five million dollars of equipment!  And that is just enough to do a somewhat small scale test (just five thousand servers) of a rather low cost device.  If we were to talk about enterprise servers we would easily jump to thirty or even fifty thousand dollars per server, taking the cost to as much as a quarter of a billion dollars.
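To put that arithmetic in one place, here is a quick back-of-the-envelope sketch using only the example figures above (the server count and prices are illustrative, not real quotes):

```python
# Back-of-the-envelope hardware cost for a server failure-rate study.
# Figures are the illustrative examples from the text, not real pricing.
servers = 5_000                  # a "somewhat small" statistically useful sample
entry_level_price = 5_000        # dollars per entry level server
enterprise_price = 50_000        # dollars per higher end enterprise server

print(f"Entry level: ${servers * entry_level_price:,}")   # $25,000,000
print(f"Enterprise:  ${servers * enterprise_price:,}")    # $250,000,000
```

And this is before facilities, sensors and labor are even considered.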

Now that cost, of course, is for testing a single configuration of a single model server.  Presumably for a study to be meaningful we would need many different models of servers.  Perhaps several from each vendor to compare different lines and features.  Perhaps many different vendors.  It is easy to see how quickly the cost of a study becomes impossibly large.

This is just the beginning of the cost, however.  To do a good study is going to require carefully controlled environments on par with the best datacenters to isolate environmental issues as much as possible.  This means highly reliable power, cooling, airflow, humidity control, and vibration and dust control.  Good facilities like this are very expensive, which is why many companies do not pay for them even for valuable production workloads.  In a large study this cost could easily exceed the cost of the equipment itself over the course of the study.

Then, of course, we must address the need for special sensors and testing.  What exactly constitutes a failure?  Even in production systems there is often dispute on this.  Is a hard drive failing in an array a failure, even if the array does not fail?  Is predictive failure a failure?  If dealing with drive failure in a study, how do you factor in human components such as drive replacement, which may not be done in a uniform way?  There are ways to handle this, but they add complication and skew the studies away from real world data toward contrived data.  Establishing study guidelines that are applicable and useful to end users is much harder than it seems.

And then there is the biggest cost: manual labor.  Maintaining an environment for a large study takes human capital which may equal the cost of the study itself.  It takes a large number of people to maintain a study environment, run the study itself, monitor it and collect the data.  All in all, the costs are, quite simply, prohibitive.

Of course we could greatly scale back the test, run only a handful of servers and only two or three models, but the value of the test rapidly drops and risks ending up with results that no one can use while still having spent a large sum of money.

The second insurmountable problem is time.  Most things need to be tested for failure rates over time and, as equipment in IT is generally designed to work reliably for decades, collecting data on failure rates requires many years.  Mean Time to Failure numbers are only so valuable; Mean Time Between Failures, along with failure types, modes and the statistics around those failures, is very important in order for a study to be useful.  What this means is that for a study to be truly useful it must run for a very long time, creating greater and greater cost.

But that is not the biggest problem.  The far larger issue is that for a study to have enough time to generate useful failure numbers, even if those numbers were coming out “live” as they happened, it would already be too late.  The equipment in question would already be aging and nearing time for replacement in the production marketplace by the time the study was producing truly useful early results.  Often production equipment is only purchased for a three to five year total lifespan.  Getting results even one year into this span would have little value.  And new products may replace those in the study even more rapidly than the products age naturally, making the study valuable only in a historic context, without any use in determining choices in a production decision role – the results would be too old to be useful by the time that they were available.

The final major factor is a lack of incentive to provide existing data to those who need it.  Few sources of data exist; a handful do, but nearly all are incomplete and exist for large vendors to measure their own equipment quality, failure rates and the like.  These are rarely gathered in controlled environments and often involve data collected from the field.  In many cases this data may even be private to customers and not legally able to be shared regardless.

But vendors who collect data do not collect it in an even, monitored way, so sharing that data could be very detrimental to them because there is no assurance that equal data from their competitors would exist.  Uncontrolled statistics like that would offer no true benefit to the market nor to the vendors who hold them, so vendors are heavily incentivized to keep such data under tight wraps.

The rare exceptions are some hardware studies from vendors such as Google and BackBlaze, who have large numbers of consumer class hard drives in relatively controlled environments and collect failure rates for their own purposes.  These vendors face little or no risk from competitors leveraging that data, but do gain public relations value from sharing it, and so, occasionally, will release a study of hardware reliability on a limited scale.  These studies are hungrily devoured by the industry even though they generally contain relatively little value: their data is old and gathered under unknown conditions and thresholds, often lacks statistically meaningful comparisons between products and, at best, contains general industry wide statistical trends that are somewhat useful for predicting future reliability paths.

Most other companies large enough to have internal reliability statistics have them on a narrow range of equipment and consider that information to be proprietary, a potential risk if divulged (it would give out important details of architectural implementations) and a competitive advantage.  So for these reasons they are not shared.

I have actually been fortunate enough to have been involved in and run a large scale storage reliability study that was conducted somewhat informally, but very valuably, on over ten thousand enterprise servers over eight years, resulting in eighty thousand server years of study – a rare opportunity.  What that study concluded, while extremely valuable, was primarily that on a set so large we were still unable to observe a single failure!  The lack of failures was, itself, very valuable.  But we were unable to produce any standard statistic like Mean Time to Failure.  To produce the kind of data that people expect we know that we would have needed hundreds of thousands of server years, at a minimum, to get any kind of statistical significance, and we cannot reliably state that even that would have been enough.  Perhaps millions of server years would have been necessary.  There is no way to truly know.
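For a sense of why even eighty thousand server years with zero failures yields no MTBF figure, here is a small illustrative calculation.  The numbers are those from the study above; the confidence bound uses the standard statistical “rule of three” and is included only as an illustration, not as something the study itself reported:

```python
# Why zero observed failures gives no MTBF, only a bound.
# Exposure figures are from the informal study above; the bound is illustrative.
servers = 10_000
years = 8
server_years = servers * years            # 80,000 server years of observation
failures = 0                              # no failures were observed

# MTBF = total operating time / number of failures -- undefined with zero failures.
mtbf = server_years / failures if failures else float("inf")

# "Rule of three": with zero failures in N units of exposure, the ~95% upper
# confidence bound on the failure rate is roughly 3 / N.
max_plausible_rate = 3 / server_years     # about 0.00004 failures per server year

print(server_years, mtbf, max_plausible_rate)
```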

Where this leaves us is that large scale studies in IT simply do not exist and likely never will.  When they do appear they will be isolated and almost certainly crippled by the necessities of reality.  There is no means of monetizing studies on the scale necessary to be useful, mostly because failure rates of enterprise gear are so low while the equipment is so expensive, so third party firms can never cover the cost of providing this research.  As an industry we must accept that this type of data does not exist and actively pursue alternatives to having access to such data.  It is surprising that so many people in the field expect this type of data to be available when it never has been historically.

Our only real options, considering this vacuum, are to collect what anecdotal evidence exists (a very dangerous thing to do which requires careful consideration of context) and the application of logic to assess reliability approaches and techniques.  This is a broad situation where observation necessarily fails us and only logic and intuition can be used to fill the resulting gap in knowledge.

Practical RAID Choices for Spindle Based Arrays

A truly monumental amount of information abounds in reference to RAID storage systems exploring topics such as risk, performance, capacity, trends, approaches and more.  While the work on this subject is nearly staggering the information can be distilled into a handful of common, practical storage approaches that will cover nearly all use cases.  My goal here is to provide a handy guide that will allow a non-storage practitioner to approach RAID decision making in a practical and, most importantly, safe way.

For the purposes of this guide we will assume storage projects of no more than twenty five traditional drives (spinning platter drives properly known as Winchester drives.)  These drives could commonly be SFF (2.5″) or LFF (3.5″), SATA or SAS, consumer or enterprise.  We will not tackle solid state drives as these have very different characteristics and require their own guidance.  Storage systems larger than roughly twenty five spindles should not work from standard guidance but should delve deeper into specific storage needs to ensure proper planning.

The guidance here is written for standard systems in 2015.  Over the past two decades the common approaches to RAID storage have changed dramatically and while it is not anticipated that the key factors that influence these decisions will change enough in the future to alter these recommendations it is very possible that they will.  Good RAID design of 1998 is very poor RAID design today.  The rate of change in the industry has dropped significantly since that time and these recommendations are likely to stand for a very long time, very possibly until spindle-based drive storage is no longer available or at least popular, but like all things predictions are subject to great change.

In general we use what is termed a “One Big Array” approach.  That is, a single RAID array on which all system and data partitions are created.  The need or desire to split our storage into multiple, physical arrays is mostly gone today and splitting should only be done in non-general circumstances.  Only in situations where careful study of the storage needs and heavy analysis are being done should we look at array splitting.  Array splitting is far more likely to cause harm than good.  When in doubt, avoid split arrays.  The goal of this guide is to give general rules of thumb that allow any IT Pro to build a safe and reliable storage system.  Rules of thumb do not and can not cover every scenario; exceptions always exist.  But the idea here is to cover the vast majority of cases with tried and true approaches that are designed around modern equipment, use cases and needs while being mindful to err on the side of safety – when a choice is less than ideal it is still safe.  None of these choices is at all reckless; at worst they are overly conservative.

The first scenario we should consider is the one where your data does not matter.  This may sound like an odd thing to consider but it is a very important scenario.  There are many times where data saved to disk is considered ephemeral and does not need to be protected.  This is common for reconstructable data such as working space for rendering, intermediary calculation spaces or caches – situations where spending money to protect data is wasted and it would be acceptable to simply recreate lost data rather than protecting it.  It could also be a case where downtime is not a problem and the data is static or nearly so, and rather than spending to reduce downtime we only worry about protecting the data via backup mechanisms so that if an array fails we simply restore it completely.  In these cases the obvious choice is RAID 0.  It is very fast, very simple and provides the most cost effective capacity.  The only downside of RAID 0 is that it is fragile and provides no protection against data loss in the case of drive failure or even a URE (unrecoverable read error, which would cause data corruption just as it would on a desktop drive.)
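As a rough illustration of that fragility, the chance of losing a RAID 0 array grows with every drive added, since any single drive failure destroys the whole array.  This is a simple probability sketch; the per-drive annual failure rate is an assumed example figure, not a measured one:

```python
# Chance of losing a RAID 0 array within a year, assuming independent drive
# failures. The 3% annual failure rate per drive is an assumed example.
afr = 0.03
for drives in (2, 4, 8):
    p_loss = 1 - (1 - afr) ** drives      # any one failure loses the array
    print(f"{drives} drives: {p_loss:.1%} chance of array loss per year")
```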

It should be noted that a common exception to the “One Big Array” approach is in systems using RAID 0 for data.  A very good argument can be made for keeping a small drive array, dedicated to the OS and application data that would be cumbersome to reinstall in case of array loss, on RAID 1, with the RAID 0 data array separate from it.  This way recovery could be very rapid: rather than needing to completely rebuild the entire system from scratch, we simply recreate the data.

Assuming that we have eliminated cases where the data does not require protection, we will assume for all remaining cases that the data is quite important and that we want to protect it at some cost.  We will assume that protecting the data as it exists on the live storage is important, generally because we want to avoid downtime or because the data on disk is not static and an array failure would therefore also constitute data loss.  With this assumption we will continue.

If we have an array of only two disks the answer is very simple: we choose RAID 1.  There is no other option at this size, so there is no decision to be made.  In theory we should be planning our arrays holistically and not after the number of drives has been determined – the number of drives and the type of array should be chosen together, not drives purchased first and their use determined based on that arbitrary number – but two drive chassis are so common that the case is worth mentioning.

Likewise, with a four drive array the only real choice to consider is RAID 10.  There is no need for further evaluation.  Simply select RAID 10 and continue.

An awkward case is a three drive array.  It is very, very rare that we are limited to three drives; the only common chassis limited to three drives was the Apple Xserve, and that has been off of the market for some time, so the need to make decisions around three spindle arrays should be extremely unlikely.  In cases where we have three drives it is often best to seek guidance, but the most common approaches are to add a fourth drive and thereby choose RAID 10 or, if capacity greater than a single drive’s worth is not needed, to put all three drives into a single triple-mirror RAID 1.

For all other cases, therefore, we are dealing with five to twenty five drives.  Since we have eliminated the situations where RAID 0 and RAID 1 would apply, all remaining common scenarios come down to RAID 6 and RAID 10, and these constitute the vast majority of cases.  Choosing between RAID 6 and RAID 10 becomes the biggest challenge that we will face as we must look solely at our “soft” needs of reliability, performance and capacity.

Choosing between RAID 6 and RAID 10 should not be incredibly difficult.  RAID 10 is ideal for situations where performance and safety are the priorities.  RAID 10 has much faster write performance and is safe regardless of disk type used (low cost consumer disks can still be extremely safe, even in large arrays.)  RAID 10 scales well to extremely large sizes, much larger than should be implemented using rules of thumb!  RAID 10 is the safest of all choices; it is fast and safe.  The obvious downsides are that RAID 10 has less storage capacity from the same disks and is more costly on the basis of capacity.  It must be mentioned that RAID 10 can only utilize an even number of disks; disks are added in pairs.

RAID 6 is generally safe and fast, but never as safe or as fast as RAID 10.  RAID 6 specifically suffers from poor write performance and so is very poorly suited for workloads such as databases and heavily mixed loads like those in large virtualization systems.  RAID 6 is cost effective and provides a heavy focus on available capacity compared to RAID 10.  When budgets are tight or capacity needs dominate over performance, RAID 6 is an ideal choice.  Rarely is the difference in safety between RAID 10 and RAID 6 a concern except in very large systems with consumer class drives.  RAID 6 is subject to additional risk with consumer class drives that RAID 10 is not affected by, which could warrant some concern around reliability in larger RAID 6 systems, such as those above roughly 40TB, when consumer drives are used.
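To make the capacity trade-off concrete, here is a simple sketch comparing usable capacity from the same set of drives, using the standard formulas (RAID 10 yields half of the raw capacity, RAID 6 yields all but two drives’ worth; the drive count and size are arbitrary example figures):

```python
# Usable capacity of RAID 10 vs RAID 6 from the same set of drives.
def usable_capacity_tb(drives: int, drive_tb: float) -> dict:
    return {
        "RAID 10": (drives // 2) * drive_tb,  # half the drives hold mirror copies
        "RAID 6": (drives - 2) * drive_tb,    # two drives' worth goes to parity
    }

# Example: eight 4TB drives.
print(usable_capacity_tb(drives=8, drive_tb=4.0))  # {'RAID 10': 16.0, 'RAID 6': 24.0}
```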

In the small business space especially, the majority of systems will use RAID 10 simply because arrays rarely need to be larger than four drives.  When arrays are larger RAID 6 is the more common choice due to somewhat tight budgets and generally low concern around performance.  Both RAID 6 and RAID 10 are safe and effective solutions for nearly all usage scenarios with RAID 10 dominating when performance or extreme reliability are key and RAID 6 dominating when cost and capacity are key.  And, of course, when storage needs are highly unique or very large, such as larger than twenty five spindles in an array, remember to leverage a storage consultant as the scenario can easily become very complex.  Storage is one place where it pays to be extra diligent as so many things depend upon it, mistakes are so easy to make and the flexibility to change it after the fact is so low.
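The rules of thumb in this guide can be condensed into a small decision helper.  This is only a sketch of the generalizations above, not a substitute for judgment on unusual workloads or on arrays beyond roughly twenty five spindles:

```python
def recommend_raid(drives: int, data_matters: bool = True,
                   performance_critical: bool = False) -> str:
    """Rule-of-thumb RAID choice for spindle arrays of up to ~25 drives."""
    if not data_matters:
        return "RAID 0 (ephemeral, reconstructable data only)"
    if drives == 2:
        return "RAID 1"
    if drives == 3:
        return "RAID 1 triple mirror, or add a fourth drive and use RAID 10"
    if drives == 4:
        return "RAID 10"
    if drives <= 25:
        # Five to twenty five drives: RAID 10 when performance or maximum
        # safety dominate, RAID 6 when cost and capacity dominate.
        return "RAID 10" if performance_critical else "RAID 6"
    return "Beyond rules of thumb; engage a storage consultant"

print(recommend_raid(6))                              # RAID 6
print(recommend_raid(6, performance_critical=True))   # RAID 10
```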

Slow OS Drives, Fast Data Drives

Over the years I have found that people often err on the side of high performance, highly reliable data storage for an operating system partition but choose slow, “cost effective” storage for critical data stores.  I am amazed by how often I find this occurring and now, with the advent of hypervisors, I see the same behaviour being repeated there as well – compounding the previously existing issues.

In many systems today we deal with only a single storage array shared by all components of the system.  In these cases we do not face the problem of misbalancing our storage system performance.  This is one of the big advantages of this approach and a major reason why it comes so highly recommended.  All performance is in a shared pool and the components that need the performance have access to it.

In many cases, whether in an attempt at increased performance or reliability design or out of technical necessity, I find that people are separating out their storage arrays and putting hypervisors and operating systems on one array and data on another.  But what I find shocking is that arrays dedicated to the hypervisor or operating system are often staggeringly large in capacity and extremely high in performance – often involving 15,000 RPM spindles or even solid state drives at great expense.  Almost always in RAID 1 (as per common standards from 1998.)

What needs to be understood here is that operating systems themselves have effectively no storage IO requirements.  There is a small amount, mostly for system logging, but that is about all that is needed.  Operating system partitions are almost completely static.  Required components are loaded into memory, mostly at boot time, and are not accessed again.  Even in cases where logging is needed, many times these logs are sent to a central logging system and not to the system storage area reducing or even removing that need as well.

With hypervisors this effect is even more extreme.  As hypervisors are far lighter and less robust than traditional operating systems they behave more like embedded systems and, in many ways, actually are embedded systems.  Hypervisors load into memory at system boot time and their media is almost never needed again while a system is running, except for logging on some occasions.  Because hypervisors are so small in physical size, even the total amount of time needed to completely read a full hypervisor off of storage is very small, even on very slow media.

For these reasons, storage performance is of little to no consequence for operating systems and especially hypervisors.  The difference between fast storage and slow storage really only impacts system boot time, where the difference between one second and thirty seconds would rarely be noticed, if at all.  Rarely would anyone perceive even several extra seconds during the startup of a system, and in most cases startups are rare events, happening at most once a week during an automated, routine reboot in a planned maintenance window or, for systems that are only brought offline in emergencies, sometimes only once every several years.  Even the slowest conceivable storage system is far faster than necessary for this role.
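To put rough numbers behind this, here is a small sketch of how long it takes to read a complete hypervisor or minimal OS image from media of various speeds.  The image size and throughput figures are assumed illustrative values, not measurements:

```python
# Approximate time to read an entire small hypervisor/OS image at boot.
# Image size and media throughputs are assumed illustrative figures.
image_mb = 400
media = {"USB 2.0 stick": 25, "5400 RPM SATA drive": 100, "SATA SSD": 500}  # MB/s

for name, mb_per_s in media.items():
    print(f"{name:>20}: {image_mb / mb_per_s:5.1f} s to read {image_mb} MB")
```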

Even slow storage is generally many times faster than is necessary for system logging activities.  In those rare cases where logging is very intense we have many choices of how to tackle this problem.  The most obvious and common solution here is to send logs to a drive array other than the one used by the operating system or hypervisor.  This is a very easy solution and ultimately very practical in cases where it is warranted.  The other common and highly useful solution is to simply refrain from keeping logs on the local device at all and send them to a remote log collection utility such as Splunk, Loggly or ELK.
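As a simple illustration of the remote logging approach, an application or script can ship its log messages directly to a central collector rather than writing them to the local system drive.  This sketch uses Python’s standard syslog handler and assumes a reachable collector at the hypothetical address logs.example.com:

```python
# Minimal sketch: send logs to a remote syslog collector instead of local disk.
# "logs.example.com" is a hypothetical collector address.
import logging
import logging.handlers

logger = logging.getLogger("app")
logger.setLevel(logging.INFO)

remote = logging.handlers.SysLogHandler(address=("logs.example.com", 514))
remote.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
logger.addHandler(remote)

logger.info("This message goes to the central log collector, not to local disk.")
```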

The other major concern that most people have around their operating systems and hypervisors is reliability.  It is common to focus more effort on protecting these relatively unimportant aspects of a system than on the often irreplaceable data.  However, operating systems and hypervisors are easily rebuilt from scratch using fresh installs and manual reconfiguration when necessary.  The details which could be lost are generally relatively trivial to recreate.

This does not mean that these system filesystems should not be backed up; of course they should (in most cases.)  But even if the backups fail as well, it is rare that the loss of an OS partition or filesystem truly spells tragedy rather than mere inconvenience.  There are ways to recover in nearly all cases without access to the original data, as long as the “data” filesystem is separate.  And because of the nature of operating systems and hypervisors, change is rare, so backups can generally be less frequent, possibly triggered manually only when updates are applied!

With many modern systems in the DevOps and Cloud computing spaces it has become very common to view operating systems and hypervisor filesystems as completely disposable since they are defined remotely via a system image or by a configuration management system.  In these cases, which are becoming more and more common, there is no need for data protection or backups as the entire system is designed to be recreated, nearly instantly, without any special interaction.  The system is entirely self-replicating.  This further trivializes the need for system filesystem protection.

Taken together – the lack of need for performance, and protection and reliability handled primarily through simple recreation – what we have is a system filesystem with very different needs than we commonly assume.  This does not mean that we should be reckless with our storage; we still want to avoid storage failure while a system is running, and rebuilding unnecessarily is a waste of time and resources even if it does not prove to be disastrous.  So striking a careful balance is important.

It is, of course, for these reasons that including the operating system or hypervisor on the same storage array as data is now common practice – because there is little to no need for storage access to the system files at the same time that there is access to the data files, we get great synergy: fast boot times for the OS and no adverse impact on data access times once the system is online.  This is the primary means by which system designers today tackle the need for efficient use of storage.

When the operating system or hypervisor must be separated from the arrays holding data which can still happen for myriad reasons we generally seek to obtain reasonable reliability at low cost.  When using traditional storage (local disks) this means using small, slow, low cost spinning drives for operating system storage, generally in simple RAID 1 configuration.  A real world example is the use of 5400 RPM “eco-friendly” SATA drives in the smallest sizes possible.  These draw little power and are very inexpensive to acquire.  SSDs and high speed SAS drives would be avoided as they cost a premium for protection that is irrelevant and performance that is completely wasted.

In less traditional storage it is common to use a low cost, high density SAN, consolidating the low priority storage for many systems onto shared, slow arrays that are not replicated.  This is only effective in larger environments that can justify the additional architectural design and can achieve enough density in the storage consolidation process to create the necessary cost savings, but at that scale this is relatively easy.  SAN boot devices can leverage very low cost arrays across many servers for cost savings.  In the virtual space this could mean a low performance datastore used for OS virtual disks and another, high performance pool for data virtual disks.  This would have the same effect as the boot SAN strategy but in a more modern setting and could easily leverage the SAN architecture under the hood to accomplish it.

Finally, and most dramatically, it is a general rule of thumb with hypervisors to install them to SD cards or USB thumb drives rather than to traditional storage, as their performance and reliability needs are even lower than those of traditional operating systems.  Normally, if a drive of this nature were to fail while a system was running, the system would actually remain running without any problem, as the drive is never used once the system has booted initially.  It would only be during a reboot that an issue would be found and, at that time, a backup boot device could be used, such as a secondary SD card or USB stick.  This is the official recommendation for VMware vSphere, is often recommended by Microsoft representatives for HyperV and officially supported through HyperV’s OEM vendors, and is often recommended, but not so broadly supported, for Xen, XenServer and KVM systems.  Using SD cards or USB drives for hypervisor storage effectively turns a virtualization server into an embedded system.  While this may feel unnatural to system administrators who are used to thinking of traditional disks as a necessity for servers, it is important to remember that enterprise class, highly critical systems like routers and switches last decades using this exact same strategy for the exact same reasons.

A common strategy for hypervisors in this embedded style mode with SD cards or USB drives is to have two such devices, which may actually be one SD card and one USB drive, each with a copy of the hypervisor.  If one device fails, booting to the second device is nearly as effective as a traditional RAID 1 system.  But unlike most traditional RAID 1 setups, we also have a relatively easy means of testing system updates by only updating one boot device at a time and testing the process before updating the second boot device leaving us with a reliable, well tested fall back in case a version update goes awry.  This process was actually common on large UNIX RISC systems where boot devices were often local software RAID 1 sets that supported a similar practice, especially common in AIX and Solaris circles.

It should also be noted that while this approach is the best practice for most hypervisor scenarios, there is actually no reason why it cannot be applied to full operating system filesystems too, except that it is often more work.  Some OSes, especially Linux and BSD, are very adept at being installed in an embedded fashion and can easily be adapted for installation on an SD card or USB drive with a little planning.  This approach is not at all common, but there is no technical reason why, in the right circumstances, it would not be an excellent one – the main caveat being that an OS should almost never be installed to physical hardware rather than on top of a hypervisor in the first place.  In those cases where physical installs are necessary, this approach is extremely valid.

When designing and planning for storage systems, remember to be mindful as to what read and write patterns will really look like when a system is running.  And remember that storage has changed rather dramatically since many traditional guidelines were developed, and not all of the knowledge used to develop them still applies today or applies equally.  Think about not only which storage subsystems will attempt to use storage performance but also how they will interact with each other (for example, do two systems never request storage access at the same time or will they conflict regularly) and whether or not their access performance is important.  General operating system functions can be exceedingly slow on a database server without negative impact; all that matters is the speed at which the database can be accessed.  Even access to application binaries is often irrelevant as they too, once loaded into memory, remain there and only memory speed impacts ongoing performance.

None of this is meant to suggest that separating OS and data storage subsystems from each other is advised; it often is not.  I have written in the past about how consolidating these subsystems is quite frequently the best course of action and that remains true now.  But there are also many reasonable cases where splitting certain storage needs from each other makes sense, often when dealing with large scale systems where we can lower cost by dedicating high cost storage to certain needs and low cost storage to others, and it is in those cases that I want to demonstrate that operating systems and hypervisors should be considered the lowest priority in terms of both performance and reliability, except in the most extreme cases.