On DevOps and Snowflakes

One can hardly swing a proverbial cat in IT these days without hearing people talking about DevOps.  DevOps is the hot new topic in the industry, picking up where the talk of cloud left off, and to hear people talk about it one might believe that traditional systems administration is already dead and buried.

First we must talk about what we mean by DevOps.  This can be confusing because, as with cloud, an older term is being stolen to mean something different or, at best, related to something that already existed.  Traditional DevOps was the merging of developer and operational roles.  From the 1960s through the 1990s, this was the standard way of running systems.  In this world the people who wrote the software were generally the same ones who deployed and maintained it.  Hence the merging of “developer” and “operations”, operations being a semi-standard term for the role of system administrator.  These roles were not commonly separated until the rise of the “IT Department” in the 1990s and the 2000s.  Since then, the merging of the two roles has begun to rise in popularity again, primarily because of the great value the two can deliver together in many modern, hosted, web application situations.

Where DevOps is talked about today, it is often not a strict merging of the developers and the operations staff but a modification of the operations role, with a much higher focus on coding: not coding the application itself, but defining application infrastructures as code as a natural extension of cloud architectures.  This can be rather confusing at first.  What is important to note is that traditional DevOps is not what is commonly occurring today but a new “fake” DevOps in which developers remain developers and operations remains operations, but operations has evolved into a new “code heavy” role that continues to focus on managing servers running code provided by the developers.

What is significant today is that the role of the system administrator has begun to diverge into two related, but significantly different roles, one of which is improperly called DevOps by most of the industry today (most of the industry being too young to remember when DevOps was the norm rather than the exception, let alone something new and novel).  I refer to these two aspects of the system administrator role here as the DevOps and the Snowflake approaches.

I use the term Snowflake to refer to traditional architectures for systems because each individual server can be seen as a “unique Snowflake.”  They are all different, at least insofar as they are not somehow managed in such a way as to keep them identical.  This doesn’t mean that they have to all be unique, just that they retain the potential to be.  In traditional environments a system administrator will log into each server individually to work on it.  Some amount of scripting is common to ease administration tasks but at its core the role involves a lot of time working on individual systems.

Easing administration of Snowflake architectures often involves attempts to minimize differences between systems in reasonable ways.  This generally starts with things like choosing a single standard operating system and version (Windows 2012 R2 or Red Hat Enterprise Linux 7, for example) rather than allowing every server installation to be a different OS or version.  This standardization may seem basic but many shops lack it even today.

A common next step is creating a standard deployment methodology or a gold master image that is used for building all systems, so that the base operating system and all base packages, often including system customization, monitoring packages, security packages, authentication configuration and similar modifications, are standard and deployed uniformly.  This provides a common starting point for all systems to minimize divergence.  But technically these measures only ensure a standard starting point; over time, divergence in configuration must be anticipated.

Beyond these steps, Snowflake environments typically use custom, bespoke administration scripts or management tools to maintain some standardization between systems over time.  The more commonalities that exist between systems the easier they are to maintain and troubleshoot and the less knowledge is needed by the administration staff.  More standardization means fewer surprises, fewer unknowns and much better testing capabilities.

In a single system administrator environment with good practices and tooling, Snowflake environments can take on a high degree of standardization.  But in environments with many system administrators, especially those supported around the clock from many regions, and with a large number of systems, standardization, even with very diligent practices, can become very difficult.  And that is even before we tackle the obvious issues surrounding the fact that different packages and possibly package versions are needed on systems that perform different roles.

The DevOps approach grows organically out of the cloud architecture model.  Cloud architecture is designed around automatically created and automatically destroyed, broadly identical systems (at least in groups) that are controlled through a programmatic interface or API.  This model lends itself, quite obviously, to being controlled centrally through a management system rather than through the manual efforts of a system administrator.  Manual administration is effectively impossible and completely impractical under this model.  Individual systems are not unique as they are in the Snowflake model, and any divergence will create serious issues.

The idea that has emerged from the cloud architecture world is that systems architecture should be defined centrally “in code” rather than on the servers themselves.  This sounds confusing at first but makes a lot of sense when we look at it more deeply.  In order to support this model a new type of systems management tool, one that has yet to take on a standard name but is often called a systems automation tool, DevOps framework, IT automation tool or simply an “infrastructure as code” tool, has begun to emerge.  Common toolsets in this realm include Puppet, Chef, CFEngine and SaltStack.

The idea behind these automation toolsets is that a central service is used to manage and control all systems.  This central authority manages individual servers by way of code-based descriptions of how the system should look and behave.  In the Chef world, these are called “recipes” to be cute but the analogy works well.  Each system’s code might include information such as a list of which packages and package versions should be installed, what system configurations should be modified and files to be copied to the box.  In many cases decisions about these deployments or modifications are handled through potentially complex logic and hence the need for actual code rather than something more simplistic such as markup or templates.  Systems are then grouped by role and managed as groups.  The “web server” role might tell a set of systems to install Apache and PHP and configure memory to swap very little.  The “SQL Server” role might install MS SQL Server and special backup tools only used for that application and configure memory to be tuned as desired for a pool of SQL Server machines.  These are just examples.  Typically an organization would have a great many roles, some may be generic such as “web server” and others much more specific to support very specific applications.  Roles can generally be layered, so a system might be both a “web server” and a “java server” getting the combined needs of both met.
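To make the role concept concrete, here is a minimal sketch in Python.  This is not the actual syntax of Puppet, Chef, CFEngine or SaltStack, each of which uses its own language; the role names and structures here are purely illustrative assumptions about how layered, code-defined roles combine into a single desired state for a node.

    # A hypothetical sketch of layered, code-defined roles.  Real tools
    # (Puppet, Chef, etc.) each use their own DSL; this only illustrates
    # the concept of roles combining into one desired state per node.
    ROLES = {
        "web server": {
            "packages": ["apache2", "php"],
            "sysctl": {"vm.swappiness": 1},  # configured to swap very little
        },
        "java server": {
            "packages": ["openjdk-8-jdk"],
            "sysctl": {},
        },
    }

    def build_node(assigned_roles):
        """Merge all assigned roles into one desired state for a node."""
        packages, sysctl = set(), {}
        for role in assigned_roles:
            packages.update(ROLES[role]["packages"])
            sysctl.update(ROLES[role]["sysctl"])
        return {"packages": sorted(packages), "sysctl": sysctl}

    # A node that is both a "web server" and a "java server" gets the
    # combined needs of both met, exactly as described above.
    print(build_node(["web server", "java server"]))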

These standard definitions mean that systems, once designated as belonging to one role or another, can “build themselves” automatically.  A new system might be created by an administrator requesting a system or a capacity monitoring system might decide that additional capacity is needed for a role and spawn new server instances automatically without any human intervention whatsoever.  At the time that the system is requested, by a human or automatically, the role is designated and the system will, by way of the automation framework, transform itself into a fully configured and up to date “node.”  No human system administration intervention required.  The process is fast, simple and, most importantly, completely repeatable.

Defining systems in code has some non-obvious consequences.  One is that backups of complete systems are no longer needed.  Why back up a system that you can recreate, with minimal effort, almost instantly?  Local data from database systems would need to be backed up but only the database data, not the entire system.  This can greatly reduce strain on backup infrastructures and make restore processes faster and more reliable.

The amount of documentation needed for systems already defined in code is minimal.  In Snowflake environments the system administrator needs to maintain documentation specific to every host and maintain that documentation manually.  This is very time consuming and error prone.  Systems defined by way of central code need little to no documentation, and what documentation remains can be handled at the group level, not the individual node level.

Testing systems that are defined in code is easy to do as well.  You can create a system via code, test it and know that when you move that definition into production the production system will be created exactly as it was created in testing, repeatably.  In Snowflake environments it is very common to have testing practices that attempt to do this, but they rely on manual efforts, are prone to being sloppy and not exactly repeatable, and very often politics will dictate that it is faster to mimic repeatability than to actually strive for it.  Code defined systems bypass these problems, making testing far more valuable.

Beyond defining how many nodes should exist within each role, the system can reprovision an entire architecture, from scratch, automatically.  Rebuilding after a disaster or bringing up a secondary site can be done very quickly and easily.  Moving between locally hosted systems and remotely hosted systems, including those from companies like Amazon, Microsoft, IBM, Rackspace and others, is also extremely easy.

Of course, in the DevOps world there is great value in using cloud architectures to enable the most extreme level of automation, but cloud architectures are unnecessary to leverage these types of tools.  A code defined architecture could also be used partially, with manual administration alongside it in a hybrid approach, but this is rarely recommended on individual systems.  When the two approaches are both mandated, it is generally far better to have two environments, one that is managed as Snowflakes and one that is managed as DevOps.  This makes a far better hybridization.  I have seen this work extremely well in an enterprise environment with scores of thousands of “Snowflake” servers, each very unique, alongside a dedicated environment of ten thousand nodes that was managed in a DevOps manner because all of the nodes were to be identical and interchangeable, using one of two possible configurations.  Hybridization was very effective.

The DevOps approach, however, comes with major caveats as well.  The skill sets necessary to manage a system in this way are far greater than those needed for traditional systems administration as, at a minimum, all traditional systems administration knowledge is still needed, plus solid programming knowledge, typically of modern languages like Python and Ruby, and knowledge of the specific frameworks in question as well.  This extended knowledge base requirement means that DevOps practitioners are not only rare but expensive too.  It also means that university education, already falling far short of preparing either systems administrators or developers for the professional world, is now farther still from preparing graduates to work under a DevOps model.

System administrators working in each of these two camps have a tendency to see all systems as needing to fit into their own mold. New DevOps practitioners often believe that Snowflake systems are legacy and need to be updated.  Snowflake (traditional) admins tend to see the “infrastructure as code” movement as silly, filled with unnecessary overhead, overly complicated and very niche.

The reality is that both approaches have a tremendous amount of merit and both are going to remain extremely viable.  They make sense for very different workloads, and large organizations, I suspect, will commonly see both in place via some form of hybridization.  In the SMB market, where there are typically only a tiny number of servers, no scaling leverage to justify cloud architectures and a high disparity between systems, I suspect that DevOps will remain outside of the norm almost indefinitely as the overhead and additional skills necessary to make it function are impractical or even impossible to acquire.  Larger organizations have to look at their workloads.  Many traditional workloads, and much traditional software, are not well suited to the DevOps approach, especially cloud automation, and will either require hybridization or an impractically high level of coding on a per system basis, making the DevOps model impossible to justify.  But workloads built on web architectures, or that can scale horizontally extremely well, will benefit heavily from the DevOps model at scale.  This could apply to large enterprise companies or to smaller companies producing hosted applications for external consumption.

This difference in approach means that, in the United States for example, most of the country is made up of companies that will remain focused on the Snowflake management model, while some east coast companies can evaluate the DevOps model effectively and begin to move in that direction.  But on the west coast, where more modern architectures and a much larger focus on hosted applications and applications for external consumption are the driving economic factors, DevOps is already moving from newcomer to mature, established normalcy.  DevOps and Snowflake approaches will likely remain heavily segregated by region in this way, just as IT in general sees different skill sets migrate to different regions.  It would not be surprising to see DevOps begin to take hold in markets such as Austin where traditional IT has performed rather poorly.

Neither approach is better or worse; they are two different approaches servicing two very different ways of provisioning systems and two different fundamental needs of those systems.  With the rise of cloud architectures and the DevOps model, however, it is critically important that existing system administrators understand what the DevOps model means and when it applies so that they can correctly evaluate their own workloads and unique needs.  A large portion of the traditional Snowflake system administration world will be migrating, over time, to the DevOps model.  We are very far from reaching a steady state in the industry as to the balance of these two models.

Originally published on the StorageCraft Blog.

Practical RAID Performance

Choosing a RAID level is an exercise in balancing many factors including cost, reliability, capacity and, of course, performance.  RAID performance can be difficult to understand especially as different RAID levels use different techniques and behave rather differently from each other in some cases.  In this article I want to explore the common RAID levels of RAID 0, 5, 6 and 10 to see how performance differs between them.

For the purposes of this article, RAID 1 will be assumed to be a subset of RAID 10.  This is often a handy way to think of RAID 1 – as simply being a RAID 10 array with only a single mirrored pair member.  As RAID 1 is truly a single pair RAID 10 and behaves as such, this works wonderfully for making RAID performance easy to understand, as it simply maps onto the RAID 10 performance curve.

There are two types of performance to look at with all storage: reading and writing.  In terms of RAID, reading is extremely easy and writing is rather complex.  Read performance is effectively stable across all RAID types.  Writing, however, is not.

To make discussing performance easier we need to define a few terms as we will be working with some equations. In our discussions we will use N to represent the total number of drives, often referred to as spindles, in our array and we will use X to refer to the performance of each drive individually.  This allows us to talk in terms of relative performance as a factor of the drive performance allowing us to abstract away the RAID array and not have to think in terms of raw IOPS.  This is important as IOPS are often very hard to define but we can compare performance in a meaningful way by speaking to it in relationship to the individual drives within the array.

It is also important to remember that we are only talking about the performance of the RAID array itself, not an entire storage subsystem.  Artifacts such as memory caches and solid state caches will do amazing things to alter the overall performance of a storage subsystem, but do not fundamentally change the performance of the RAID array itself under the hood.  There is no simple formula for determining how different cache options will impact the overall performance; suffice it to say that the effect can be very dramatic, but it depends heavily not only on the cache choices themselves but also on the workload.  Even the biggest, fastest, most robust cache options cannot change the long term, sustained performance of an array.

RAID is complex and many factors influence the final performance.  One is the implementation of the RAID system itself.  A poor implementation might cause latency or may fail to take advantage of the available spindles (such as having a RAID 1 array read only from a single disk instead of from both simultaneously!)  There is no easy way to account for deficiencies in specific RAID implementations so we must assume that all are working to the limits of the specification as, indeed, any enterprise RAID system will do. It is primarily hobby and consumer RAID systems that fail to do this.

Some types of RAID also have dramatic amounts of computational overhead associated with them while others do not.  It is primarily the parity RAID levels that require heavy processing in order to handle write operations, with different levels requiring different amounts of computation for each operation.  This introduces latency, but does not curtail throughput.  This latency will vary, however, based on the implementation of the RAID level as well as on the processing capability of the system in question.  Hardware RAID will use something like a general purpose CPU (often a Power or ARM RISC processor) or a custom ASIC to handle this while software RAID hands this off to the server’s own CPU.  Often the server CPU is actually faster here but consumes system resources.  ASICs can be very fast but are expensive to produce.  This latency impacts storage performance but is very difficult to predict and can vary from nominal to dramatic.  So I will mention the relative latency impact with each RAID level but will not attempt to measure it.  In most RAID performance calculations this latency is ignored, but it is important to understand that it is present and could, depending on the configuration of the array, have a noticeable impact on a workload.

There is, it should be mentioned, a tiny performance impact to read operations due to inefficiencies in the layout of data on the disk itself.  Parity RAID requires there to be data on the disks that is useless during a healthy read operation and cannot be used to speed it up.  This actually results in reads being slightly slower.  But this impact is ridiculously small, is normally not measured and so can be ignored.

Factors such as stripe size also impact performance, of course, but as that is configurable and not an intrinsic artifact in any RAID level I will ignore it here.  It is not a factor when choosing a RAID level itself but only in configuring one once chosen.

The final factor that I want to mention is the read to write ratio of storage operations.  Some RAID arrays will be used almost purely for read operations, some almost solely for write operations but most use a blend of the two, likely something like eighty percent read and twenty percent write.  This ratio is very important in understanding the performance that you will get from your specific RAID array and understanding how each RAID level will impact you.  I refer to this as the read/write blend.

We measure storage performance primarily in IOPS.  IOPS stands for Input/Output Operations Per Second (yes, I know that the letters don’t line up well, it is what it is.)  I further use the terms RIOPS for Read IOPS, WIOPS for Write IOPS and BIOPS for Blended IOPS, which would come with a ratio such as 80/20.  Many people talk about storage performance with a single IOPS number.  When this is done they normally mean Blended IOPS at 50/50.  However, rarely does any workload run at 50/50, so that number can be extremely misleading.  Two numbers, RIOPS and WIOPS, are what is needed to understand performance, and these two together can be used to find any IOPS blend that is needed.  For example, a 50/50 blend is as simple as (RIOPS * .5) + (WIOPS * .5).  The more common 80/20 blend would be (RIOPS * .8) + (WIOPS * .2).
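As a quick illustration, the blending arithmetic can be captured in a couple of lines of Python.  This is only a sketch; the function name and the sample numbers are assumptions for demonstration.

    def blended_iops(riops, wiops, read_fraction):
        """Blend read and write IOPS for a given read/write mix."""
        return riops * read_fraction + wiops * (1 - read_fraction)

    # A 50/50 blend and the more common 80/20 blend for an array
    # delivering 1,000 RIOPS and 500 WIOPS:
    print(blended_iops(1000, 500, 0.5))  # 750.0
    print(blended_iops(1000, 500, 0.8))  # 900.0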

Now that we have established some criteria and background understanding we will delve into our RAID levels themselves and see how performance varies across them.

For all RAID levels, the Read IOPS number is calculated using NX.  This does not address the nominal overhead numbers that I mention above, of course.  This is a “best case” number, but the real world number is so close that it is very practical to simply use this formula.  Simply take the number of spindles (N) and multiply by the IOPS performance of an individual drive (X).  Keep in mind that drives often have different read and write performance, so be sure to use the drive’s Read IOPS rating or tested speed for the Read IOPS calculation and the Write IOPS rating or tested speed for the Write IOPS calculation.

RAID 0

RAID 0 is the easiest RAID level to understand because there is effectively no overhead to worry about, no resources consumed to power it and both read and write get the full benefit of every spindle, all of the time.  So for RAID 0 our formula for write performance is very simple: NX.  RAID 0 is always the most performant RAID level.

An example would be an eight spindle RAID 0 array.  If an individual drive in the array delivers 125 IOPS then our calculation would be from N = 8 and X = 125 so 8 * 125 yielding 1,000 IOPS.  Since both read and write IOPS are the same here, it is extremely simple as we get 1K RIOPS, 1K WIOPS and 1K with any blending thereof.  Very simple.  If we didn’t know the absolute IOPS of an individual spindle we could refer to an eight spindle RAID 0 as delivering 8X Blended IOPS.

RAID 10

RAID 10 is the second simplest RAID level to calculate.  Because RAID 10 is a RAID 0 stripe of mirror sets, we have no overhead to worry about from the stripe, but each mirror has to write the same data twice in order to create the mirroring.  This cuts our write performance in half compared to a RAID 0 array of the same number of drives, giving us a write performance formula of simply: NX/2 or .5NX.

It should be noted that at the same capacity, rather than the same number of spindles, RAID 10 has the same write performance as RAID 0 but double the read performance – simply because it requires twice as many spindles to match the same capacity.

So an eight spindle RAID 10 array would be N = 8 and X = 125 and our resulting calculation comes out to be (8 * 125)/2 which is 500 WIOPS or 4X WIOPS.  A 50/50 blend would result in 750 Blended IOPS (1,000 Read IOPS and 500 Write IOPS.)

This formula applies to RAID 1, RAID 10, RAID 100 and RAID 01 equally.

Uncommon options such as triple mirroring in RAID 10 would alter this write penalty.  RAID 10 with triple mirroring would be NX/3, for example.

RAID 5

While RAID 5 is deprecated and should never be used in new arrays, I include it here because it is a well known and commonly used RAID level and its performance needs to be understood.  RAID 5 is the most basic of the modern parity RAID levels.  RAID 2, 3 & 4 are no longer found in production systems and so we will not look into their performance here.  RAID 5, while not recommended for use today, is the foundation of the other modern parity RAID levels and so is important to understand.

Parity RAID adds a somewhat complicated need to verify and re-write parity with every write that goes to disk.  This means that a RAID 5 array will have to read the data, read the parity, write the data and finally write the parity.  Four operations for each effective one.  This gives us a write penalty on RAID 5 of four.  So the formula for RAID 5 write performance is NX/4.

So following the eight spindle example where the write IOPS of an individual spindle is 125 we would get the following calculation: (8 * 125)/4 or 2X Write IOPS which comes to 250 WIOPS.  In a 50/50 blend this would result in 625 Blended IOPS.

RAID 6

RAID 6, after RAID 10, is probably the most common and useful RAID level in use today.  RAID 6, however, is based off of RAID 5 and has another level of parity.  This makes it dramatically safer than RAID 5, which is very important, but also imposes a dramatic write penalty as each write operation requires the disks to read the data, read the first parity, read the second parity, write the data, write the first parity and then finally write the second parity.  This comes out to be a six times write penalty, which is pretty dramatic.  So our formula is NX/6.

Continuing our example we get (8 * 125)/6 which comes out to ~167 Write IOPS or 1.33X.  In our 50/50 blend example this is a performance of  583.5 Blended IOPS.  As you can see, parity writes cause a very rapid decrease in write performance and a noticeable drop in blended performance.

RAID 7 (aka RAID 5.3 or RAID 7.3)

RAID 7 is a somewhat non-standard RAID level with triple parity based off of the existing single parity of RAID 5 and the existing double parity of RAID 6.  The only current implementation of RAID 7 is ZFS’ RAIDZ3.  Because RAID 7 contains all of the overhead of both RAID 5 and RAID 6 plus the additional overhead of the third parity component we have a write penalty of a staggering eight times.  So our formula for finding RAID 7 write performance is NX/8.

In our example this would mean that (8 * 125)/8 would come out to 125 Write IOPS or 1X.  So with eight drives in our array we would get only the write performance of a single, stand alone drive.  That is significant overhead.  Our blended 50/50 IOPS would come out to only 562.5.
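Pulling the standard levels together, each reduces to the same read formula (NX) plus a per-level write penalty.  A small Python sketch, purely illustrative, reproduces the eight spindle, 125 IOPS examples worked through above:

    # Write penalty per RAID level, as derived in the sections above.
    WRITE_PENALTY = {"RAID 0": 1, "RAID 10": 2, "RAID 5": 4,
                     "RAID 6": 6, "RAID 7": 8}

    def raid_iops(level, n, x):
        """Return (Read IOPS, Write IOPS) for N spindles of X IOPS each."""
        return n * x, n * x / WRITE_PENALTY[level]

    for level in WRITE_PENALTY:
        riops, wiops = raid_iops(level, n=8, x=125)
        # Prints 1000/1000, 1000/500, 1000/250, 1000/~167, 1000/125.
        print(level, riops, round(wiops))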

Complex RAID

Complex RAID levels or Nested RAID levels such as RAID 50, 60, 61, 16, etc. can be found using the information above and breaking the RAID down into its components and applying each using the formulæ provided above.  There is no simple formula for these levels because they have varying configurations.  It is necessary to break them down into their components and apply the formulæ multiple times.

RAID 60 with twelve drives, two sets of six drives, where each drive is 150 IOPS, would be done with two RAID 6s.  It would be the NX of RAID 0 where N is two (for the two RAID 6 arrays) and X is the resultant performance of each RAID 6.  Each RAID 6 set would be (6 * 150)/6.  So the full array would be 2((6 * 150)/6), which results in 300 Write IOPS.

The same example as above but configured as RAID 61, a mirrored pair of RAID 6 arrays, would have the same performance per RAID 6 array, but applied to the RAID 1 formula, which is NX/2 (where X is the resultant performance of each RAID 6 array.)  So the final formula would be 2((6 * 150)/6)/2, coming to 150 Write IOPS from twelve drives.
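Because the nested levels are simple compositions, the breakdown is easy to script: compute the write IOPS of each inner array, then treat each inner array as a single “drive” in the outer level.  A sketch of this, with names assumed for illustration:

    # Write penalties for the component levels used in this example.
    WRITE_PENALTY = {"RAID 0": 1, "RAID 1": 2, "RAID 6": 6}

    def write_iops(level, n, x):
        """Write IOPS of an array of n members, each delivering x IOPS."""
        return n * x / WRITE_PENALTY[level]

    # Twelve 150 IOPS drives arranged as two six-drive RAID 6 sets:
    inner = write_iops("RAID 6", 6, 150)   # each RAID 6 set: 150 WIOPS
    print(write_iops("RAID 0", 2, inner))  # RAID 60: 300.0 WIOPS
    print(write_iops("RAID 1", 2, inner))  # RAID 61: 150.0 WIOPS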

Performance as a Factor of Capacity

When we are producing RAID performance formulæ we think of these in terms of the number of spindles which is incredibly sensible.  This is very useful in determining the performance of a proposed array or even an existing one where measurement is not possible and allows us to compare the relative performance between different proposed options.  It is in these terms that we universally think of RAID performance.

This is not always a good approach, however, because typically we look at RAID as a factor of capacity rather than of performance or spindle count.  It would be very rare, but certainly possible, that someone would consider an eight drive RAID 6 array versus an eight drive RAID 10 array.  Once in a while this will occur due to a chassis limitation or some other, similar reason.  But typically RAID arrays are viewed from the standpoint of total array capacity (e.g. usable capacity) rather than spindle count, performance or any other factor.  It is odd, therefore, that we should then switch to viewing RAID performance as a function of spindle count.

If we change our viewpoint and pivot upon capacity as the common factor, while still assuming that individual drive capacity and performance (X) remains constant between comparators then we arrive at a completely different landscape of performance.  In doing this we see, for example, that RAID 0 is no longer the most performant RAID level and that read performance varies dramatically instead of being a constant.

Capacity is a fickle thing but we can distill it down to the number of spindles necessary to reach the desired capacity.  This makes the discussion far easier.  So our first step is to determine the spindle count needed for raw capacity.  If we need a capacity of 10TB and are using 1TB drives, we would need ten spindles, for example.  Or if we need 3.2TB and are using 600GB drives we would need six spindles.  Unlike before, we will refer to our spindle count as R.  As before, performance of the individual drive is represented as X.  (R is used here to denote that this is the Raw Capacity Count, rather than the total Number of spindles.)
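Distilling capacity to a spindle count is just a ceiling division, as a tiny sketch shows (the function name is an assumption for illustration):

    import math

    def raw_spindle_count(needed_tb, drive_tb):
        """R: the number of spindles needed to reach the raw capacity."""
        return math.ceil(needed_tb / drive_tb)

    print(raw_spindle_count(10, 1))     # ten 1TB spindles for 10TB
    print(raw_spindle_count(3.2, 0.6))  # six 600GB spindles for 3.2TB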

RAID 0 remains simple; performance is still RX as there are no additional drives.  Both read and write IOPS are simply RX.

RAID 10 has RX Write IOPS but 2RX Read IOPS.  This is dramatic.  Suddenly when viewing performance as a factor of stable capacity we find that RAID 10 has double read performance over RAID 0!

RAID 5 gets slightly trickier.  Write IOPS would be expressed as ((R + 1) * X)/4.  The Read IOPS are expressed as ((R + 1) * X).

RAID 6, as we expect, follows the pattern that RAID 5 projects.  Write IOPS for RAID 6 are ((R + 2) * X)/6.  And the Read IOPS are expressed as ((R + 2) * X).

RAID 7 falls right in line.  RAID 7 Write IOPS would be ((R + 3) * X)/8.  And the Read IOPS are ((R + 3) * X).

This vantage point changes the way that we think about performance and, when looking purely at read performance, RAID 0 becomes the slowest RAID level rather than the fastest and RAID 10 becomes the fastest for both read and write no matter what the values are for R and X!

If we take a real world example of 10 2TB drives to achieve 20TB of usable capacity with each drive having 100 IOPS of performance and assume a 50/50 blend, the resultant IOPS would be:  RAID 0 with 1,000 Blended IOPS, RAID 10 with 1,500 Blended IOPS (2,000 RIOPS / 1,000 WIOPS), RAID 5 with 687.5 Blended IOPS (1,100 RIOPS / 275 WIOPS), RAID 6 with 700 Blended IOPS (1,200 RIOPS / 200 WIOPS) and finally RAID 7 with 731.25 Blended IOPS (1,300 RIOPS / 162.5 WIOPS.)  RAID 10 is a dramatic winner here.
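This comparison is easy to reproduce in code.  A sketch, with the per-level spindle counts and write penalties taken from the formulæ above (the structure and names are illustrative assumptions):

    # Spindles consumed for R spindles of raw capacity, and write penalty.
    LEVELS = {"RAID 0":  (lambda r: r,     1),
              "RAID 10": (lambda r: 2 * r, 2),
              "RAID 5":  (lambda r: r + 1, 4),
              "RAID 6":  (lambda r: r + 2, 6),
              "RAID 7":  (lambda r: r + 3, 8)}

    R, X = 10, 100  # ten 2TB spindles of raw capacity, 100 IOPS each
    for level, (spindles, penalty) in LEVELS.items():
        n = spindles(R)
        riops, wiops = n * X, n * X / penalty
        # Reproduces the numbers above, e.g. RAID 10: 2000/1000/1500.
        print(level, riops, wiops, (riops + wiops) / 2)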

Latency and System Impact with Software RAID

As I have stated earlier, RAID 0 and RAID 10 have, effectively, no system overhead to consider.  The mirroring operation requires essentially no computational effort and is, for all intents and purposes, immeasurably small.  Parity RAID does have computational overhead and this results in latency at the storage layer and system resources being consumed.  Of course, if we are using hardware RAID those resources are dedicated to the RAID array and have no function but to be consumed in this role.  If we are using software RAID, however, these are general purpose system resources (primarily CPU) that are consumed for the purposes of the RAID array processing.

The impact to even a very small system with a large amount of RAID is still very small but it can be measured and should be considered, if only lightly.  Latency and system impact are directly related to one another.

There is no simple way to state latency and system impact for different RAID levels except in this way: RAID 0 and RAID 10 have effectively no latency or impact, RAID 5 has some latency and impact, RAID 6 has roughly twice as much computational latency and impact as RAID 5 and RAID 7 has roughly triple the computational latency and impact as RAID 5.

In many cases this latency and system impact will be so small that they cannot be measured with standard system tools and as modern processors become increasingly powerful the latency and system impact will continue to diminish.  Impact has been considered negligible for RAID 5 and RAID 6 systems on even low end, commodity hardware since approximately 2001.  But it is possible on heavily loaded systems with a large amount of parity RAID activity that there could be contention between the RAID subsystem and other processes requiring system resources.

Reference: The IT Hollow – Understanding the RAID Penalty

Article originally posted to the StorageCraft Blog – RAID Performance.

It’s a Field, Not a Road

Over the years I have become aware of a tendency in the Information Technology arena to find strong expectations of exactly how much someone should know about certain technologies based on their job title and length of time having worked in IT.  Of course, someone’s current job title and experience level should give you some, if only a little, insight into what they are doing on the job today, but it should rarely give you much insight into what they have done in the past or how they got to where they are today.

There are some abundantly common “paths” through IT, especially in the small and medium business markets, which help to stereotype the advancement of an IT professional over time.  The most common path goes something like this: high school, four year college degree, one or two basic certifications from CompTIA, entry level helpdesk job, better help desk job, deskside support job, basic Microsoft certification, system administrator or IT manager position.  This path is common enough that many people who have taken it simply assume that everyone else in the IT world has done so as well and this assumption creates a lot of problems in many different areas.

First of all, it must be stated that there is no standard path in IT, not even remotely.  Too often IT professionals, applying their own experiences to their view of other people, see IT as a road when it truly is a field (pun only partially intended.)  IT has no defined entry point nor exit point from the industry.  IT is a massive field made up of many different sub-disciplines that exist in little, if any, linear progression of any sort from one to another.  There are far more lateral moves in IT than there are ladders to climb.

Besides the completely untrue assumption that specific education and certification requirements exist in order to enter IT, the widely held beliefs that helpdesk positions are the only entry level IT positions that exist and that they are only stepping stone jobs are completely unfounded.  Many, likely most, IT professionals do not enter the field through helpdesk, call centers, or even deskside support, and probably not through any type of Windows-centric support at all.  While end user focused, helpdesk remains only a small percentage of all IT careers and one through which only a portion of IT professionals will pass.  Windows-centric support is one of the most important foci within IT and clearly the most visible to end users and those outside of IT; this high level of visibility can be misleading, however.  It is equally true that helpdesk, call center, deskside support and the like are not stepping stone jobs exclusively and are career options in their own right.  It is unfortunate that such a high percentage of IT professionals view such positions as being inappropriate career goals because it is widely recognized that a lack of skilled and dedicated people in those specific positions is often what causes the most friction between end users and IT departments.

I have found on several occasions hiring managers who discounted hiring anyone who was truly interested in helpdesk or deskside support as a career and who enjoyed working with customers, and who only desired to hire someone who looked down on those roles as necessary evils to be passed over as quickly as possible en route to a more “rewarding” career destination.  I find this sad on many levels.  It implies that the hiring manager lacks empathy for other professionals and does not consider their individual desires or strengths.  It implies that the company in question is institutionalizing a system by which people are not hired to do something that they love nor something that they are good at, but are hired only if they are willing to do a job role that they don’t want to do in the hopes of eventually doing one that they do want to do.  This rules out anyone actually qualified to do the desired job since those people will go straight into those positions.  It almost guarantees, as well, that end user support will be poor, as no one hired is specifically good at or interested in that role.  The hiring manager clearly sees end user support as not being a priority and the entire idea is that anyone going into that role will “succeed” by moving out of it as quickly as possible, leaving end users with a lack of continuity as well as a never ending cycle of churn.  Believing that IT is a road and not a field has tangible, negative consequences.

Seeing IT careers as a direct path from point A to point B creates an inappropriate set of expectations as well.  It is not uncommon at all for someone to say that anyone with five years of experience in IT must know how to <insert somewhat common Windows desktop task here> based on nothing but their length of time working in IT, completely ignoring the possibility that they have never worked on Windows or in a role that would do that task.  While Windows is common, many people working in IT have never performed those roles and there is no reason to expect that a specific task like that would be known automatically.  This goes beyond the already problematic attitude that many people have that the tasks that they personally did in a specific job role are the same tasks that everyone in that job role has done.  This is, of course, completely untrue.  A Windows system admin at one company and a Windows system admin at another company, or even just in another department, may do similar tasks or possibly completely different ones.  Even a decade in those roles may produce almost completely unique experiences and skills.  There is just so much potential in IT for doing different things that we cannot make specific task assumptions.

This assumptive process carries over to certifications and education as well.  While many fields succumb to the cliche that anyone over a certain level must have a college education, it is far less common for that assumption to be true in IT.  Few fields find university training to be as optional as IT does, and remembering that alternative means of entering the field exist is critical.  Many of the best and brightest enter IT directly and not through an educational channel.  These candidates are often years ahead of their “educated” counterparts and often represent the most passionate, driven and capable pool of talent; and they are almost certainly the most capable of self motivation and self education which are both extremely important traits in IT.

Similarly, I was recently introduced for the first time to assumptions about certifications.  Certifications are specific to job roles; none apply broadly to all roles, and none would be sensible for someone to hold if a higher certification was held or if they have never passed through that specific point in that specific job role.  The example that came up was a hiring manager who actually believed that anyone with ten years of experience would be expected to have both an A+ and a Network+ certification.  Both are entry level certifications and not relevant to the vast majority of IT careers (the A+ especially has little broad applicability while the Network+ is a much more general case but still effectively entry level.)  While it would not be surprising to find these held by a ten year IT veteran, it would make no sense whatsoever for them to be used as filters by which candidates are ruled out for lacking them.  This is completely ridiculous.  Those certs are designed only to show rudimentary knowledge in specific IT career paths.  Anyone who has passed that point in their career without needing them would never go back and spend time and money earning entry level certifications while already at a career mid-point.  Once you have a PhD, you don’t go back and get another Associates degree just to show that you could have done it; the PhD is enough to demonstrate the ability to earn an entry level degree.  And most people with a significant history in the field will have passed the career point where those certs made sense years before the certs even existed (the Network+, for example, did not exist until I had already been in IT for more than a decade!)

I am particularly sensitive to this issue both because I spent several years as a career counselor, helping to put IT professionals on a path to career growth and development, and because I myself did not take what is considered to be a conventional path into IT.  I was lucky enough to have interned in software development during my middle and high school years and was offered a position in UNIX support right out of high school.  I never passed through any Windows-centric roles, nor did I ever work on a helpdesk or do deskside support outside of a small amount of work in high end UNIX research labs.  My career took me in many different directions but almost none followed the paths that so many hiring managers expect.  Attempting to predict the path that one’s career will take in the future is impossible.  Equally, attempting to determine what path must have been taken to have reached a current location is also impossible.  There are simply too many ways to get from point A to point B.

Embracing uniqueness in IT is important.  We all bring different strengths and weaknesses, different ideas and priorities, different goals and different things that we enjoy or despise doing.  The job that one person sees as a necessary evil another will love doing, and that passion for the role will show.  The passionate, career-focused helpdesk professional will bring an entirely different joie de vivre to the job than will someone who feels that they are trapped doing an undesirable job until another opportunity comes along.  This doesn’t mean that the latter will not work hard and try their best, but there is little that can be done to compete with someone passionate about a specific role.

It is also very easy, when we look at IT as a singular path, to forget that individual roles, such as helpdesk, actually have progressions within the role itself.  Often many steps exist within specific roles.  In the case of a helpdesk it is common to refer to these as L0 through L3.  Plus there are helpdesk team leads and helpdesk manager positions that are common.  An entire career can be had just within the helpdesk focus sub-discipline within IT.  There is nothing wrong with entering IT directly into the role type that interests you.  There is also nothing wrong with achieving a place in your career where you are happy to stay. Everyone has an ideal position, a career position where they both excel at what they do and are happy doing indefinitely.  In most fields, people actually strive to achieve this type of position somewhat early in their careers.  In IT, it is strangely uncommon.

There is a large amount of social pressure within IT to have “ambition” pushing you towards more and more challenging positions within the field.  Partially this is because IT is such an enormous field that is so dynamic that most people really do enter wherever opportunity presents itself and then attempt to maneuver themselves into positions that they find interesting over a period of many years.  This creates a culture of continuous change and advancement expectations, to some degree. This is not entirely bad but it often marginalizes or even penalizes people who manage to find their desired positions, especially if this happens early in their careers and even more specifically if it happens in a role which many people see as a “stepping stone” role such as with helpdesk or deskside support.  This is not good for individuals, for businesses or for the field in general.  It pushes people into roles where they are not happy and not well suited in order to satisfy social pressures rather than career aspirations or business needs.

Ambition is not necessarily a good thing.  It certainly is not a bad thing.  But too often hiring managers look for ambition when it is not in anyone’s interest.  Hiring someone young or inexperienced in the hopes that they grow over time and move into more and more advanced roles is an admirable goal and can work out great.  But avoiding hiring someone perfectly suited for a role because they will want to stay where they are well suited and where they excel makes no sense at all.  In an ideal world, everyone would be hired directly into the perfect position for them and no one would ever need to change jobs.  This is best for both the employees and the employer. It is rarely possible, but certainly should not be avoided when the opportunity presents itself.

Creating stereotypes and using them to judge IT professionals has negative consequences for everyone.  It increases stress, reduces career satisfaction, decreases performance and lowers the quality of IT service delivery while making it more expensive to provide.  It is imperative that we accept IT as a field, not as a road, and that we also accept that IT professionals are individuals with different goals, different career motivations and different ambitions.  Variety and diversity in IT are far more important than they are in most fields because IT is so large and requires so many different perspectives to perform optimally.  Unlike a road that travels a single, predictable path, a field allows you to wander in many directions and arrive at many different destinations.

The Home Line

In many years of working with the small and medium business markets I have noticed that the majority of SMB IT shops tend to one of two extremes: massive overspending, attempting to operate like huge companies by adopting costly and pointless technologies unnecessary at the SMB scale, or the opposite extreme, spending nothing and running technology that is completely inadequate for their needs.  Of course the best answer is somewhere in between – finding the right technologies, the right investments for the business at hand; and some companies manage to work in that space, but far too many go to one of the two extremes.

A tool that I have learned to use over the years is classifying the behavior of a business against the decision making that I would use in a residential setting – specifically my own home.  To be sure, I run my home more like a business than does the average IT professional, but I think that it still makes a very important point.  As an IT professional, I understand the value of the technologies that I deploy, I understand where investing time and effort will pay off, and I understand the long term costs of different options.  So where I make judgement calls at home is very telling.  My home does not have the financial value of a functional business, nor does it have the security concerns, nor the need to scale (my family will never grow in user base size, no matter how financially successful it is), so when comparing my home to a business, my home should, in theory, set the absolute lowest possible bar in regards to the financial benefit of technology investment.  That is to say that the weighing of options for an actual, functional business should always lean towards equal or greater investment in performance, safety, reliability and ease of management than for my home.  My home should be no more “enterprise” or “business class” than any real business.

One could argue, of course, that I make poor financial decisions in my home and over-invest there for myriad reasons and, of course, there is merit to that concern.  But realistically there are broad standards that IT professionals mostly agree upon as good guidelines and, while many do not follow these at home, whether through a need to cut costs, a lack of IT needs at home or, as is often the case, a lack of buy in from critical stakeholders (e.g. a spouse), most agree as to which ones make sense, when they make sense and why.  The general guidelines as to what technology at which price points sets the absolute minimum bar are by and large accepted and constitute what I refer to as the “home line” – the line below which a business cannot argue that it is acting like a business but is, at best, acting like a consumer, hobbyist or worse.  A true business should never fall below the home line; doing so would mean that it considers the value of its information technology investment in the business to be lower than what I consider my investment at home to be.

This adds a further complication.  At home there is little cost to the implementation of technologies.  But in a business all of the time spent working on technology, and supporting less than ideal decisions, is costly.  Either costly in direct dollars spent, often because IT support is being provided by a third party on a contractual basis, or costly because time and effort are being expended on basic technology support that could be used elsewhere – the cost of lost opportunity.  Neither of these takes into account things like the cost of downtime, data loss or data breach, which are generally the more significant costs that we have to consider.

The cost of the IT support involved is a significant factor.  For a business, there should be a powerful leaning towards technologies that are robust and reliable, with a lower total cost of ownership or a clear return on investment.  In a home there is more room to spend time tweaking products to get them to work, working with products that fail often or require lots of manual support, using products that lack powerful remote management options or products that lack centralized controls for user and system management.

It is also important to look at the IT expenditures of any business and ask whether the IT support is warranted in light of those investments.  If a business is unwilling to invest in its IT infrastructure an amount equivalent to what I would invest in the same infrastructure for home use, why would that business be willing to maintain an IT staff, at great expense, to maintain that infrastructure?  This is a strange expenditure mismatch, but one that commonly arises.  A business which has little need of full time IT support will often readily hire a full time IT employee but be unwilling to invest in the technology infrastructure that said employee is intended to support.  There seems to be a correlation between businesses that underspend on infrastructure and those that overspend on support – though a simple reason for that could be that the staff in that situation are the most vocal.  Businesses with adequate staff and investment have little reason for staff to complain, and those with no staff have no one to do the complaining.

For businesses making these kinds of tradeoffs, with only the rarest of exceptions, it would make far better financial and business sense to not have full time IT support in house and instead move to occasional outside assistance or a managed services agreement at a fraction of the cost of a full time person and invest a portion of the difference into the actual infrastructure.  This should provide far more IT functionality for less money and at lower risk.

I find that the home line is an all around handy tool – just a rough gauge for explaining to business people where their decisions fall in relation to other businesses or, in this case, non-businesses.  It is easy to say that someone is “not running their business like a business” but this adds weight and clarity to that sentiment.  That a business is not investing like another business up the street may not matter at all.  But if they are not putting as much into their business as the person that they are asking for advice puts into their home, that has a tendency to get their attention.  Even if, at this point, the decisions to improve the business infrastructure become primarily driven by emotion, the outcome can be very positive.

Comparing one business to another can result in simple excuses like “they are not as thrifty” or “that is a larger business” or “that is a kind of business that needs more computers.”  It is rarely useful for business people or IT people to do that kind of comparison.  But comparing to a single user or single family at home provides a much more corporeal comparison.  Owners and managers tend to take a certain pride in their businesses, and having it be widely seen that they value their own company lower than a single household is non-trivial.  Most owners or CEOs would be ashamed if their own technology needs did not exceed those of an individual IT professional, let alone those plus all of the needs of the entire business that they oversee.  Few people want to think of their entire company as being less than the business value of an individual.

This all, of course, brings up the obvious questions of what are some of the things that I use at home on my network?  I will provide some quick examples.

I do not use ISP supplied networking equipment, for many reasons.  I use a business class router and firewall unit that does not have integrated wireless nor a switch.  I have a separate switch to handle the physical cabling plant of the house.  I use a dedicated, managed, wireless access point.  I have CAT5e or CAT6 professionally wired into the walls of the house so that wireless is only used when needed, not as a default for more robust and reliable networking (most rooms have many network drops for flexibility and to support multimedia systems.)  I use a centrally managed anti-virus solution, I monitor my patch management and I never run under an administrator level account.  I have a business class NAS device with large capacity drives and RAID for storing media and backups in the house.  I have a backup service.  I use enterprise class cloud storage and applications.  My operating systems are all completely up to date.  I use large, moderate quality monitors and have a minimum of two per desktop.  I use desktops for stationary work and laptops for mobile work.  I have remote access solutions for every machine so that I can access anything from anywhere at any time.  I have all of my equipment on UPS.  I have even been known to rackmount the equipment in the house to keep things neater and easier to manage.  All of the cables in the attic are carefully strung on J-hooks to keep them neat.  I have VoIP telephony with extensions for different family members.  All of my computers are commercial grade, not consumer.

My home is more than just my residential network, it is an example of how easy and practical it is to do infrastructure well, even on a small scale.  It pays for itself in reliability and often the cost of the components that I use are far less than that of the consumer equipment often used by small businesses because I research more carefully what I purchase rather than buying whatever strikes my fancy in the moment at a consumer electronics store.  It is not uncommon for me to spend half as much for quality equipment as many small businesses spend for consumer grade equipment.

Look at the businesses that you support or even, in fact, your own business.  Are you keeping ahead of the “home line?”  Are you setting the bar for the quality of your business infrastructure high enough?

Originally published on the StorageCraft Blog.
