All posts by Scott Alan Miller

Started in software development with Eastman Kodak in 1989 as an intern in database development (making database platforms themselves). Began transitioning to IT in 1994 with my first mixed role in system administration.

The High Cost of On Premises Infrastructure

IT infrastructure is a challenge for any company, and especially for companies that are not large enough to implement their own full-scale datacenters. Like many things in IT, the major challenges come in the form of lacking specific, seldom-used expertise and lacking the scale to utilize singular resources effectively.

This lack of scale can come in many forms. The obvious one is manpower. Managing a physical computing infrastructure requires unique skills, separate from IT itself, that often need to be available “around the clock.” These can vary from security to electrical, cooling and facilities work to “datacenter technician” style staff. Of course, smaller businesses simply do without these roles, but this raises the per-server cost of maintaining the infrastructure. Large businesses and dedicated datacenters leverage economies of scale to make physically housing an IT infrastructure cheaper – either by actually lowering the cost directly or by raising the quality and reliability of the equipment.
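To make the scale effect concrete, here is a minimal sketch, using entirely invented figures, of how a fixed annual facility cost (security, electrical, cooling, facilities staff) spreads across the number of servers it supports:

```python
# Hypothetical illustration of facility economies of scale. A fixed annual
# facility cost is spread across the number of servers housed.
# All figures are invented for illustration only.

FIXED_ANNUAL_FACILITY_COST = 120_000  # hypothetical yearly cost of facility roles


def per_server_cost(server_count: int) -> float:
    """Annual facility cost carried by each server."""
    return FIXED_ANNUAL_FACILITY_COST / server_count


for count in (4, 40, 400):
    print(f"{count:>4} servers -> ${per_server_cost(count):>10,.2f} per server per year")
```

At four servers the hypothetical facility overhead is $30,000 per server per year; at four hundred it falls to $300.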

The cost effectiveness of delivering power, cooling and datacenter services is only one aspect of the cost of IT infrastructure in a business. Many businesses attack this problem by reducing infrastructure investment and staff, which may offset some of the up-front costs, but generally does so to the detriment of availability and equipment longevity. Whether it is a lack of ISP redundancy, the absence of diesel generators or a year or two shaved off of a server’s service life, these costs add up, often in ways that are difficult to identify and track.

We often see the effects of low-quality infrastructure in the behavior and expectations of smaller businesses. For example, in the enterprise datacenter an average server lifespan may be ten years or more, but smaller businesses often assume that a server is worn out and unreliable after seven or eight years. This increase in failure rate also leads to more concern about system failure. Smaller businesses often see a higher, rather than a lower, need for redundant systems, even when lower revenue would normally suggest otherwise. Small businesses are prone to investing heavily in high availability mechanisms, often at great expense, to mitigate a perceived risk of high system failure rates that larger businesses may be less likely to see. These factors can combine to create high cost through more rapid system replacement and a tendency towards overbuying hardware – sometimes even doubling the otherwise necessary investment to protect against risks created by lower quality facilities management.
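As a hedged, back-of-the-envelope illustration of that combined effect – a shorter replacement cycle plus doubled hardware to cover the perceived reliability risk – with all figures invented:

```python
# Invented figures: same server price, but a shorter replacement cycle and
# doubled hardware for high availability in the smaller environment.

server_price = 10_000

enterprise_annual = server_price / 10    # ten-year service life, single unit
smb_annual = (server_price * 2) / 7      # seven-year life, doubled for redundancy

print(f"Enterprise: ${enterprise_annual:,.0f}/year per workload")
print(f"SMB:        ${smb_annual:,.0f}/year per workload "
      f"({smb_annual / enterprise_annual:.1f}x the annualized spend)")
```

Under these placeholder numbers the smaller shop spends nearly three times as much per year to run the same workload.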

This concept is not unique to information infrastructure. In the audiophile world, while huge investments in high quality audio equipment are common, it is a rule of thumb that fifty percent of audio quality comes from the equipment and fifty percent from the environment into which it is placed. The same lesson applies to information infrastructure: lower cost gear may run longer and more reliably in a high quality physical environment than more expensive, better engineered equipment will in a lower quality one.

Of course, the most obvious components of lower reliability come from being unable to maintain redundant generators, independent power rails, adequate fuel supplies, uninterruptible power supply (UPS) units, steady temperature and humidity, air filtration and, of course, highly redundant multi-path WAN access. These are the aspects we think of all the time, and they are almost completely out of reach of all but the largest companies. Even simple things like restricting server room access to essential staff can be an insurmountable challenge in a small environment.

These challenges create an opportunity for the SME, SMB and SOHO markets to find alternatives that leverage combined scale. While many companies today turn to ideas such as hosted cloud computing, the cost structure of elastically expanding capacity often makes this impractical, as this same market is the one that struggles most to utilize that type of functionality. Cloud computing can be an answer in some cases, but normally only for the very smallest of companies, for whom a single server is too much scale, or for companies so large that they have a DevOps-style automation infrastructure capable of scaling elastically with load demands, along with workloads that make sense for this process. These companies are the exception, not the norm. More often, hosted cloud computing makes sense only for a specific subset of public-facing workloads, and only in some cases.

For the majority of companies too small to create the scale necessary to build out their own full-scale IT infrastructure, the answer is likely to be found in colocation. It must be noted that there are locational or environmental factors that can make off-premises infrastructure impossible, or at least impractical. Most businesses, however, will not be subject to these limitations.

Colocation tackles the cost challenges of the smaller business market by generating the scale necessary to make high quality, dedicated information infrastructure facilities possible. This includes staff, WAN connectivity, environmental controls, power, and expertise. Cost savings can come from surprising places, including a lower power cost per kilowatt-hour, lower costs for cooling and power conditioning, and higher real estate density.

It is often believed that colocation is a premium service for businesses with needs above and beyond the average, but in reality colocation is often chosen – and often should be chosen – because it represents an opportunity to lower costs while also improving reliability. Colocation, in most cases, will actually bring a month-by-month cost savings, providing an impressive return on investment over time: the initial cost can be similar to other options, but the ongoing monthly cost can be lower and, perhaps more importantly, the costs become far more predictable, with fewer risks and unexpected expenditures.

Because the cost of services can be very granular, it is actually far easier for colocation to lower overall expenditure than is generally believed. For example, a small business with just one or two servers would still need certain basics such as air conditioning, UPS support, footprint space and security, all dedicated to only a very small amount of equipment. In a colocation facility those servers may represent less than one percent of the load on a large, high-efficiency cooling system, may use just a small fraction of a large UPS, and so forth.
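A minimal sketch of that granularity, again with invented figures, comparing a dedicated small server room against a proportional share of a large shared facility:

```python
# Invented figures comparing a dedicated small server room against a
# proportional share of a large shared facility for two servers.

dedicated_room_monthly = 900        # small AC, UPS, space, access control
shared_facility_monthly = 45_000    # large high-efficiency plant
servers_in_facility = 1_000         # total servers sharing that plant
our_servers = 2

colo_share = shared_facility_monthly * our_servers / servers_in_facility

print(f"Dedicated room: ${dedicated_room_monthly:,.0f}/month")
print(f"Colo share:     ${colo_share:,.0f}/month for the same two servers")
```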

Colocation also frees IT staff from performing datacenter functions, for which they are generally untrained and poorly qualified, to focus on the tasks at which they are more valuable. Datacenter tasks can then be performed by experienced, dedicated datacenter staff.

Calculating exact ROI can be challenging because each case is unique and depends heavily on the workloads, use cases, independent needs and environmental factors of the individual business and the colocation options considered. But it should be approached with the mindset that colocation does not present only an opportunity to improve the quality and reliability of IT infrastructure services, nor only a return on investment; it may, in fact, do both of these things on top of fundamentally lowering overall costs.
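A simple break-even and ROI sketch for a hypothetical colocation move (all numbers invented for illustration) might look like this:

```python
# Invented figures for a hypothetical colocation move.

migration_cost = 5_000     # one-time move and setup
onprem_monthly = 750       # power, cooling, space, UPS upkeep on premises
colo_monthly = 500         # rack fees covering the same services

monthly_savings = onprem_monthly - colo_monthly
breakeven_months = migration_cost / monthly_savings
three_year_roi = (monthly_savings * 36 - migration_cost) / migration_cost

print(f"Break-even after {breakeven_months:.0f} months")
print(f"Three-year ROI: {three_year_roi:.0%}")
```

The point is not the specific figures but that the calculation itself is straightforward once a business substitutes in its own numbers.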

All IT is External

In IT we often talk about internal and external IT, but this perspective is always that of the IT department itself rather than that of the business, and I feel that this is very misleading. Different departments within a company generally see, and feel about, one another as if they were external; often every bit as much as an external company feels. For example, an IT department will often see management, operations or human resources as “foreign” departments at best and adversaries at worst. It is common to feel, and possibly rightfully so, that different departments fail even to share common overarching goals. IT tends to be acutely aware of this and expresses it often.

What we need to appreciate is that to the business management or owners, the IT department generally appears to be an external agency, regardless of whether the people working in it are staff or actually from a service provider. There are exceptions to this, of course, but they are rare. IT is generally held behind a barrier of sorts and is its own entity. IT commonly displays this in how it talks to or about management. IT often thinks of system resources or the network as “belonging to IT,” clearly not thinking in terms of IT being just part of the company. Both sides are commonly guilty of thinking of IT as a separate entity from the company itself.

This happens, of course, for any number of reasons. Many IT workers choose IT because they are passionate about IT specifically, not about the company or market in which they work; their loyalty is to their IT career, not the business in question, and they would generally switch companies to advance their IT career rather than stay to advance an internal non-IT career. IT professionals often struggle with interpersonal skills and so have a higher than average tendency to hide away, avoiding unnecessary contact with other departments. IT tends to be busy and overworked, making socializing problematic. IT work demands focus and availability, again making it difficult to socialize and interface with other departments. IT is often kept isolated for security reasons, and IT is often seen as the naysayer of the organization – commonly delivering bad news or hindering projects. IT typically has extremely high turnover rates, and almost no IT staff, especially in smaller businesses, are expected to be around for the long haul. IT is often a conduit to outside vendors and is seen as connected to or associated with them in many ways. IT is often behind a “blame barrier,” where the rest of the organization often seeks to blame IT for business decisions, creating a stronger “us and them” mentality. IT exacerbates this with attitudes towards users and decision makers that are often distancing. It is also extremely common for IT workers to be staffed via an agency in such a way that there are contract obligations, restrictions or payroll differences between IT and normal staff.

This creates a rather difficult situation for discussions of the advantages of internal IT versus external IT. Internal IT staff commonly believe that having IT internally brings many benefits to the organization through loyalty, closeness or the ties of payroll. But is this really the case?

To the business, internal IT is already, in most cases, external to the organization. The fears often stated about external IT service providers – that they may not work in the business’ interests, may suddenly close up shop and disappear, might be overworked and not have enough available resources, may charge for idle time, may not have the needed expertise, may see the network and resources as their own and not act in the interests of the business, may fail to document the systems or might even hold critical access hostage – are all fears that businesses have about their own IT departments, exactly as they have them about external IT service providers.

In fact, external service providers often give a business more legal recourse than employees do. Internal IT employees can quit with zero notice, suffering only the stigma of acting “unprofessionally,” or can give two weeks’ notice and not even worry about that. Yet replacing internal IT staff of any caliber easily takes months, and that is just until someone can be hired, let alone trained, indoctrinated and brought up to useful speed. It is not uncommon, even in the enterprise, for the job search, hiring process and internal processes for access and so forth to take up to a year from the decision to begin interviewing until someone is a useful staff member. An external IT service provider, by contrast, may be contractually obligated to provide resources for coverage regardless of whether its staff come and go. There are far more possibilities for mitigating the turnover risks that employed IT staff present to a business.

Due to these factors, it is very common for a business to perceive internal and external IT resources as roughly equal, with both being very much outsiders to the core organization. Of course, in an ideal world, both would be treated very much as insiders and engaged as critical partners for planning, decision making, triage and so forth. IT is critical to business thinking and the business is critical to IT thinking; neither is really functional without the other.

This context of the organizational management view of IT can be important for understanding how the business will react to IT as well as how IT should behave with management.  And it offers an opportunity for both to work on coming together, whether IT is ultimately internal or external, to behave more like a singular organization with a unified goal.

Titanic Project Management & Comparison with Software Projects

Few projects have ever achieved the fame and notoriety of the Titanic and her sister ships of the Olympic class, the Olympic and the Britannic, whose design began one hundred and ten years ago this year. There are, of course, many lessons in project management that we can learn from the fate of the Olympic ships, and many aspects of their management worth covering.

(When referring to the ships as a whole I will simply reference them as The Olympics, as the three together were White Star Line’s Olympic-class ships. Titanic’s individual and later fame is irrelevant here. Also, I am taking the position that the general information pertaining to the Olympic ships, their history and fate, is common knowledge to the reader, and I will not cover it again.)

Given the frequency with which the project management of the Olympics has been covered, I think it more prudent to look at a few modern parallels where we can view current project management through a valuable historic lens. Project management is a discipline that has endured for millennia; many of its challenges, skills and techniques have not changed much, and the pitfalls of the past still very much apply today. The old adage holds: those who do not learn from the past are doomed to repeat it.

My goal here, then, is to examine the risk analysis, perception and profile of the project and apply that to modern project management.

First, we must identify the stakeholders in the Olympics project: White Star Line itself (sponsoring company and primary investor) and its director Joseph Bruce Ismay; Harland-Wolff (the contracted shipbuilder) with its principal designers Alexander Carlisle and Thomas Andrews; the ships’ crews, including Captain Edward John Smith; the British government, as we will see later; and, most importantly, the passengers.

As with any group of stakeholders, there are different roles that are played. White Star on one side was the sponsor and investor and in a modern software project would be analogous to a sponsoring customer, manager or department. Harland-Wolff were the designers and builders and are most closely related to the software engineering “team members” of a modern software team, the developers themselves. The crews of the ships were responsible for operations after the project was completed and are comparable to an IT operations team taking over the running of the final software after completion. The passengers were much like end users today, hoping to benefit from both the engineering deliverable (ship or software) and the service built on top of that product (ferry service or IT managed services). (“Olympic”)

Another axis of analysis is that of chicken and pig stakeholders, where chickens are merely involved and carry some risk while pigs are fully committed and carry the ultimate risk. In normal software projects we use these comparatives to talk about degrees of stakeholding – those who are involved versus those who are committed – but in the case of the Olympic ships these terms take on new and horrific meaning, as the crew and passengers literally put their lives on the line in the operational phase of the ships, whereas the investors and builders were only financially at risk. (Schwaber)

Second, I believe it is useful to distinguish between the different projects that exist within the context of the Olympics. There was, of course, the physical design and construction of the three ships. This was a single project with two clear components – one of design and one of construction – and three discrete deliverables, namely the three Olympic vessels. There is, at the end of the construction phase, an extremely clear delineation point where the project managers and teams involved in the assembly of the ship would stop work and the crew that operated the ship would take over.

Here we can already draw an important analogue to the modern world of technology where software products are designed and developed by software engineers and, when they are complete, are handed over to the IT operational staff who take over the actual intended use of the final product.  These two teams may be internal under a single organizational umbrella or from two, or more, very separate organizations.  But the separation between the engineering and the operational departments has remained just as clear and distinct in most businesses today as it was for ship building and ferry service over a century ago.

We can go a step farther and compare White Star’s transatlantic ferry service to many modern software-as-a-service vendors such as Microsoft Office 365, Salesforce or G Suite. In these cases the company in question has an engineering or product development team that creates the core product and a second team that takes that in-house product and operates it as a service. It is an increasingly important business model in the software development space that the same company creating the software is also its ultimate operator, but for external clients. In many ways the relevance of the Olympics to modern software and IT is increasing rather than decreasing.

This brings up an important interface understanding that was missed on the Olympics and is often missed today: each side of the hand-off believed that the other side was ultimately responsible for safety.  The engineers touted their safety of design, but when pushed were willing to compromise assuming that operational procedures would mitigate the risks and that their own efforts were largely redundant.  Likewise, when pushed to keep things moving and make good time the operations team were willing to compromise on procedures because they believed that the engineering team had gone so far as to make their efforts essentially wasted, the ship being so safe that operational precautions just were not warranted.  This miscommunication took the endeavor from having two types of systems of extreme safety down to basically none.  Had either side understood how the other would or did operate, they could have taken that into account.  In the end, both sides assumed, at least to some degree, that safety was the “other team’s job”.  While the ship was advertised heavily based on safety, the reality was that it continued the general trend of the past half century plus, where each year ships were made and operated less safely than the year before. (Brander 1995)

Today we see this same problem arising between IT and software engineering – less around stability (although that certainly remains true) and now about security, which can be viewed much as safety was in the Olympics’ context. Security has become one of the most important topics of the last decade on both sides of the technology fence, and the industry faces the challenge that both sides must implement security practices thoroughly – neither is capable of truly implementing secure systems alone. Planning for safety or security is simply not a substitute for enforcing it procedurally during operations.

An excellent comparison today is British Airways and how they approach every flight that they oversee as it crosses the Atlantic.  As the primary carrier of air traffic over the North Atlantic, the same path that the Olympics were intended to traverse, British Airways has to maintain a reputation for excellence in safety.  Even in 2017, flying over the North Atlantic is a precarious and complicated journey.

Before any British Airways flight takes off, the pilots and crew must review a three hundred page mission manual that tells them everything that is going on including details on the plane, crew, weather and so forth.  This process is so intense that British Airways refuses to even acknowledge that it is a flight, but officially refers to every single trip over the Atlantic as a “mission”; specifically to drive home to everyone involved the severity and risk involved in such an endeavor.  They clearly understand the importance of changing how people think about a trip such as this and are aware of what can happen should people begin to assume that everyone else will have done their job well and that they can cut corners on their own job.  They want no one to become careless or begin to feel that the flight, even though completed several times each day, is ever routine. (Winchester)

Had the British Airways approach been used with the Titanic, it is very likely that disaster would not have struck when it did. The operational side alone could have prevented the disaster. Likewise, had the ship’s engineers been held to the same standards as Boeing or Airbus today, they likely would not have been so easily pressured by management to modify the safety requirements as they worked on the project.

What really affected the Olympics, in many ways, was a form of unchecked scope creep.  The project began as a traditional waterfall approach with “big design up front” and the initial requirements were good with safety playing a critical role.  Had the original project requirements and even much of the original design been used, the ships would have been far safer than they were.  But new requirements for larger dining rooms or more luxurious appointments took precedence and the scope and parameters of the project were changed to accommodate these new changes.  As with any project, no change happens in a vacuum but will have ramifications for other factors such as cost, safety or delivery date. (Sadur)

The scope creep on the Titanic specifically was dramatic, but hidden and not necessarily obvious for the most part. It is easy to point out small changes such as a shift of dining room size, but of much greater importance was the change in the time frame in which the ship had to be delivered. What really altered the scope was that the initial deadlines and delivery plans had to be maintained relatively strictly. This was especially problematic because, in the midst of Titanic’s dry dock and later moored work, her older sibling, Olympic, was brought in for extensive repairs multiple times, which had a very large impact on the amount of time left in the original schedule for Titanic’s own work to be completed. This type of scope modification is very easy to overlook or ignore, especially in hindsight, as the physical deliverables and the original dates did not change in any dramatic way. For all intents and purposes, however, Titanic was rushed through production much faster than had been originally planned.

In modern software engineering it is well accepted that no one can estimate the time a design task will take better than the engineer(s) who will do the task. It is also generally accepted that there is no means of significantly speeding up engineering and design efforts through management pressure: once a project is running at maximum speed, it is not going to go faster. Attempts to go faster will often lead to mistakes, oversights or misses. We know this to be true in software and can assume that it was true for ship design as well, as the principles are the same. Had the Titanic been given the appropriate amount of time for this process, it is possible that safety measures would have been more thoroughly considered, or at least properly communicated to the operational team at hand-off. Teams that are rushed are forced to compromise, and since time cannot be adjusted when it is the constraint, corners have to be cut somewhere else – almost always in quality and thoroughness. This might manifest as an outright mistake, or as failing to fully review all of the factors involved when changing one portion of a design.

This brings us to holistic design thinking. At the beginning of the project the Olympics were designed with safety in mind: safety that results from the careful interworking of many separate systems that together are intended to make a highly reliable ship. We cannot look at the components of a ship of this magnitude individually; in isolation they make no sense. The design of the hull, the style of the decks, the weight of the cargo, the materials used and the style of the bulkheads are all interrelated and must function together.

When the project was pushed to complete more quickly or to change parameters, this holistic thinking and a clear revisiting of earlier decisions was not done, or not done adequately. Rather, individual components were altered irrespective of how that would impact their role within the whole of the ship and the resulting impact on overall safety. What may have seemed like a minor change had unintended consequences that were unforeseen because holistic project management was abandoned. (Kozak-Holland)

This change to the engineering was mirrored, of course, in operations. Each change, such as not using binoculars or not taking bucket water-temperature readings, was individually somewhat minor, but taken together they were incredibly impactful. Likely, though we cannot be sure, a cohesive project management or, at least, process improvement system was not being used. Who was verifying that binoculars were used, that the water tests were accurate and so forth? Any check at all would have revealed that the tools needed for those tasks did not exist at all. There is no way that so much as a simple test run of the procedures could have been performed, let alone regular checking and process improvement. The need for process improvement is especially highlighted by the fact that Captain Smith had had practice on the RMS Olympic, caused an at-sea collision on her fifth voyage and then nearly repeated the same mistake with the initial launch of the Titanic. What should have been an important lesson learned by all captains and pilots of the Olympic ships was instead ignored and repeated, almost immediately. (“Olympic”)

Of course ship building and software are very different things, but many lessons can be shared. One of the most important is to see the limitations faced by ship building and to recognize when we are not forced to retain those same limitations when working with software. The Olympic and Titanic were built nearly at the same time, with absolutely no time for engineering knowledge gleaned from the Olympic’s construction, let alone her operation, to be applied to the Titanic’s construction. In modern software we would never expect such a constraint; we are able to test software, at least to some small degree, before moving on to additional software that is based upon it, whether in real code or even conceptually. Project management today needs to leverage the differences that exist, both in more modern times and in our different industry, to best advantage. Some software projects still require processes like this, but they have become more and more rare over time and today are dramatically less common than they were just twenty years ago.

It is well worth evaluating the work that was done by Harland-Wolff with the Olympics, as they strove very evidently to incorporate what feedback loops were possible within their purview at the time. Not only did they attempt to use the construction of the earlier ships to learn for the later ones – although this was very limited, as the ships were mostly under construction concurrently and most lessons would not have had time to be applied – but, far more importantly, they took the extraordinary step of having a “guarantee group” sail with the ships. This guarantee group consisted of apprentice and master shipbuilders from all manner of supporting trades. (“Guarantee Group”)

The use of the guarantee group for direct feedback was, and truly remains, unprecedented, and it was an enormous investment in hard cost and time for the shipbuilders to sacrifice so many valuable workers to sail in luxury back and forth across the Atlantic. The group was able to inspect their work first hand, see it in action, gain an understanding of its use within the context of the working ship, work together on team building, knowledge transfers and more. This was far more valuable than the feedback from the shipyards where the ships overlapped in construction; this was a strong investment in the future of their shipbuilding enterprise: a commitment to industrial education that would likely have benefited them for decades.

Modern deployment styles, tools and education have led from the vast majority of software being created under a Waterfall methodology not so distinct from that used in turn of the [last] century shipbuilding, to most leveraging some degree of Agile methodologies allowing for rapid testing, evaluation, changes and deployment.  Scope creep has changed from something that has to be mitigated or heavily managed to something that can be treated as expected and assumed within the development process even to the point of almost being leveraged.  One of the fundamental problems with big design up front is that it always requires the customer or customer-role stakeholder to make “big decisions up front” which are often far harder for them to make than the design is for the engineers.  These early decisions are often a primary contributor to scope creep or to later change requests and can often be reduced or avoided by agile processes that expect continuous change to occur to requirements and build that into the process.

The shipbuilders, Harland and Wolff, did build a fifteen-foot model of the Olympic for testing, which was useful to some degree, but it of course failed to mimic the hydrodynamic action that the full-size ship would later produce and failed to predict some of the more dangerous side effects of the new vessel’s size when close to other ships, which led to the first accident of the group and to what was nearly a second. The builders do appear to have made every effort to test and learn at every stage available to them throughout the design and construction process. (Kozak-Holland)

In comparison to modern project management this would be comparable to producing a rapid mock-up or wireframe for developers or even customers to get hands-on experience with before investing further effort into what might be a dead end path for unforeseen reasons.  This is especially important in user interface design where there is often little ability to properly predict usability or satisfaction ratings without providing a chance for actual users to physically manipulate the system and judge for themselves if it provides the experience for which they are looking. (Esposito)

We must, of course, consider the risk that the Olympics undertook within their historical context with regard to financial trends and forces. At the time, starting from the middle of the previous century, the prevailing financial thinking was that it was best to lean towards the risky rather than the safe – in terms of loss of life, cargo or ships – and to cover the difference via insurance vehicles. It was simply more financially advantageous for the ships to operate in a risky manner than to be overly cautious about human life. This trend, by the time of the Olympics, had been well established for nearly sixty years and would not begin to change until the heavy publicity of the Titanic sinking. The market impact on the public did not exist until the “unsinkable” ship, with so many souls aboard, was lost in such a spectacular way.

This approach to risk and its financial trade-offs is one that project managers must understand today, the same as they did over one hundred years ago. It is easy to be caught believing that risk is so important that it is worth any cost to eliminate, but projects cannot think this way. It is possible to expend unlimited resources in the pursuit of risk reduction. In the real world we must balance risks against the cost of risk mitigation. A great example of this in modern times, though outside software development specifically, is the handling of credit card fraud in the United States. Until just the past few years, it has generally been the opinion of the US credit card industry that the cost of greater security measures on credit cards was too high compared to the risk of not having them; essentially, it has been more cost effective to reimburse fraudulent transactions than to prevent them. This cost-to-risk ratio can sometimes be counterintuitive and even frustrating, but it is one that has to drive project decisions in a logical, calculated fashion.

In a similar vein, it is common in IT to design systems as if downtime carried essentially unlimited cost, and to spend vastly more attempting to mitigate a downtime risk than the actual outage event would likely cost were it to occur. This is obviously foolish, but cost analyses of this type are so rarely run, or run correctly, that it becomes far too easy to fall prey to this mentality. In software engineering projects we must approach risks in a similar fashion. Accepting that there is risk, of any sort, then determining the actual risk and the magnitude of its impact, and comparing that against the cost of mitigation strategies, is critical to making an appropriate project management decision regarding the risk. (Brander 1995)
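The expected-loss comparison described above can be sketched in a few lines; the probabilities and costs here are invented placeholders that a real analysis would have to estimate from actual business data:

```python
# Invented figures: compare the expected annual loss from an outage against
# the annual cost of mitigating it.

outage_probability_per_year = 0.05   # estimated chance of the outage in a year
outage_cost = 40_000                 # business impact if the outage occurs
mitigation_cost_per_year = 6_000     # e.g., added cost of a high-availability setup

expected_annual_loss = outage_probability_per_year * outage_cost  # $2,000 here

if mitigation_cost_per_year > expected_annual_loss:
    print("Mitigation costs more than the risk it removes: accept the risk.")
else:
    print("Mitigation is cheaper than the expected loss: mitigate.")
```

With these placeholder numbers, spending $6,000 a year to avoid an expected $2,000 annual loss would be exactly the kind of inverted trade-off the paragraph above warns against.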

Of particular interest to extremely large projects, a category for which the Olympics certainly qualified, is the additional concept of being “too big to fail.” This, of course, is a modern phrase that came about during the financial crisis of the past decade, but the concept and the reality are far older and a valuable consideration for any project on a scale that would register as a “national financial disaster” should it totally falter. In the case of the Olympics, the British government ultimately insulated the investors from total disaster, as the collapse of one of the largest passenger lines would have been devastating to the country at the time.

White Star Line was simply “too big to fail” and was kept afloat, so to speak, by the government before being forcibly merged into Cunard some years later. Whether this – knowing that the government would not want to accept the risks of the company failing – was calculated or considered at the time, we do not know. We do know, however, that it is taken into consideration today with very large projects. A current example is Lockheed Martin’s F-35 fighter, which is dramatically over budget, past its delivery date and no longer even considered likely to be useful, yet has been buoyed for years by different government sponsors who see the project as too important to the national economy, even in a state of failure to deliver, to be allowed to fully collapse. As this phenomenon becomes better known, it is likely that we will see more projects take it into consideration in their risk analysis phases. (Ellis)

Jumping to the operational side of the equation, we could examine any number of aspects that went wrong leading to the sinking of the Titanic, but at the core I believe that what was most evident was a lack of standard operating procedures throughout the process. This is understandable to some degree, as the ship was on its maiden voyage and there was little time for process documentation and improvement. However, this was the flagship of a long-standing shipping line that had a reputation to uphold and a great deal of experience in these matters. That excuse also overlooks the fact that, by the time Titanic attempted her first voyage, the Olympic had already been in service more than long enough for a satisfactory set of standard operating procedures to have been developed.

Baseline documentation would have been expected even on a maiden voyage; it is unreasonable to expect a ship of such scale to function at all without coordination and communication among the crew. There was plenty of time, years in fact, for basic crew operational procedures to be created and prepared before the first ship set sail – and, of course, this would have to be done for all ships of this nature – but it was evident that such operating procedures were missing or untested in the case of the Titanic.

The party responsible for operating procedures would likely be identified as being from the operations side of the project equation, but some degree of such documentation would need to be provided by, or coordinated with, the engineering and construction teams as well. The procedures that broke down on the Titanic included chain-of-command failures under pressure, with the director of the company taking over the bridge and the captain allowing it; wireless operators being instructed to relay passenger messages as a priority over iceberg warnings; wireless operators being allowed to tell other ships attempting to warn them to stop broadcasting; critical messages not being brought to the bridge; tools needed for critical jobs not being supplied; and so forth. (Kuntz)

Much as was needed with the engineering and design of the ships, the operation of the ships needed strong, holistic guidance ensuring that the ship and its crew worked as a whole, rather than treating departments, such as the Marconi wireless operators, as individual units. In that example, the operators were not officially crew of the ship but employees of Marconi, on board to handle paid passenger communiqués and to handle ship emergency traffic only if time allowed. Had they been overseen as part of a holistic operational management system, even as outside contractors, it is likely that their procedures would have been far more safety focused or, at the very least, that service level agreements around getting messages to the bridge would have been clearly defined rather than ad hoc and discretionary.

In any project and project component, good documentation – whether of project goals, deliverables, procedures or anything else – is critical, and project management has little hope of success if good communication and documentation are not at the heart of everything that we do, both internally within the project and externally with stakeholders.

What we find is that the project management lessons of the Olympic, Titanic and Britannic remain valuable today: pushing for iterative project design where possible, investing in tribal knowledge, calculating risk, understanding the roles of system engineering and system operations, and accounting for the interactions of protective external forces on product costs are all still relevant. The factors that affect projects come and go in cycles; today we see trends leaning towards models more like the Olympics than unlike them, and in the future the pendulum will likely swing back again. The underlying lessons are very relevant and will continue to be so. We can learn much by evaluating both how our own projects are similar to those of White Star and how they differ.

Bibliography and Sources Cited:

Miller, Scott Alan.  Project Management of the RMS Titanic and the Olympic Ships, 2008.

Schwaber, Ken. Agile Project Management with Scrum. Redmond: Microsoft Press, 2003.

Kuntz, Tom. The Titanic Disaster Hearings: The Official Transcripts of the 1912 Senate Investigation. New York: Pocket Books, 1998. Audio edition via Audible.

Kozak-Holland, Mark. Lessons from History: Titanic Lessons for IT Projects. Toronto: Multi-Media Publications, 2005.

Brown, David G. “Titanic.” Professional Mariner: The Journal of the Maritime Industry, February 2007.

Esposito, Dino. “Cutting Edge – Don’t Gamble with UX—Use Wireframes.” MSDN Magazine, January 2016.

Sadur, James E. Home page. “Jim’s Titanic Website: Titanic History Timeline.” (2005): 13 February 2017.

Winchester, Simon. Atlantic. Harper Perennial, 2011.

Titanic-Titanic. “Olympic.” (Date Unknown): 15 February 2017.

Titanic-Titanic. “Guarantee Group.” (Date Unknown): 15 February 2017.

Brander, Roy. P. Eng. “The RMS Titanic and its Times: When Accountants Ruled the Waves – 69th Shock & Vibration Symposium, Elias Kline Memorial Lecture”. (1998): 16 February 2017.

Brander, Roy. P. Eng. “The Titanic Disaster: An Enduring Example of Money Management vs. Risk Management.” (1995): 16 February 2017.

Ellis, Sam. “This jet fighter is a disaster, but Congress keeps buying it.” Vox, 30 January 2017.

Additional Notes:

Mark Kozak-Holland originally published his book in 2003 as a series of Gantthead articles on the Titanic:

Kozak-Holland, Mark. “IT Project Lessons from Titanic.” Gantthead.com the Online Community for IT Project Managers and later ProjectManagement.com (2003): 8 February 2017.

More Reading:

Kozak-Holland, Mark. Avoiding Project Disaster: Titanic Lessons for IT Executives. Toronto: Multi-Media Publications, 2006.

Kozak-Holland, Mark. On-line, On-time, On-budget: Titanic Lessons for the e-Business Executive. IBM Press, 2002.

US Senate and British Official Hearing and Inquiry Transcripts from 1912 at the Titanic Inquiry Project.

Standard Areas of Discipline Within IT


Information Technology and Business Infrastructure form an enormous field filled with numerous and extremely varied career opportunities, not just in the industries in which the work is done, but also in the type of work itself. Only rarely are any two IT jobs truly alike; the variety is incredible. However, certain standard career foci do exist and should be understood by everyone in the field, as they provide important terminology for mutual understanding.

It is very important to note that, as in any field, a single person will commonly fill more than one role over a career, and even at the same time. Just as someone may be a half-time burger cook and half-time cashier, someone may have their time split between different IT roles. But we need to know what those roles are and what they mean, to be able to convey value, experience and expectation to others.

These are what we refer to as “IT specializations”: areas of specific focus and opportunity for deep skill within IT. These often do not just represent job roles; in large businesses they are generally representative of entire departments of career peers who work together. None of these areas of focus is more or less senior than any other; these are different areas, not levels. There is no natural or organic progression from one IT discipline to another; however, all IT experience is valuable, and experience in one discipline can be expected to prepare someone to learn and adapt to another area more quickly.

The terms “administration” and “engineering” are often applied today; these, again, are not levels, nor are they discipline areas. They refer to a role being focused on operations (the running of production systems) or on designing systems for deployment. The two share discipline areas, so, for example, the systems discipline has need for both administration and engineering workloads within it.

Systems. Shortened from “operating systems.” Systems roles focus on the operating systems, normally of servers (but not necessarily in all cases). This is the most broadly needed specialized IT role. Within systems, specializations tend to follow platforms such as Windows, RHEL, Suse, Ubuntu, AIX, HP-UX, Solaris, FreeBSD, Mac OS X and so forth. High level specializations such as UNIX are common, with a single person or department servicing any system that falls under that umbrella, while larger organizations might split AIX, Solaris, RHEL and FreeBSD into four discrete teams to allow for a tight focus on skills, tools and knowledge. Systems specialists provide the application platform on which computer programs (which also include databases) will run. Desktop support is generally seen as a sub-discipline of systems, and one that often intersects pragmatically with end user and helpdesk roles.

Platforms. Also known as virtualization or cloud teams (depending on the exact role), the platform discipline focuses on the abstraction and management (hypervisor) layer that sits, or can sit, between physical hardware and the operating system(s). This team tends to focus on capacity planning, resource management and reliability. Foci within the platform specialization commonly include VMware ESXi, vCloud, Xen, XenServer, KVM, OpenStack, CloudStack, Eucalyptus, Hyper-V and so forth. With the advent of massively hosted platforms, there has also arisen a need for foci on specific hosted implementations such as Amazon AWS, Microsoft Azure, Rackspace, SoftLayer and so on.

Storage. Storage of data is so critical to IT that it has filtered out as its own, highly focused discipline. Storage specialists generally focus on SAN, NAS and object storage systems. Focus areas might include block storage in general, or might drill down to specific products or product lines, such as EMC VMAX or HPE 3PAR. With recent growth in scale-out storage technologies, the storage arena is growing both in total size and in depth of skill expectation.

Databases. Similar to storage, databases provide critical “backing” of information to be consumed by other departments. While conceptually databases and storage overlap, in practice the two are treated dramatically differently. We think of storage as “dumb,” “unstructured” or “bulk” storage and databases as “smart,” “focused” or “highly structured” storage. At the fundamental level, the two are actually quite hard to distinguish; in practice, they are extremely different. Database specialists work specifically on database services, but rarely create databases and certainly do not code database-connected applications. Like their systems counterparts, database specialists (often called DBAs) manage the database platform for other teams to consume. Database foci can be high level, such as relational versus non-relational (NoSQL) databases. Or, more commonly, a DBA will focus on one or more specific database products such as Informix, MS SQL Server, dBase, Firebird, PostgreSQL, MariaDB, MySQL, MongoDB, Redis, CouchDB and many more.

Applications. Applications are the final product, consuming all other platform components: physical systems, platforms, systems, storage, databases and more. Applications are the ultimate component of the computational stack and take a massive variety of forms. Application specialists would never use that term but would be referred to as specialists in a specific application or set of applications. Some application families, such as CRM and ERP, are so large that an entire career might be spent learning and supporting a single one (such as an SAP ERP system), while in many other cases one might manage and oversee hundreds of small applications over a career. Common application areas include CRM, ERP, email, web portals, billing systems, inventory tracking, time tracking, productivity and much more. Applications can include just about anything; some are high profile, such as an Exchange email system, while others might be very trivial, such as a small desktop utility for calculating mortgage rates quickly.

Networking. Networks tie computers together and require much design and management in their own right, making networking often the second largest discipline within IT. Network specialists work on buses, hubs, switches, routers, gateways, firewalls, unified threat management devices, VPNs, network proxies, load balancers and other aspects of allowing computers to speak to each other. Networking specialists typically focus on a vendor, such as Cisco or Juniper, rather than on product types such as switches or routers. Networking is, along with systems, the best known and most commonly mentioned role in IT, even if the two are often confused. This role also supports the SAN (the actual network itself) for storage teams.

Security. Not truly an IT discipline itself, but rather an aspect that applies to every other role, IT Security specialists tend to either specialize by other discipline (network security, application security) or act as a cross discipline role with a focus on the security aspects as they cross those domains. Security specialists and teams might focus on proactive security, security testing and even social engineering.

Call Center, NOC or Helpdesk. The front line role for monitoring systems across other domains, taking incoming calls and emails, and assisting in triage and sometimes direct support for an organization, which may or may not include end users. This role varies heavily depending on who the direct “customer” of the service is and on whether tasks are interrupt (monitoring) based or queue (ticket) based. Often the focus of this role is high level triage, but it can cross dramatically into end user support. This discipline is often seen as a “helper” group to other teams.

End User Support. Whether working beside an end user in person (aka “deskside support”) or remotely (aka helpdesk), end user support roles work directly with individual end users to resolve individual issues, communicate with other support teams, train and educate, and so forth. This is the only IT role that commonly has any interaction with non-IT teams (other than reporting “up” in the organization to management).

Hardware Technical Support. This role has no well known name and is often known only by the fact that it works with hardware. This role, or family of roles, includes the physical support and management of desktop or laptop devices; the support and management of physical servers, storage systems or networking devices; and the physical management of a datacenter or similar. This is the portion of IT that rubs shoulders with the “bench” field (considered to be outside of IT) and shares much grey area with it. Hardware support will often plug in and organize cables and generally works in support of other teams, predominantly platforms or systems. Separating IT hardware support from bench work is often nothing more than a matter of “operational mindset,” and most roles could potentially go in either direction. Placing desktops on desks is often seen as falling to bench, whereas racking, stacking and monitoring server hardware is generally seen as IT hardware.



It is often practical to define what IT is not, rather than what it is. Many things are often assumed to be IT roles, but are not, and are so commonly connected with the field that it is worth expressly pointing out that they are not IT roles, but rather something else.

Project Management. PM is its own discipline, far more a part of management than of any other field, and has no direct ties to IT whatsoever. IT often utilizes PM roles, PMs often oversee IT projects, and IT companies or departments generally have PMs tasked to them; but the PM career itself is quite separate from IT, the same as any management role.

No Cabling. IT is most certainly not an electrician’s trade, and the running, termination and certification of building cables is not even remotely within the scope of IT. Most IT departments will plug computers into their network ports, but this no more makes IT the electrical maintenance department than plugging in a lamp at home makes you an electrician. The physical cabling plant of a company remains part of the electrical and maintenance roles, clearly outside of IT.

No Programming. There is no programming role within IT. Software engineering is a closely associated industry, but is not itself part of IT proper. The head of IT is the CIO; the head of SE is the CTO. The CIO is about business infrastructure – the “plumbing” of the organization. The CTO is about the engineering and creation of new tooling, often tooling that will then be used by the IT organization. The expectation is that IT would request tools from SE. That is not to say that IT roles never write code – they often do – but code is not the product of IT; it is a tool in the toolset. An SE’s job is to deliver code as the end product.

No DevOps. DevOps is not a role. DevOps is a modern term for a specific style of working within other roles. One can be a DevOps system admin, a DevOps network admin or a DevOps DBA, for example, but you cannot be just “DevOps,” as it does not mean anything on its own. DevOps is a way of working, not a specific task, so we do not see it on the list, even though DevOps is an important concept in IT in general.