IT’s Most Needed Skills

IT does not exist in a bubble.  IT is a business enabler, a way of taking an existing business and making it more efficient, cost effective, nimble and capable.  Except for the home hobbyist – and even there this isn’t quite true – IT is subject to the business that it supports.  It has a goal, an objective and a budget.  The business provides the context in which IT exists.

I speak with a wide array of IT professionals every day.  I work with both the enterprise and small business markets and I work with a review board overseeing IT and software development programs at a university.  On that board we were asked, “What is the single, most critical skill lacking in college graduates seeking jobs in IT today?”

The answer to that question was overwhelmingly “the ability to write and communicate effectively.”  No, it was not a technology skill.  It was not even a skill taught by the computer or technology department.  What we needed was for the English department to push these students harder and for the university to require more, and harder, English classes for non-majors – and to demand that those skills be applied to classes taken in all disciplines rather than relegated purely to English-focused classes.

The ability to communicate effectively is critical in any profession but IT especially is a field where we need to be able to communicate large amounts of data, both technical and esoteric, rapidly, accurately and with extreme precision.  Most fields don’t penalize you to the same degree as IT for not knowing the correct use of white space or capitalization, spelling or context.  IT demands a level of attention to detail rare even in professional fields.

As a prime example, I have seen the misuse of “Xen server” to mean “XenServer” no less than twenty times in attempts to get technical assistance – which inevitably led to useless advice, since these are the proper names of two different products with unique configurations, vendors and troubleshooting procedures.  How many hours of productivity have been lost at each of those companies just because someone could not properly identify or communicate the software product with which they were seeking assistance?  Worse yet, I’ve seen this same product referred to as ZenServer or ZEN server – both of which are the correct names of still other software products.  Yes, four different products that are all homophones and require proper spelling, spacing and capitalization to reliably differentiate one from another.  The worst scenario is when someone writes “Xenserver” or “Xen Server”, neither being the exact name of any product, where the ambiguity means that there are at least two products equally far from matching what is given.  The person speaking often feels that the need for precision is “annoying” but fails to understand why the advice that they receive doesn’t seem to apply to their situation.

I’ve seen confusion come from many written inaccuracies – mistaking versions of Windows, or confusing “VMware Server” with “VMware ESXi” because someone refers to either product simply by the name of the vendor rather than the product, forgetting that that one vendor makes at least five different virtualization products.  These are basic written language skills necessary for successful work in IT.  Not only does lacking this skill create technical challenges in communicating with peers, it also implies an inability to document or search for information reliably – some of the most common and critical IT skills.  This, of course, also means that an IT professional in this position may be responsible for purchasing the wrong product from the wrong vendor simply because they did not take the time to be accurate in repeating product or vendor names, or may cause system damage by following inappropriate advice or documentation.

Good communication skills go far beyond technical documentation and peer interactions – being able to communicate with the business, with other support groups within the organization, with vendors or with customers is extremely important as well.  IT, more than nearly any other field, acquires, processes and disseminates information.  If an IT professional is unable to do so accurately, their value diminishes rapidly.

The IT professional seeking to advance their career beyond pure technical pursuits needs the ability to interact with other departments, most notably operations and business management.  These are areas within most companies where the written word, as well as presentation, is highly valued, and the IT team member able to present recommendations to management will have better visibility within the organization.  Technology departments need people with these skills in order to successfully present their needs to the business.  Without this skill within the ranks, IT departments often fail to push critical projects, secure funding or obtain the visibility necessary to work effectively within the organization.

The second big skill needed in IT departments today is an understanding of business – both business in general and the specific business of their own organization.  As I said at the beginning of this article, IT is a business enabler.  If IT professionals do not understand how IT relates to their business, they will be poorly positioned to evaluate IT needs and make recommendations in the context of the business.  Everything that IT does, it does for the business – not for technology and not for its own purposes.

Within the IT ranks it is easy to become excited about new products and approaches – we love these things and likely this was a factor in our wanting to work in IT.  But finding the latest software release to be exciting or the latest, fastest hardware to be “neat” are not sentiments that will pass muster with a business professional who needs to understand the ramifications of a technology investment.  IT professionals wishing to move beyond being purely technology implementers into being technology recommenders and advisers need to be able to speak fluently to business people in their own language and to frame IT decisions within the context of the business and its needs.

Linux Distro Release Schedules

One of the notable aspects of the Linux world compared to the Windows one is the variety and challenge of different release schedules.  In the Windows world this is pretty simple: there is one product and it releases when it releases, which is roughly once every two years or so.  Everyone working on Windows is very aware of the upcoming releases – when they will happen, when they go into release candidate, when their end of life is and so forth.  It is very clear and very simple.

In the Linux world, this is very different.  Of course, the biggest difference is that Windows is one product, one thing coming from a single vendor.  In Linux we are talking about a “family” of related products from many vendors, some with multiple products.  This is on top of the kernel release schedule that comes from Linux itself – which we will not worry about here.

Each distro is unique and makes its own release decisions.  In fact, release schedule is often a key factor in what distinguishes one distro from another.  For example, all three primary enterprise Linux vendors offer two different products and in all three cases, the differentiation is primarily around release schedule!  So the concept of release schedule is certainly an important one in this marketplace.

There are three primary release “styles” that we find across all operating systems, not only Linux distros: long term release, short term release and rolling release.  Each release style serves a different purpose, but all generally follow a similar set of rules.

The idea of a release is that the packages within a release will not change outside of security and stability patches.  Of course, this is predicated on the behaviour of enterprise vendors as they exist today; any given distro may choose to follow established norms or not.  There are no inherent rules of the universe that require this behaviour; it is a strong convention, and the concept of a release is based upon that convention.

Long Term Release

This release model is the most common in the general field of enterprise operating systems and is followed outside of Linux by systems like FreeBSD, Solaris, AIX, Mac OS X and Windows.  Long Term Releases, often referred to as LTS, are designed around slow system change rates, providing years, sometimes many years, between major system releases – allowing IT teams to avoid migrations for much longer and giving software vendors targets that are stable for a long time.

In the enterprise Linux world, all vendors offer at least one Long Term Release product.  These are the most commonly deployed.

From Red Hat, the RHEL and CentOS products are long term releases with extremely long release cycles – not on a set schedule, but currently releasing every three to four years.

SUSE has two LTS products: SUSE Linux Enterprise Server and openSUSE Leap.  SLES maintains a release cycle that is currently between three and five years, and openSUSE Leap is based relatively closely on the SLES releases.

Ubuntu’s LTS release is conveniently named LTS and releases every two years on the even years, in April, like clockwork.  Ubuntu currently has the shortest release cycle for any LTS product in this category.

All Long Term Releases have minor releases that come out between the major releases, bringing changes to the operating system that are larger than would be appropriate to deliver in a patch but not large enough to justify a new operating system release.  The idea of these minor releases is that they are small enough not to be “breaking,” allowing software that is targeted at the major release to remain functional throughout the major release cycle.  Major releases are considered “breaking,” with large changes such as significant new kernel features, changes in package choices, new compiler features, different libraries and so forth.

Short Term or Rapid Release

Long term release schedules obviously create problems for those seeking more modern packages and features.  To address this, most enterprise Linux vendors offer a short term release product.

Red Hat provides the Fedora distribution, which releases roughly every six months, but on a flexible schedule.  Fedora is not exactly a separate distribution from RHEL and CentOS; instead, every so often a Fedora release is picked to be the “base” for a future RHEL and CentOS release.  The basis is not direct – some packages from later Fedora releases are sometimes added in and some changes are made – but the basics closely match a Fedora release.  The Fedora release is frozen and put through extensive testing before turning into a long term RHEL release.

The SUSE family does not offer a short term release product and is unique in this regard.

Ubuntu has a somewhat different strategy from Red Hat.  Ubuntu releases a product every six months, on a very set schedule.  Every fourth release is earmarked as the long term release; the other three are short term releases.  This makes for a far simpler and more straightforward system than how Red Hat works, with short term release users and long term release users overlapping for six months every two years.
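As a small illustration of how predictable that Ubuntu cadence is, the following minimal Python sketch – assuming the familiar YY.MM version naming, such as “16.04” – checks whether a given release number falls on the LTS schedule of April releases in even years:

# Minimal sketch: is an Ubuntu release number on the LTS schedule
# (an April release of an even year, e.g. 14.04, 16.04, 18.04)?
# Assumes the conventional YY.MM version naming; illustrative only.
def is_lts(version: str) -> bool:
    year, month = (int(part) for part in version.split("."))
    return month == 4 and year % 2 == 0

for v in ["15.10", "16.04", "16.10", "17.04", "18.04"]:
    print(v, "LTS" if is_lts(v) else "short term release")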

Rolling Release

The most rapid release style is the rolling release, which happens essentially continuously.  This release strategy is uncommon, but is beginning to be taken more seriously in recent times.  Only SUSE, with the openSUSE Tumbleweed distribution, provides an enterprise rolling release system today.  Updates can be as frequent as every couple of days.

Unlike other release schedules, which take large groups of packages and “freeze” them as a single release, the rolling release has updates to individual packages coming as they are ready.  So updates are small, but constant.  This allows for simplified adaptation by keeping changes to a micro scale, but makes creating a single, predictable target very difficult.

Those looking for the most up to date packages and cutting edge features will find rolling releases to be the best way to keep everything as up to date as possible.


An important point about release schedules is that they are not directly tied to the length of support given to a release, nor do they indicate the amount of testing that goes into each release.

Each release style plays an important role in the ecosystem, and by having different styles of release the enterprise Linux world has greater variety and flexibility for addressing a greater range of needs than would otherwise be feasible.

Currently, long term releases are the most prominent and popular in systems administration, but this trend seems unlikely to continue.  Overall stability in the enterprise Linux space has increased, and the need for currency is often the more critical concern, so more rapid distros are increasingly desired.

 

The High Cost of On Premises Infrastructure

IT infrastructure is a challenge for any company, especially for companies that are not large enough to implement their own full scale datacenters.  Like many things in IT, major challenges come in the form of lacking specific, seldom used expertise as well as lacking the scale to utilize singular resources effectively.

This lack of scale can come in many forms.  The obvious one is in manpower.  Managing a physical computing infrastructure uses unique skills that are separate from IT itself and are often desired to be available “around the clock.”  These can vary from security to electrical to cooling and facilities to “datacenter technician” style staff.  Of course, smaller businesses simply do without these roles, but this raises the cost incurred on a “per server” basis to maintain the infrastructure.  Large businesses and dedicated datacenters leverage an efficiency of scale to make the cost of physically housing an IT infrastructure lower – either by actually lowering the cost directly or by raising the quality and reliability of the equipment.

The cost effectiveness of delivering power, cooling and datacenter services is only one aspect of the cost of IT infrastructure in a business.  The way many businesses attack this problem – by reducing infrastructure investment and staff – may counteract some of the up front costs of the infrastructure, but it generally does so to the detriment of availability and equipment longevity.  Whether it is a lack of ISP redundancy, an absence of diesel electric generators or shaving a year or two off of a server’s service life, these costs add up, often in ways that are difficult to identify and track.

We often see the effects of low quality infrastructure come out in the behaviour and expectations of smaller businesses.  For example, in the enterprise datacenter an average server lifespan may be ten years or more, but smaller businesses often assume that a server is worn out and unreliable in seven or eight years.  This increase in failure rate also leads to more concern about system failure.  Smaller businesses often see a higher, rather than a lower, need for redundant systems even when lower revenue would normally suggest otherwise.  Small businesses are prone to investing heavily in high availability mechanisms, often at great expense, to mitigate a perceived risk of high system failure rates that larger businesses may be less likely to see.  These factors can combine to create a high cost through more rapid system replacement and a tendency towards overbuying hardware – sometimes even doubling the otherwise necessary investment to protect against risks created by lower quality facilities management.

This concept is not unique to information infrastructure.  In the audiophile world, while huge investments in high quality audio equipment are common, it is a rule of thumb that fifty percent of audio quality comes from the equipment and fifty percent comes from the environment into which it is placed.  This lesson applies to information infrastructure.  Lower cost gear may run longer and more reliably in a high quality physical environment than more expensive, better engineered equipment will in a lower quality one.

Of course, the most obvious components of lower reliability come from being unable to maintain redundant generators, independent power rails, adequate fuel supplies, uninterruptible power supply units, steady temperature and humidity, air filtration and, of course, highly redundant multi-path WAN access.  These are aspects we think of all the time, and they are almost completely out of reach of all but the largest companies.  Even simple things like restricting access to only essential server room staff can be an insurmountable challenge in a small environment.

These challenges create an opportunity for the SME, SMB and SOHO markets to look for alternatives that leverage combined scale.  While many companies today turn to ideas such as hosted cloud computing, the costs associated with elastically expanding capacity often make this impractical, as this same market struggles the most to utilize that type of functionality.  Cloud computing can be an answer in some cases, but normally only for the very smallest of companies, for whom a single server is too much scale, or for those companies so large that they have a DevOps-style automation infrastructure capable of scaling elastically with load demands and workloads that make sense for this process.  These companies are the exception, not the norm.  More often, hosted cloud computing makes sense for only a specific subset of public-facing workloads, and only in some cases.

For the majority of companies too small to create the scale necessary to build out their own full scale IT infrastructure, the answer is likely going to be found in colocation.  It must be noted that there are, of course, locational or environmental factors that can make off-premises infrastructure impossible or at least impractical.  Most businesses, however, will not be subject to these limitations.

Colocation tackles the cost challenges of the smaller business market by generating the scale necessary to make high quality, dedicated information infrastructure facilities possible.  This includes staff, WAN connectivity, environmental controls, power, and expertise.  Cost savings can often come from surprising places including lower power cost per kilowatt hour, lower cost of cooling and power conditioning and higher real estate density.

It is often believed that colocation is a premium service for businesses that have needs above and beyond the average, but in reality colocation often is, and should be, chosen because it represents an opportunity to lower costs while also improving reliability.  Colocation, in most cases, will actually bring a cost savings on a month by month basis, providing for an impressive return on investment potential over time.  The initial cost can be equal or similar to other investments, but the ongoing monthly cost can be lower and, perhaps more importantly, the costs become far more predictable, with fewer risks and unexpected expenditures.

Because the cost of services is potentially very granular, it is actually far easier for colocation to lower the overall expenditure than is generally believed.  For example, a small business with just one or two servers would still need certain basics such as air conditioning and UPS support, plus footprint space and security, all dedicated to only a very small amount of equipment.  In a colocation facility those same servers may represent less than one percent of the load on a large, high efficiency cooling system, may use just a small fraction of a large UPS and so forth.
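To make that granularity concrete, here is a minimal, purely illustrative Python sketch comparing a hypothetical on premises monthly cost against a colocation fee for a two server footprint.  Every figure is an invented assumption for the sake of the example, not a quoted price from any provider:

# Purely illustrative sketch; all dollar figures are hypothetical assumptions.
onprem_monthly = {
    "dedicated_ups_and_cooling": 250.0,  # small UPS and AC sized for one room
    "power_at_retail_rates": 120.0,      # no bulk power pricing or conditioning
    "space_and_security": 200.0,         # office space, locks, access control
}
colo_monthly = {
    "rack_space_two_servers": 150.0,     # fraction of a shared rack
    "shared_power_and_cooling": 90.0,    # small slice of large, efficient systems
    "remote_hands_allowance": 50.0,      # occasional datacenter technician time
}
onprem_total = sum(onprem_monthly.values())
colo_total = sum(colo_monthly.values())
print("On premises monthly:", onprem_total)
print("Colocation monthly: ", colo_total)
print("Monthly difference: ", onprem_total - colo_total)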

Colocation also frees IT staff from performing datacenter functions, at which they are generally untrained and poorly qualified, to focus on the tasks at which they are more valuable and for which they are intended.  The datacenter tasks can then be performed by experienced, dedicated datacenter staff.

Calculating exact ROI can be challenging because individual cases are unique and depend heavily on the workloads, use cases, independent needs and environmental factors of an individual business and the colocation options considered.  But it should be approached with the mindset that colocation does not present only an opportunity to improve the quality or reliability of IT infrastructure services, nor only a potential return on investment, but that it may, in fact, do both of these things on top of fundamentally lowering costs overall.

Titanic Project Management & Comparison with Software Projects

Few projects have ever taken on the fame and notoriety achieved by the Titanic and her sister ships of the Olympic class, the Olympic and the Britannic, which began design one hundred and ten years ago this year.  There are, of course, many lessons that we can learn from the fate of the Olympic ships in regard to project management and, in fact, many aspects of their project management are worth covering.

(When referring to the ships as a whole I will simply reference them as the Olympics, as the three together were White Star Line’s Olympic Class ships.  Titanic’s individual and later fame is irrelevant here.  Also, I am taking the position that the general information pertaining to the Olympic ships, their history and their fate is common knowledge to the reader and will not cover it again.)

Given the frequency with which the project management of the Olympics has been covered, I think that it is more prudent to look at a few modern parallels where we can view current project management in today’s world through a valuable historic lens.  Project management is a discipline that has endured for millennia; many of its challenges, skills and techniques have not changed so much, and the pitfalls of the past still very much apply to us today.  The old adage applies: if we don’t learn from the past, we are doomed to repeat it.

My goal here, then, is to examine the risk analysis, perception and profile of the project and apply that to modern project management.

First, we must identify the stakeholders in the Olympics project: White Star Line itself (sponsoring company and primary investor) and its director Joseph Bruce Ismay; Harland-Wolff (contracted ship builder) with its principal designers Alexander Carlisle and Thomas Andrews; the ships’ crew, which includes Captain Edward John Smith; the British government, as we will see later; and, most importantly, the passengers.

As with any group of stakeholders, there are different roles being played.  White Star on one side is the sponsor and investor and in a modern software project would be analogous to a sponsoring customer, manager or department.  Harland-Wolff were the designers and builders and are most closely related to the software engineering “team members” in a modern software team, the developers themselves.  The crew of the ships were responsible for operations after the project was completed and would be comparable to an IT operations team taking over the running of the final software after completion.  The passengers were much like end users today, hoping to benefit from both the engineering deliverable (ship or software) and the service built on top of that product (ferry service or IT managed services). (“Olympic”)

Another axis of analysis of the project is that of chicken and pig stakeholders where chickens are invested and carry risk while pigs are fully invested and carry ultimate risk.  In normal software we use these comparatives to talk about degrees of stakeholders – those which are involved versus those that are committed, but in the case of the Olympic ships these terms take on new and horrific meaning as the crew and passengers literally put their lives on the line in the operational phase of the ships, whereas the investors and builders were only financially at risk. (Schwaber)

Second, I believe that it is useful to distinguish between the different projects that exist within the context of the Olympics.  There was, of course, the physical design and construction of the three ships.  This is a single project with two clear components – one of design and one of construction – and three discrete deliverables, namely the three Olympic vessels.  There is, at the end of the construction phase, an extremely clear delineation point where the project managers and teams involved in the assembly of the ship would stop work and the crew that operated the ship would take over.

Here we can already draw an important analogue to the modern world of technology where software products are designed and developed by software engineers and, when they are complete, are handed over to the IT operational staff who take over the actual intended use of the final product.  These two teams may be internal under a single organizational umbrella or from two, or more, very separate organizations.  But the separation between the engineering and the operational departments has remained just as clear and distinct in most businesses today as it was for ship building and ferry service over a century ago.

We can go a step farther and compare White Star’s transatlantic ferry service to many modern software as a service vendors such as Microsoft Office 365, Salesforce or G Suite.  In these cases the company in question has an engineering or product development team that creates the core product and then a second team that takes that in-house product and operates it as a service.  It is an increasingly important business model in the software development space that the same company creating the software will be its ultimate operator, but for external clients.  In many ways the relevance of the Olympics to modern software and IT is increasing rather than decreasing.

This brings up an important interface understanding that was missed on the Olympics and is often missed today: each side of the hand-off believed that the other side was ultimately responsible for safety.  The engineers touted the safety of their design, but when pushed were willing to compromise, assuming that operational procedures would mitigate the risks and that their own efforts were largely redundant.  Likewise, when pushed to keep things moving and make good time, the operations team was willing to compromise on procedures because they believed that the engineering team had gone so far as to make their efforts essentially wasted, the ship being so safe that operational precautions just were not warranted.  This miscommunication took the endeavor from having two layers of extreme safety down to basically none.  Had either side understood how the other would or did operate, they could have taken that into account.  In the end, both sides assumed, at least to some degree, that safety was the “other team’s job”.  While the ship was advertised heavily on the basis of safety, the reality was that it continued the general trend of the previous half century and more, in which each year ships were built and operated less safely than the year before. (Brander 1995)

Today we see this same problem arising between IT and software engineering – less around stability (although that certainly remains true) but now about security, which can be viewed similarly to safety in the Olympics’ context.  Security has become one of the most important topics of the last decade on both sides of the technology fence and the industry faces the challenges created by the need for both sides to action security practices thoroughly – neither is capable of truly implementing secure systems alone. Planning for safety or security is simply not a substitute for enforcing it procedurally during operations.

An excellent comparison today is British Airways and how they approach every flight that they oversee as it crosses the Atlantic.  As the primary carrier of air traffic over the North Atlantic, the same path that the Olympics were intended to traverse, British Airways has to maintain a reputation for excellence in safety.  Even in 2017, flying over the North Atlantic is a precarious and complicated journey.

Before any British Airways flight takes off, the pilots and crew must review a three hundred page mission manual that tells them everything that is going on including details on the plane, crew, weather and so forth.  This process is so intense that British Airways refuses to even acknowledge that it is a flight, but officially refers to every single trip over the Atlantic as a “mission”; specifically to drive home to everyone involved the severity and risk involved in such an endeavor.  They clearly understand the importance of changing how people think about a trip such as this and are aware of what can happen should people begin to assume that everyone else will have done their job well and that they can cut corners on their own job.  They want no one to become careless or begin to feel that the flight, even though completed several times each day, is ever routine. (Winchester)

Had the British Airways approach been used with the Titanic, it is very likely that disaster would not have struck when it did.  The operational side alone could have prevented the disaster.  Likewise, had the ship engineers been held to the same standards as Boeing or Airbus today, they likely would not have been so easily pressured by management to modify the safety requirements as they worked on the project.

What really affected the Olympics, in many ways, was a form of unchecked scope creep.  The project began as a traditional waterfall approach with “big design up front” and the initial requirements were good, with safety playing a critical role.  Had the original project requirements and even much of the original design been used, the ships would have been far safer than they were.  But new requirements for larger dining rooms or more luxurious appointments took precedence and the scope and parameters of the project were changed to accommodate them.  As with any project, no change happens in a vacuum; each has ramifications for other factors such as cost, safety or delivery date. (Sadur)

The scope creep on the Titanic specifically was dramatic, but hidden and not necessarily obvious for the most part.  It is easy to point out small changes such as a shift in dining room size, but of much greater importance was the change in the time frame in which the ship had to be delivered.  What really altered the scope was that the initial deadlines had to be maintained, relatively strictly.  This was especially problematic because in the midst of Titanic’s dry dock work and later moored work, her older sibling, Olympic, was brought in for extensive repairs multiple times, which had a very large impact on the amount of time in the original schedule available for Titanic’s own work to be completed.  This type of scope modification is very easy to overlook or ignore, especially in hindsight, as the physical deliverables and the original dates did not change in any dramatic way.  For all intents and purposes, however, Titanic was rushed through production much faster than had originally been planned.

In modern software engineering it is well accepted that no one can estimate the amount of time that a design task will take as well as the engineer(s) who will be doing the task themselves.  It is also generally accepted that there is no means of significantly speeding up engineering and design efforts through management pressure.  Once a project is running at maximum speed, it is not going to go faster.  Attempts to go faster will often lead to mistakes, oversights or misses.  We know this to be true in software and can assume that it must have been true for ship design as well, as the principles are the same.  Had the Titanic been given the appropriate amount of time for this process, it is possible that safety measures would have been more thoroughly considered or at least properly communicated to the operational team at hand off.  Teams that are rushed are forced to compromise, and since time cannot be adjusted when it is the constraint, the corners have to be cut somewhere else – and almost always that comes from quality and thoroughness.  This might manifest itself as a mistake or perhaps as failing to fully review all of the factors involved when changing one portion of a design.

This brings us to holistic design thinking.  At the beginning of the project the Olympics were designed with safety in mind: safety that results from the careful interworking of many separate systems that together are intended to make for a highly reliable ship.  We cannot look at the components of a ship of this magnitude individually; in isolation they make no sense – the design of the hull, the style of the decks, the weight of the cargo, the materials used and the style of the bulkheads are all interrelated and must function together.

When the project was pushed to complete more quickly or to change parameters, this holistic thinking and a clear revisiting of earlier decisions was not done, or not done adequately.  Rather, individual components were altered irrespective of how that would impact their role within the whole of the ship and the resulting impact on overall safety.  What may have seemed like a minor change had unintended consequences that were unforeseen because holistic project management was abandoned.  (Kozak-Holland)

This change to the engineering was mirrored, of course, in operations.  Each change, such as not using binoculars or not taking water temperature readings, was individually somewhat minor, but taken together they were incredibly impactful.  Likely, though we cannot be sure, a cohesive project management or, at least, process improvement system was not being used.  Who was overseeing that binoculars were used, that the water tests were accurate and so forth?  Any check at all would have revealed that the tools needed for those tasks did not exist at all.  There is no way that so much as a simple test run of the procedures could have been performed, let alone regular checking and process improvement.  Process improvement is especially highlighted by the fact that Captain Smith had had practice on the RMS Olympic, caused an at-sea collision on her fifth voyage and then nearly repeated the same mistake with the initial launch of the Titanic.  What should have been an important lesson learned by all captains and pilots of the Olympic ships was instead ignored and repeated, almost immediately. (“Olympic”)

Of course, ship building and software are very different things, but many lessons can be shared.  One of the most important lessons is to see the limitations faced by ship building and to recognize when we are not forced to retain those same limitations when working with software.  The Olympic and Titanic were built at nearly the same time, with absolutely no time for engineering knowledge gleaned from the Olympic’s construction, let alone her operation, to be applied to the Titanic’s construction.  In modern software we would never expect such a constraint and would be able to test software, at least to some small degree, before moving on to additional software that is based upon it, either in real code or even conceptually.  Project management today needs to leverage the differences that exist both in more modern times and in our different industry to the best of its advantage.  Some software projects still do require processes like this, but these have become more and more rare over time and today are dramatically less common than they were just twenty years ago.

It is well worth evaluating the work that was done by Harland-Wolff with the Olympics as they strove very evidently to incorporate what feedback loops were possible within their purview at the time.  Not only did they attempt to use the construction of earlier ships to learn more for the later ones, although this was very limited as the ships were mostly under construction concurrently and most lessons would not have had time to have been applied, but far more importantly they took the extraordinary step of having a “guarantee group” sail with the ships.  This guarantee group consisted of all manner of apprentice and master ship builders from all manner of support trades.  (“Guarantee Group”)

The use of the guarantee group for direct feedback was, and truly remains, unprecedented and was an enormous investment in hard cost and time for the ship builders, sacrificing so many valuable workers to sail in luxury back and forth across the Atlantic.  The group was able to inspect their work first hand, see it in action, gain an understanding of its use within the context of the working ship, work together on team building, knowledge transfers and more.  This was far more valuable than the feedback from the ship yards where the ships overlapped in construction; this was a strong investment in the future of their ship building enterprise: a commitment to industrial education that would likely have benefited them for decades.

Modern deployment styles, tools and education have led from the vast majority of software being created under a Waterfall methodology not so distinct from that used in turn of the [last] century shipbuilding, to most leveraging some degree of Agile methodologies allowing for rapid testing, evaluation, changes and deployment.  Scope creep has changed from something that has to be mitigated or heavily managed to something that can be treated as expected and assumed within the development process even to the point of almost being leveraged.  One of the fundamental problems with big design up front is that it always requires the customer or customer-role stakeholder to make “big decisions up front” which are often far harder for them to make than the design is for the engineers.  These early decisions are often a primary contributor to scope creep or to later change requests and can often be reduced or avoided by agile processes that expect continuous change to occur to requirements and build that into the process.

The shipbuilders, Harland-Wolff, did build a fifteen foot model of the Olympic for testing, which was useful to some degree, but it of course failed to mimic the hydrodynamic action that the full size ship would later produce and failed to predict some of the more dangerous side effects of the new vessel’s size when close to other ships – which led to the first accident of the group and to what was nearly a second.  The builders do appear to have made every effort to test and learn at every stage available to them throughout the design and construction process. (Kozak-Holland)

In comparison to modern project management this would be comparable to producing a rapid mock-up or wireframe for developers or even customers to get hands-on experience with before investing further effort into what might be a dead end path for unforeseen reasons.  This is especially important in user interface design where there is often little ability to properly predict usability or satisfaction ratings without providing a chance for actual users to physically manipulate the system and judge for themselves if it provides the experience for which they are looking. (Esposito)

We must, of course, consider the risk that the Olympics undertook within the context of the financial trends and forces of their time.  Starting from the middle of the previous century, the prevailing financial thinking was that it was best to lean towards the risky rather than towards the safe – in terms of loss of life, cargo or ships – and to cover the difference via insurance vehicles.  It was simply more financially advantageous for the ships to operate in a risky manner than to be overly cautious about human life.  This trend, by the time of the Olympics, had been well established for nearly sixty years and would not begin to change until the heavy publicity of the Titanic sinking.  The market impact on the public did not exist until the “unsinkable” ship, with so many souls aboard, was lost in such a spectacular way.

This approach to risk and its financial trade offs is one that project managers must understand today the same as they did over one hundred years ago.  It is easy to be caught believing that risk is so important that it is worth any cost to eliminate, but projects cannot think this way.  It is possible to expend unlimited resources in the pursuit of risk reduction.  In the real world it is necessary that we balance risks against the cost of risk mitigation.  A great example of this in modern times, though outside of software development specifically, is the handling of credit card fraud in the United States.  Until just the past few years, it has generally been the opinion of the US credit card industry that the cost of greater security measures on credit cards was too high compared to the risk of not having them; essentially, it has been more cost effective to reimburse fraudulent transactions than to prevent them.  This cost to risk ratio can sometimes be counterintuitive and even frustrating, but it is one that has to drive project decisions in a logical, calculated fashion.

In a similar vein, it is common in IT to design systems believing that downtime carries an essentially unlimited cost and to spend vastly more attempting to mitigate a downtime risk than the actual outage event itself would likely cost if it were to occur.  This is obviously foolish, but cost analyses of this type are so rarely run, or run correctly, that it becomes far too easy to fall prey to this mentality.  In software engineering projects we must approach risks in a similar fashion.  Accepting that there is risk, of any sort, determining the actual likelihood of that risk and the magnitude of its impact, and comparing that against the cost of mitigation strategies is critical to making an appropriate project management decision in regard to the risk. (Brander 1995)
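As a rough illustration of that kind of calculation, the following minimal Python sketch compares the expected annual cost of an outage risk against the annual cost of mitigating it.  All of the probabilities and dollar figures are hypothetical assumptions chosen only to show the arithmetic:

# Minimal sketch of a risk versus mitigation comparison.
# All figures are hypothetical assumptions for illustration only.
outage_probability_per_year = 0.05   # estimated chance of a serious outage per year
outage_cost = 40_000.0               # estimated business impact of that outage
mitigation_cost_per_year = 15_000.0  # annual cost of a high availability setup

expected_annual_loss = outage_probability_per_year * outage_cost
print("Expected annual loss without mitigation:", expected_annual_loss)
print("Annual cost of mitigation:", mitigation_cost_per_year)
if mitigation_cost_per_year > expected_annual_loss:
    print("Mitigation costs more than the expected loss it removes.")
else:
    print("Mitigation is justified on expected cost alone.")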

Also of particular interest to extremely large projects, and the Olympics certainly qualified, is the additional concept of being “too big to fail.”  This, of course, is a modern phrase that came about during the financial crisis of the past decade, but the concept and the reality of it are far older and a valuable consideration for any project that falls onto a scale where its total failure would register as a national financial disaster.  In the case of the Olympics, the British government ultimately insulated the investors from total disaster, as the collapse of one of the largest passenger lines would have been devastating to the country at the time.

White Star Line was simply “too big to fail” and was kept afloat, so to speak, by the government before being forcibly merged into Cunard some years later.  This concept – knowing that the government would not want to accept the risks of the company failing – may have been calculated or considered at the time; we do not know.  We do know, however, that this is taken into consideration today with very large projects.  A current example is Lockheed Martin’s F-35 fighter, which is dramatically over budget, past its delivery date and no longer even considered likely to be useful, yet has been buoyed for years by different government sponsors who see the project as too important, even in a state of failure to deliver, for the national economy to allow it to fully collapse.  As this phenomenon becomes better known, it is likely that we will see more projects take it into consideration in their risk analysis phases. (Ellis)

Jumping to the operational side of the equation, we could examine any number of aspects that went wrong leading to the sinking of the Titanic, but at the core I believe that what was most evident was a lack of standard operating procedures throughout the process.  This is understandable to some degree, as the ship was on its maiden voyage and there was little time for process documentation and improvement.  However, this was the flagship of a long standing shipping line that had a reputation to uphold and a great deal of experience in these matters.  That excuse also overlooks that by the time Titanic was attempting her first voyage, the Olympic had already been in service more than long enough for a satisfactory set of standard operating procedures to have been developed.

Baseline documentation would have been expected even on a maiden voyage; it is unreasonable to expect a ship of such scale to function at all unless there is coordination and communication among the crew.  There was plenty of time, years in fact, for basic crew operational procedures to be created and prepared before the first ship set sail – and, of course, this would have to be done for all ships of this nature – but it was evident that such operating procedures were lacking, missing and untested in the case of the Titanic.

The party responsible for operating procedures would likely be identified as being from the operations side of the project equation, but some degree of such documentation would need to be provided by, or coordinated with, the engineering and construction teams as well.  Many of the procedures that broke down on the Titanic included chain of command failures under pressure, with the director of the company taking over the bridge and the captain allowing it; wireless operators being instructed to relay passenger messages as a priority over iceberg warnings; wireless operators being allowed to tell other ships attempting to warn them to stop broadcasting; critical messages not being brought to the bridge; tools needed for critical jobs not being supplied; and so forth. (Kuntz)

Much as was needed with the engineering and design of the ships, the operation of the ships needed strong and holistic guidance ensuring that the ship and its crew worked as a whole rather than treating departments, such as the Marconi wireless operators, as individual units.  In that example, the operators were not officially crew of the ship but employees of Marconi who were on board to handle paid passenger communiques and to handle ship emergency traffic only if time allowed.  Had they been overseen as part of a holistic operational management system, even as outside contractors, it is likely that their procedures would have been far more safety focused or, at the very least, that service level agreements around getting messages to the bridge would have been clearly defined rather than ad hoc and discretionary.

In any project and project component, good documentation – whether of project goals, deliverables, procedures and so forth – is critical, and project management has little hope of success if good communication and documentation are not at the heart of everything that we do, both internally within the project and externally with stakeholders.

What we find is that the project management lessons of the Olympic, Titanic and Britannic remain valuable to us today; whether pushing for iterative project design where possible, investing in tribal knowledge, calculating risk, understanding the roles of system engineering and system operations or recognizing the interactions of protective external forces on product costs, the lessons of that era are still relevant.  The factors that affect projects come and go in cycles; today we see trends leaning towards models more like the Olympics than unlike them.  In the future, likely, the pendulum will swing back again.  The underlying lessons are very relevant and will continue to be so.  We can learn much both by evaluating how our own projects are similar to those of White Star and how they differ from them.

Bibliography and Sources Cited:

Miller, Scott Alan.  Project Management of the RMS Titanic and the Olympic Ships, 2008.

Schwaber, Ken. Agile Project Management with Scrum. Redmond: Microsoft Press, 2003.

Kuntz, Tom. The Titanic Disaster Hearings: The Official Transcripts of the 1912 Senate Investigation. New York: Pocket Books, 1998. Audio edition via Audible.

Kozak-Holland, Mark. Lessons from History: Titanic Lessons for IT Projects. Toronto: Multi-Media Publications, 2005.

Brown, David G. “Titanic.” Professional Mariner: The Journal of the Maritime Industry, February 2007.

Esposito, Dino. “Cutting Edge – Don’t Gamble with UX—Use Wireframes.” MSDN Magazine, January 2016.

Sadur, James E. Home page. “Jim’s Titanic Website: Titanic History Timeline.” (2005): 13 February 2017.

Winchester, Simon. “Atlantic.” Harper Perennial, 2011.

Titanic-Titanic. “Olympic.” (Date Unknown): 15 February 2017.

Titanic-Titanic. “Guarantee Group.” (Date Unknown): 15 February 2017.

Brander, Roy. P. Eng. “The RMS Titanic and its Times: When Accountants Ruled the Waves – 69th Shock & Vibration Symposium, Elias Kline Memorial Lecture”. (1998): 16 February 2017.

Brander, Roy. P. Eng. “The Titanic Disaster: An Enduring Example of Money Management vs. Risk Management.” (1995): 16 February 2017.

Ellis, Sam. “This jet fighter is a disaster, but Congress keeps buying it.” Vox, 30 January 2017.

Additional Notes:

Mark Kozak-Holland originally published his book in 2003 as a series of Gantthead articles on the Titanic:

Kozak-Holland, Mark. “IT Project Lessons from Titanic.” Gantthead.com the Online Community for IT Project Managers and later ProjectManagement.com (2003): 8 February 2017.

More Reading:

Kozak-Holland, Mark. Avoiding Project Disaster: Titanic Lessons for IT Executives. Toronto: Multi-Media Publications, 2006.

Kozak-Holland, Mark. On-line, On-time, On-budget: Titanic Lessons for the e-Business Executive. IBM Press, 2002.

US Senate and British Official Hearing and Inquiry Transcripts from 1912 at the Titanic Inquiry Project.
