All posts by Scott Alan Miller

Started in software development with Eastman Kodak in 1989 as an intern in database development (making database platforms themselves.) Began transitioning to IT in 1994 with my first mixed role in system administration.

The Risks of Licensing

There are so many kinds of risk that we address and must consider in IT systems that it is easy to overlook risks that are non-technical, especially ones that we often do not address directly, such as licensing.  But licensing carries risks, and costs, that must be considered in everything that we do in IT.

As I write this article, the risks of licensing are very fresh in the news.  Just yesterday, one of the largest and best known cloud computing providers suffered a global, three-hour outage that was later attributed to accidentally allowing some of their licensing to expire.  One relatively minor component in their infrastructure stack, with massive global redundancy, was reduced to worthlessness in a stroke as its licensing expired.  Having licensing dependencies means having to carefully manage them.  Some licenses are more dangerous than others.  Some only leave you exposed to audits, others create outages or data loss.
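As a minimal sketch of what carefully managing license dependencies might look like, the following Python flags licenses that have already expired or will expire within a warning window.  The license names, the dates and the 30-day window are hypothetical values chosen for illustration; nothing here reflects any particular vendor's tooling.

```python
from datetime import date, timedelta


def expiring_licenses(licenses, today, warn_days=30):
    """Return (name, days_left) pairs for licenses expired or expiring soon.

    `licenses` maps a license name to its expiry date.  A negative
    days_left means the license has already expired.  The result is
    sorted so the most urgent entry comes first.
    """
    cutoff = today + timedelta(days=warn_days)
    flagged = [
        (name, (expires - today).days)
        for name, expires in licenses.items()
        if expires <= cutoff
    ]
    return sorted(flagged, key=lambda item: item[1])


# Hypothetical inventory, for illustration only.
inventory = {
    "load-balancer-license": date(2024, 3, 1),
    "backup-suite": date(2024, 3, 20),
    "hypervisor-support": date(2025, 1, 15),
}

if __name__ == "__main__":
    for name, days_left in expiring_licenses(inventory, today=date(2024, 3, 5)):
        status = "EXPIRED" if days_left < 0 else f"{days_left} days left"
        print(f"{name}: {status}")
```

Run on a schedule against a real inventory, a check of this kind turns an expiry from a surprise outage into a routine renewal task.  It does nothing, of course, for licenses that a vendor can revoke remotely.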

Licensing may be a risk intentionally, as in the example above where the license expired and the equipment stopped working.  Or it can be less intentional, such as remote kill switches, equipment becoming confused about dates, or misconfiguration causing systems to fail.  But it is a risk that must be considered and, quite often, may have to be mitigated.  The risk of critical systems time bombing or dying in unrepairable ways can be very dangerous.  Unlike hardware or software failure, there is often no recourse to repair systems without access to a vendor.  A vendor that may be offline, might be out of support, might no longer support the product, may have technical issues of their own or may even be out of business!

Often, licensing outages put a vendor into a position of extreme leverage over customers: the vendor can charge nearly any amount that they want for renewed licensing during a pending or, worse, already occurring outage.  Under pressure, customers may easily pay many times the normal price for licensing to get systems back online and restore customer confidence.

While some licensing represents extreme risk and some merely an inconvenience, this risk must be evaluated and understood.  In my own experience I have seen critical software have its licensing revoked by a foreign software vendor simply looking to force a purchasing discussion, causing large losses to environments for which there was little legal recourse, simply because the vendor had the ability to remotely kill systems via their licensing, even for paying customers.  Generally illegal and certainly unethical, there is often little recourse for customers in these situations.

And of course, many license issues can be technical or accidental.  Licensing servers go offline, systems break, accidents happen.  Systems that are designed to become inaccessible when they cannot validate their licenses simply carry an entire category of risk that other types of systems do not.  A risk that is more common than people often realize and that is often among the hardest to mitigate.

Of course beyond these kinds of risks, licensing also carries overhead which, as always, is a form of risk which, in turn, is a form of cost.  Researching, acquiring, tracking and maintaining licenses, even those that would not potentially cripple your infrastructure, takes time and time is money.  And licensing always carries the risk that you will buy too little and be exposed to audits (or buy incorrectly) or that you will buy too much and overspend.  In any of these cases, this is cost that must be calculated into the overall TCO of any solution, but it is often ignored.
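To make the arithmetic concrete, here is a rough sketch of folding licensing overhead into a TCO figure.  The fee, the hours and the hourly rate are invented numbers, and the model is deliberately simplistic: it assumes a flat annual fee and a steady amount of staff time each year.

```python
def licensing_tco(annual_fee, admin_hours_per_year, hourly_rate, years):
    """Return licensing cost over `years`, split into fees and staff overhead.

    Overhead is the staff time spent researching, acquiring, tracking and
    maintaining licenses, priced at `hourly_rate`.
    """
    fees = annual_fee * years
    overhead = admin_hours_per_year * hourly_rate * years
    return {"fees": fees, "overhead": overhead, "total": fees + overhead}


if __name__ == "__main__":
    # Hypothetical: 5,000 per year in fees, 40 staff hours per year at 75 per hour.
    costs = licensing_tco(annual_fee=5000, admin_hours_per_year=40,
                          hourly_rate=75, years=5)
    print(costs)
```

With these made-up inputs the staff overhead comes to 15,000 against 25,000 in fees over five years, which is exactly the kind of cost that goes missing when only the purchase price is compared.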

Licensing time and costs are often among the more significant costs in a project, but because they are ignored it can be extremely difficult to understand how they play into the long term financial picture of solutions, especially as they often later impact other decisions in various ways.

Licensing is just a fact of life in IT, but one that is hardly cool or interesting, so it is often ignored or, at least, not discussed heavily.  Being mindful that licensing has costs to manage just like any other aspect of IT, and carries risk, potentially very large risk, that needs to be addressed, is just part of good IT decision making.

If It Ain’t Broke, Don’t Fix It

We’ve all heard that plenty, right?  “If it ain’t broke, don’t fix it.”  People use it everywhere as a way to discourage improvements, modernization or refactoring.  Many people say it and as with many phrases of this nature, on the surface it seems reasonable.  But in application, it really is not or, at least, not as normally used because it is not well understood.

This is very similar to the concept of telling people not to put all of their eggs in one basket, where it is more often than not applied to situations where the eggs and basket analogy does not apply or is inverted from reality.  But because it is a memorized phrase, they forget that there is a metaphor that needs to hold up for it to work.  It can lead to horrible decisions because it invokes an irrational fear founded on nothing.

Likewise, the idea of not fixing things that are not broken comes from the theory that something that is perfectly good and functional should not be taken apart and messed with just for the sake of messing with it.  This makes sense.  But for some reason, this logic is almost never applied to things where it would make sense (I’m not even sure of a good example of one of these) but instead is applied to complex devices that require regular maintenance and upkeep in order to work properly.

Of course if your shoe is not broken, don’t tear it apart and attempt to glue it back together again.  But your business infrastructure systems are nothing like a shoe.  They are living systems with enormous levels of complexity that function in an ever changing landscape.  They require constant maintenance, oversight, updating and so forth to remain healthy.  Much like a car, but dramatically more so.

You never, we hope, hear someone tell you that you don’t need to change the oil in your car until the engine has seized.  Of course not, even though it is not yet broken, the point is to do maintenance to keep it from breaking.  We know with a car that if we wait until it breaks, it will be really broken.  Likewise we would not refuse to put air in the tires until the flat tires have ripped off of the wheels.  It just makes no sense.

Telling someone not to maintain systems until it is too late is the same as telling them to break them.  A properly maintained car might last hundreds of thousands of miles, maybe millions.  One without oil will be lucky to make it across town.  Taking care of the engine that you have, rather than buying a new one every few days, means you might go a lifetime without destroying an engine.

The same goes for your business infrastructure.  Code ages, systems wear out, new technology emerges, new needs exist, the network interacts with the outside world, new features are needed, vulnerabilities and fragility are identified and fixed, updates come out, new attacks are developed and so forth.  Even if new features never were created, systems need to be diligently managed and maintained in order to ensure safe and reliable operation, like a car but a thousand times more complex.

In terms of IT systems, broken means unnecessarily exposed to hacking, data theft, data loss, downtime and inefficiencies.  In the real world, we should be considering the system to be broken the moment that maintenance is needed.  How much ransomware would not be a threat today if systems were simply properly maintained?  As IT we need to stand up and explain that unmaintained systems are already broken; disaster just hasn’t struck yet.

If we were to follow the mantra of “if it ain’t broke, don’t fix it” in IT, we would wait *until* our data was stolen to patch vulnerabilities, or wait until data was unrecoverable to see if we had working backups.  Of course, that makes no sense.  But this is what is often suggested when people tell you not to fix your systems until they break: they are telling you to let them break!  Push back, don’t accept that kind of advice.  Explain that the purpose of good IT maintenance is to avoid systems breaking whenever possible.  Avoiding disaster, rather than inviting it.

The Social Contract of Sales

In IT we tend to deal with more sales scenarios than most business positions do.  An accountant, for example, is rarely in a position to buy equipment, software or products for their business.  Positions that do buy things regularly, such as the housekeeping department, tend to buy small ticket items like bleach, window cleaner and garbage bags.  IT, however, tends to buy large cost items, with big margins, with great regularity, and so needs to understand the world of sales and marketing far better than nearly any other department.

Because of this, understanding concepts like the Social Contract of Sales is far more critical for IT workers than for nearly anyone else outside of the business tiers, even though this is just a general social contract that everyone in society is expected to understand intuitively and is just common sense.  But due to the very high danger of misconstruing this social contract in an IT context, and because IT workers are often hired with this specific area of competence ignored but then expected to work around it heavily, we need to discuss it in this context.

The social contract is this: “Sales people represent a product or vendor, are compensated and to some degree obligated to push their product.  They cannot lie, but their intent is to convince.”

This should be ridiculously obvious, and yet there is an incredibly common belief that sales people will act against either their own self interest or the interest of their employer (which would be unethical) in order to act as a friend, adviser or possibly even engineer for customers.  This makes no sense.  Not only are they not paid to do this, they are specifically paid not to do this.   And there is the obvious social contract that tells everyone involved that they are sales people and no one should be surprised when they attempt to convince you to purchase whatever it is that they sell.

We have social or natural contracts like this all over the place and we need them to operate intelligently.  If you are walking in the woods and you meet a bear you have a natural contract with them that says if you try to touch them, they will try to eat you.  No one expects a bear to act differently from this and it is silly and pointless to hope that your interaction with a wild bear will be different from this.  But, you are free to test that contract.

The social contract of sales, or anything, does not make it ethical for a sales person to lie.  That would be an impossible situation.  But it is also considered to be part of the social contract that all sales, promotions and marketing only deal with the concept of “truth” when dealing with quantifiable factors and never qualifiable ones.

For example, a car salesman is always free to claim that their car is the nicest, prettiest, or most comfortable regardless of whether anyone believes that to be true.  But they are not free to lie about how many seats it has or the gas mileage.

Likewise, IT professionals, both in-house staff and paid advisers, have a social contract to represent their employers and to obviously protect them from sales that do not make sense.  Our profession has a responsibility in our handling of sales people.  We are the gatekeepers.  No one else in the business has the expected ability to know when services or products are sensible or cost effective to meet our needs.  No one else is in a position where any contact with sales would make sense.

If we, as the IT gatekeepers, become confused as to the nature of the social contract and think that sales people are “on our side” looking out for our interests instead of their own or their employers’, or we forget that only quantifiable facts are meaningful, we can be easily misled – often by ourselves.  It is all too tempting to feel that sales people are there on our behalf, instead of their own.

A common sales tactic, that is incredibly effective against IT buyers, is the offer of free work.  IT decision making can be hard and, of course, sales people will happily take decision making off of our plates.  This is handy for them, as they can then make decisions that involve buying their services or products.  The decision to allow a sales person to do our jobs for us is a foregone conclusion to buy their products.  No one allowing a sales person to do this can make the reasonable claim that they had not made the decision to go with that vendor’s products at that point.

Doing this would, of course, violate our own social contract with our employers.  We are paid to do the IT work, to make the decisions, to make sure that sales people do not take advantage of the organization.  Handing our role over to the “enemy” that we are paid to protect against is exactly what our job role exists to prevent.  If our employers wanted sales people to simply sell the company whatever they wanted, they would eliminate the IT role and just talk to the sales people directly.  IT’s purpose instantly evaporates in that scenario.

Also within the social contract is that anyone that works on behalf of a vendor or a vendor representative (like a reseller) is a salesperson as well or, at the very least, partakes in the shared social contract.  They are employed to promote their products and have an obligation to do so even if their role is primarily technical, account management or whatever.  It is common for vendors to have employee positions with names like “presales engineer” or resellers to brand themselves “MSPs” to make it sound like they might be purely technical (with the implication of being “above” the sales world) or customer representatives, but neither is logically true.  Working for an organization that sells products, everyone who works there is a representative of those products.  Titles do not alter that social contract.

As IT Pros, it is our responsibility to understand and recognize the social contract of sales and to identify people who work for organizations that cause them to fall under the contract.  An ethical sales person cannot directly lie to us, but they will almost always happily allow us to lie to ourselves and that is one of the most powerful tools that they have.  We want them to be our friends, we want to be able to take it easy and let them do our job for us… and they will let us believe that all that we want.  But what we have to remember is that, as part of the assumptions of that social contract, we know that this is how they are tasked with behaving, and that it is our responsibility and no one else’s to ensure that we treat them like vendor agents and never confuse them with being our advisers.

Virtualize Domain Controllers

One would think that the idea of virtualizing Active Directory Domain Controllers would not be a topic needing discussion, and yet I find that the question arises regularly as to whether or not AD DCs should be virtualized.  In theory, there is no need to ask this question because we have far more general guidance in the industry that tells us that all possible workloads should be virtualized and AD certainly presents no special cases with which to create an exception to this long standing and general rule.

Oddly, people seem to regularly go out seeking clarification on this one particular workload, however, and if you seek bad advice, someone is sure to provide it.  Tons of people post advice recommending physical servers for Active Directory, but rarely, if ever, with any explanation as to why they would recommend violating best practices at all, let alone with such a mundane and well known workload.

As to why people implementing AD DCs decide that it warrants specific investigation around virtualization when no other workload does, I cannot answer.  But after many years of research into this phenomenon I do have some insight into the source of the reckless advice around physical deployments.

The first mistake comes from a general misunderstanding of what virtualization even is.  This is sadly incredibly common and people quite often think that virtualization means consolidation, which of course it does not.  So they take that mistake and then apply the false logic that consolidation means consolidating two AD DCs onto the same physical host.  It also requires the leap to thinking that there will always be two or more AD DCs, but this is also a common belief.  So three large mistakes in logic come together for some very bad advice that, if you dig into the recommendations, you can normally trace back.  This seems to be the root of the majority of the bad advice.

Other causes are sometimes misunderstanding actual best practices, such as the phrase “If you have two AD DCs, each needs to be on a separate physical host.”  This statement is telling us that two physically disparate machines need to be used in this scenario, which is absolutely correct.  But it does not imply that either of them should not have a hypervisor, only that two different hosts are needed.  The wording used for this kind of advice is often hard to understand if you don’t have the existing understanding that under no circumstance is a non-virtual workload acceptable.  If you read the recommendation with that understanding, its meaning is clear and, hopefully, obvious.  Sadly, that recommendation often gets repeated out of context so the underlying meaning can easily get lost.
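The separate-hosts rule is easy to verify mechanically.  As a rough sketch, assuming you can export a mapping of domain controller VMs to the physical hosts they currently run on (the names below are hypothetical and no real hypervisor API is being called), a check might look like this:

```python
from collections import defaultdict


def shared_hosts(placement):
    """Return {host: [dc, ...]} for each physical host running more than one DC.

    `placement` maps a domain controller name to the physical host that its
    virtual machine currently runs on.  An empty result means every DC is on
    its own host.
    """
    by_host = defaultdict(list)
    for dc, host in placement.items():
        by_host[host].append(dc)
    return {host: sorted(dcs) for host, dcs in by_host.items() if len(dcs) > 1}


if __name__ == "__main__":
    # Hypothetical placements, for illustration only.
    good = {"dc1": "hv-a", "dc2": "hv-b"}
    bad = {"dc1": "hv-a", "dc2": "hv-a"}
    print(shared_hosts(good))
    print(shared_hosts(bad))
```

Both DCs in the first placement are virtualized and still satisfy the rule, because the rule is about distinct hosts and says nothing about avoiding hypervisors.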

Long ago, as in around a decade ago, some virtualization platforms had some issues around timing and system clocks that could play havoc with clustered database systems like Active Directory.  This was a legitimate issue at the time but has long since been solved, as it needed to be for many different workloads.  A perception was created that AD might need special treatment, however, and it seems to linger on even though it has been a generation or two in IT terms since this was an issue and it should have long ago been forgotten.

Another myth leading to bad advice is rooted in the fact that AD DCs, like other clustered databases, when used in a clustered mode should not be snapshotted as this will easily create database corruption if only one node of the cluster gets restored in that manner.  This is, however, a general aspect of storage and databases and is not related to virtualization at all.  The same information is necessary for physical AD DCs just the same.  That snapshots are associated with virtualization is another myth; virtualization implies no such storage artefact.

Still other myths arise from a belief that virtualization must rely on Active Directory itself in order to function and therefore AD has to run without virtualization.  This is completely a myth and nonsensical.  There is no such circular requirement.

Sadly, some areas of technology have given rise to large scale myths, often many of them, that surround them and can make it difficult to figure out the truth.  Virtualization is just complex enough that many people attempt to learn not just how to use it, but what it is conceptually, by rote, giving rise to sometimes crazy misconceptions that are so far afield that it can be hard to figure out that that is really what we are seeing.  And in a case like this, misconceptions around virtualization, history, clustered databases, high availability techniques, storage and more add up to layer upon layer of misconceptions, making it hard to figure out how so many things can come together around one deployment question.

At the end of the day, few workloads are as ideally suited to virtualization as Active Directory Domain Controllers are.  There is no case where the idea of using a physical bare metal operating system deployment for a DC should be considered – virtualize every time.