Conceptually the idea of “backup” has become a murky area within IT. Everyone seems to have their own concepts of what a backup is and how they expect it to behave. This can be dangerous when the person supplying backup and the person consuming backup have a mismatch in expectations. I see this happen every day even with traditional backup mechanisms. With new types of backups appearing on a regular basis the opportunities for miscommunications and loss of data become much more pronounced.
By traditional backups I refer to the traditional world of tape-based backups with a grandfather – father – son rotational strategy in place, just to set the stage for the discussion. New backups might include system images, disk-based backups, continuous backups and backups to “the cloud” or online backups. The world of backups is evolving rapidly and now is when misunderstandings begin to put corporate data resources at risk.
So what exactly is a “backup”? The concept sounds simple, but what do we really mean when we use the term? Do we mean the ability to restore a system after it has failed? The ability to roll back to an earlier version of a file? Perhaps archiving of data when the original no longer exists? How long do which files get kept? Does this apply only to file data or are emails and databases included too? Do we only need to restore in case of system failure or do we need the ability to restore granular data as well? Do we need only one copy or do we need copies of every version of a file?
Now, with the additional risks posed by things like ransomware, we have more concerns than ever before. Ideas around not just versioning but potentially unlimited versioning, and around air gapping between systems and backups, have become a concern where before they generally were not.
Many organizations, especially smaller ones, approach backups a bit differently from enterprises and, in effect, eschew backups completely. They “take backups” but then delete the original files. And instead of keeping many copies of the files that have been “backed up”, they opt to keep only a single copy (or multiple versions that are co-dependent on each other). This means that what they have is not really a backup, but rather an archive. If the one disk or tape on which the file is stored becomes damaged, the file is lost completely.
The term backup implies that there are at least two copies of some piece of data that do not rely on each other. An archive does not imply this; it implies only that we have taken data from production to another system, presumably one that is lower cost and likely much slower and harder to retrieve from. Archived data implies no redundancy, unlike the term backup.
If we “take a backup” and then proceed to delete the original data, we no longer have a backup. The file that is stored in the “backup system”, whether this is on disk, a tape in a vault or whatever, turns into an archive of the original data rather than a backup of it. It is now our source file, rather than being a copy. This is some of the magic of digital media: copies are a clone rather than a mimic, so the archival copy is legitimately the original in every sense.
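To make the distinction concrete, here is a minimal sketch in Python. The `Copy` class, its `depends_on` field and the `classify` function are hypothetical names invented for illustration, not part of any backup product. The sketch assumes that a copy which needs another copy in order to be restored (an incremental that relies on a full, for example) does not count as independent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Copy:
    """One stored copy of a piece of data (hypothetical model)."""
    location: str
    # Location of another copy this one needs in order to be restored,
    # e.g. an incremental that relies on a full. None means standalone.
    depends_on: Optional[str] = None


def independent_copies(copies: List[Copy]) -> List[Copy]:
    """Return only the copies that can be restored on their own."""
    return [c for c in copies if c.depends_on is None]


def classify(copies: List[Copy]) -> str:
    """Label what we really have, based on independent copies only."""
    count = len(independent_copies(copies))
    if count == 0:
        return "nothing restorable"
    if count == 1:
        return "archive"  # a single copy: no redundancy
    return "backup"       # two or more copies that do not rely on each other


production = Copy("production server")
tape = Copy("tape in vault")
incremental = Copy("incremental on tape", depends_on="tape in vault")

print(classify([production, tape]))   # original plus a copy
print(classify([tape]))               # original deleted after "taking a backup"
print(classify([tape, incremental]))  # co-dependent versions only
```

Under these assumptions only the first case is a backup. Deleting the original, or keeping only versions that depend on one another, leaves an archive.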
This may seem pedantic but it truly is not. If a business is paying for backups, it likely assumes that the cost is going towards having some redundancy, not just a single copy of data. And if you have regulations requiring you to keep backups for compliance reasons, having only an archival copy is a clear violation of that requirement. Having two systems fail and being unable to retrieve data is an edge case that any compliance regime must accept. But having an archival system fail where a backup was required but not kept is not an acceptable scenario.
For this reason, and many more, concepts like the 3-2-1 backup methodology (three copies of the data, on two different types of media, with one copy offsite) make sense, because this approach guarantees that backups are kept within the backup system and originals do not need to be kept on production. In some ways of thinking, this approach could be thought of as merging archiving and backups into a single system, which adds much clarity to the design.
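As a rough illustration, the usual reading of 3-2-1 (at least three copies, on at least two types of media, with at least one copy offsite) can be expressed as a simple check. This is only a sketch: the `StoredCopy` class and the `meets_3_2_1` function are hypothetical names, and a real policy would also need to consider whether the copies depend on each other.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StoredCopy:
    """One copy of the data held somewhere (hypothetical model)."""
    name: str
    media: str     # e.g. "disk", "tape", "cloud"
    offsite: bool  # True if held away from the primary site


def meets_3_2_1(copies: List[StoredCopy]) -> bool:
    """Check the common 3-2-1 rule against a list of copies."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.media for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and has_offsite


plan = [
    StoredCopy("local backup disk", media="disk", offsite=False),
    StoredCopy("tape in office safe", media="tape", offsite=False),
    StoredCopy("online backup", media="cloud", offsite=True),
]

print(meets_3_2_1(plan))      # all three conditions hold
print(meets_3_2_1(plan[:2]))  # only two copies, and nothing offsite
```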
Whatever backup system works for you, be cognizant that backups mean independent copies, and that in many ways independent copies that do not share failure domains have become nearly a requirement for all backups today.
That’s a good point and an easy misconception to have about backups. It does seem obvious that a backup would include at least 2 copies but I can see how a lack of communication in this area would put data at risk.