Doing IT at Home: Good Documentation

One of the most rewarding home IT projects that I have done was to implement a system for “home documentation.”  In a business environment, documentation is critical to nearly any process or department.  At home, documentation is critical too, but it is often overlooked or approached from a completely different perspective than it is in a business – and there is no need for this.  Many people resort to special tools, iPhone apps or physical pen and paper notepads to document things around the house.  I propose something far more enterprise-grade and elegant: a wiki.

Wikis have been around for some time now and nearly everyone is familiar with their use.  At its core, a wiki is just a web-based application.  Wikis come in many shapes and forms, with varying degrees of complexity, and run on different platforms.  This makes them very flexible and applicable to nearly anyone, regardless of what kind of systems you run at home.

The value of a wiki at home becomes obvious quite quickly once the project is underway.  Documenting bills, accounts, purchases, home repairs, part numbers, service schedules, insurance information and, of course, your home network all makes perfect sense and is easy to do.  The wiki does not need to be large, just big enough to be useful.  Mine is certainly not sprawling, but all of my important data is housed in one convenient place and is text searchable.  So even if I don’t remember how I organized something, I can simply search for it.  All of my important data is there, in a single place, so that I can look it up when needed and, more importantly, my wife can look it up and update it when needed.  It allows for simple, reliable collaboration.  And I make mine available from inside or outside the home, so I can access my information from work or while traveling.  That is functionality that traditional home documentation systems lack.

While there are many wikis available today, I will mention three that make the most sense for the vast majority of people: DokuWiki, MediaWiki and SharePoint from Microsoft.  DokuWiki and MediaWiki are free and have the advantage of running on UNIX, so they can be deployed in a variety of situations for low or no cost.  DokuWiki shines in that it needs no database and uses nothing but the filesystem, making it incredibly simple to deploy, manage, back up and restore.  It is nothing more than a set of text files and a small PHP application that writes them.  MediaWiki is, by far, the most popular wiki option and, like DokuWiki, is a PHP application, but it is backed by a database, normally MySQL, making it more complex but giving it more power as well.  Many people would choose MediaWiki for home use (as I do) because it provides the experience most directly transferable to the largest number of businesses.  SharePoint is free if you already have a Windows Server and is much more complex than the pure wiki options.  SharePoint is an entire application platform that includes a wiki as part of its core functionality.  If you are looking to move more heavily into the Microsoft ecosystem, then using SharePoint would likely make the most sense and will provide a lot of additional functionality, like calendaring and document storage, too.
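To illustrate why DokuWiki’s filesystem-only model is so simple to manage, here is a minimal sketch of the idea in Python – pages as plain text files plus naive full-text search.  The directory layout and page names are invented for illustration and do not match DokuWiki’s actual structure:

```python
import os

# Sketch of the DokuWiki storage model: each page is a plain text
# file on disk, and "search" is just scanning those files.  Backup
# is nothing more than copying the directory.
DATA_DIR = "wiki_pages"

def save_page(name, text):
    os.makedirs(DATA_DIR, exist_ok=True)
    with open(os.path.join(DATA_DIR, name + ".txt"), "w") as f:
        f.write(text)

def search(term):
    # Case-insensitive full-text search across every page file.
    hits = []
    for fname in sorted(os.listdir(DATA_DIR)):
        with open(os.path.join(DATA_DIR, fname)) as f:
            if term.lower() in f.read().lower():
                hits.append(fname[:-4])  # strip ".txt"
    return hits

save_page("furnace", "Serviced 2012-04-01 by Acme HVAC, filter 16x25x1")
save_page("insurance", "Homeowner policy #12345, renews in June")
print(search("filter"))   # → ['furnace']
```

A real wiki adds markup rendering, revisions and access control on top, but the storage principle is the same, which is why filesystem-based wikis are so easy to back up and restore.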

Running a wiki can help give meaning to a home web server.  Instead of sitting idle, it can house important applications and really be used regularly.  While not a massive project, having a wiki at home can be an important step toward giving meaning to the home IT environment.  IT at home often suffers from a lack of direction or purpose – systems implemented as nothing more than a lab, lacking real world use.  Like the PBX example in an earlier article, a home documentation wiki can give your network meaning and purpose.

Doing IT at Home: The Home PBX

I am often asked what projects I would recommend for someone to do at home to get more IT experience, and I am often at a loss to come up with anything very interesting that is both educational and could actually prove practical in a daily use kind of way.  Having home IT projects that are actually used, day in and day out, really changes how projects are approached, making them a little bit more like production systems: real users use them, performance matters and ongoing management is an important consideration.  Over the years I have discovered a few home IT projects that really make sense in a “more than just a lab for learning purposes” kind of way.  One of the best is running your own PBX to replace your home telephone.

Today, home telephones are becoming less and less common, partially because their traditional functionality has been widely displaced by mobile phones and partially because the legacy telephone system, even when delivered over VoIP, is rather archaic.  But in business, telephony is taking off as modern VoIP PBXs add new functionality and lower costs.  This is one place where treating your home like a business can really pay off.  People who have moved to mobile phones only will likely have noticed a few problems with that model.

Why don’t mobile phones replace home phones?

  • Mobile phones are attached to a person rather than a place.  The concepts behind using each are different.  Reaching a person is far more useful, but both have their uses and special functions.
  • Mobile phones are highly dynamic.  They turn on and off, they roam, they leave the country, they lose signal, they lose power, they get lost.  Home phones are highly static in comparison.
  • Mobile phones require one line per person; a home phone can provide many extensions from one line or number.
  • Home phone systems can offer redundancy or failover.
  • Home phones can be used remotely, over the Internet, from anywhere without needing to arrange international calling ahead of time, or at all.
  • Home phones can offer features like conference rooms, ring groups, queues, etc.

Building a PBX at home can be very low cost while providing a lot of functionality that traditional phones and mobile phones fail to provide.  I, myself, am very glad that I have a home phone still but was disappointed that I was paying so much for such limited functionality using a traditional carrier.  Even after moving to a pure VoIP carrier I was still paying more for my phone at home than the office paid for multiple business lines.  And an idea was born.

There is always more than one way to skin a cat, and there are many PBX products that one could use for a home project of this nature.  Far and away, though, the most popular will be a flavor of Asterisk, the free, open source, enterprise voice switching system.  And within the Asterisk family, Elastix is the obvious choice for a project of this nature.  Not only does this give a good opportunity for learning a very popular telephony system, but it also provides a good use for production management of CentOS (Red Hat) Linux.  Another option would be 3CX on Windows, which is more limited and requires more licensing, but depending on your career path it may make equal or better sense for you.
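To give a flavor of what connecting an Asterisk-based system to a carrier looks like, here is a minimal sketch of a chan_sip trunk definition.  The carrier hostname, username and password are placeholder assumptions, and any real carrier will supply its own required settings:

```ini
; sip.conf — register with the carrier so it knows where to send
; incoming calls, then define the trunk as a peer.
register => myuser:mypass@sip.carrier.example.com

[carrier-trunk]
type=peer
host=sip.carrier.example.com
defaultuser=myuser
secret=mypass
fromuser=myuser
context=from-trunk     ; where inbound calls land in the dialplan
```

In an Elastix deployment most of this is generated for you through the web interface, but knowing what it produces under the hood is exactly the kind of learning such a project provides.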

Having a true, enterprise PBX in your home can serve many needs, all of which play wonderfully into expanding a professional portfolio, and as running a home PBX remains rather an exclusive endeavor, it is an ideal talking point for an interview.  Having a PBX means that all of the control usually reserved for a business is now available at home:

  • Extensions for each family member (kids want their own lines, no problem.)
  • Conference room(s) for family meetings (a la Skype but easier, especially for family members dialing in from traditional phones or mobile phones.)
  • Ring and hunt groups for handling complex calling situations (just the parents, or just the kids.)
  • Flexible voicemail options.
  • Detailed call reporting.
  • Household paging systems and extension to extension calling.
  • Remote extensions, whether for family members when they are out of the house or extended family who just want an extension on the system for unlimited, free calling around the family.
  • Video phones.
  • Overhead paging (a front door announcement system, perhaps.)
  • Multiple shared lines for easy efficiency.

All of this for almost no cost.

A PBX is a great resource to virtualize, especially if you are running Linux.  A PBX uses effectively no resources when idle and very little when active, even with several users.  It will easily be as small as the smallest web server that you are running at home.  And almost no storage is needed, just enough to hold the voicemails and logs.  Ten years ago only paravirtualization could handle the needs of audio processing, limiting you to Xen-based virtualization products.  Today vSphere and Hyper-V join XenServer in being able to handle this workload without breaking a sweat (others will work too.)  So whatever virtualization you are using at home will work just fine (though you may run into issues if you are using Type 2 virtualization like VirtualBox.)

The only real expense for a home PBX, and truly even for a small business, is the cost of the trunk that brings in the connection to the public switched telephone network (the thing that provides the phone number.)  A typical home telephone service might cost $20 – $50 / month, even without a single call being made and with no services beyond a simple phone line, even when using VoIP.  There are some exceptions, but very few.  For my own home PBX project I selected a commercial VoIP carrier that gives me four lines in a single SIP trunk for $11/mo – everything included, like unlimited incoming minutes and the DID (the phone number); the only thing that is extra is outgoing minutes, which are super cheap.  My phone bill rarely tops $13!  That’s pretty amazing considering I turned off a single line $35/mo service and now have all of those features of a PBX and a pretty amazing talking point.
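As a quick sanity check, the arithmetic above works out like this.  The outgoing rate and the minute count are assumed illustrative figures, not my carrier’s actual pricing:

```python
# Back-of-envelope comparison of the phone costs described above.
old_monthly = 35.00       # single traditional line
trunk_monthly = 11.00     # four-line SIP trunk, incoming included
rate_per_min = 0.01       # assumed outgoing per-minute rate
outgoing_minutes = 200    # assumed monthly outgoing usage

new_monthly = trunk_monthly + rate_per_min * outgoing_minutes
annual_savings = (old_monthly - new_monthly) * 12
print(f"${new_monthly:.2f}/mo")          # → $13.00/mo
print(f"${annual_savings:.2f}/yr saved")  # → $264.00/yr saved
```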

If you are looking for an interesting project that will do wonders for your resume while actually adding some practical value to your home, a PBX may be a great place to start.

Replicated Local Storage

With the increased exposure of virtualization, and the popularization of platform-level high availability solutions because of it, the need for and awareness of high availability storage have come to the forefront of all of IT, and the SMB realm in particular.  Storage has become, not surprisingly, the most challenging aspect of virtualization today.

Most people investigating high availability storage solutions are aware of replication between SAN or NAS devices but are not aware that local storage can be replicated synchronously as well, allowing for the same high availability practices without the need for external storage devices.  In fact, Replicated Local Storage (or RLS) is (and must logically be) the same technology used by a SAN or NAS to achieve high availability.  RLS is the underpinning of all high availability storage solutions; it is simply that we only refer to it by this name when we are looking at a device as being “local.”  If we were working on a SAN or a NAS, then RLS would refer to its own replication technology.  When looking at a server connected to a replicated SAN, we think of that replication as being non-local.  Local is a matter of current perspective.  At a technical layer, all replication is RLS at the end of the day.

RLS technologies are popular on certain operating systems, such as Linux, where DRBD is native and accepted into the kernel.  The FreeBSD project has, in recent years, introduced its own native RLS technology known as HAST.  Windows does not have a native RLS option today.  Linux and FreeBSD lead the RLS charge in regard to common operating systems used in the SMB and are driving the industry forward with broader adoption of these technologies.
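As a sketch of what a DRBD deployment involves, a minimal two-node resource definition might look like the following.  The hostnames, addresses and backing devices are placeholder assumptions; a real deployment would also tune synchronization and fencing options:

```conf
# /etc/drbd.d/r0.res — minimal two-node replicated resource
resource r0 {
    protocol C;               # fully synchronous replication
    device    /dev/drbd0;     # replicated block device presented to the OS
    disk      /dev/sdb1;      # backing local disk on each node
    meta-disk internal;
    on node1 {
        address 192.168.10.1:7789;
    }
    on node2 {
        address 192.168.10.2:7789;
    }
}
```

Each node writes to /dev/drbd0 as if it were a local disk, and DRBD mirrors every write to the peer over the network, which is the essence of RLS: local storage, synchronously replicated.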

In virtualization we see many other approaches taken to provide RLS for virtualization platforms.  KVM, which is built on Linux, and the Xen family (including Xen, XenServer and others), which relies on Linux, leverage DRBD for their own RLS.  The VMware ecosystem uses a replicated VSA approach, with popular options being VMware’s own VSA product and HP’s VSA product, both of which use a virtualized, replicated NAS appliance to provide RLS to the platform.  On Microsoft’s Hyper-V the same is accomplished by the use of StarWind’s replicated SAN platform, which behaves, essentially, the same as a VSA.

RLS is rapidly becoming more and more important as it scales well in small virtualization deployments, taking what has long been available as a niche clustering technology and pushing it into the mainstream.  Before high availability for virtualization was popularized in the SMB world, these technologies were almost exclusively used for small scale UNIX high availability clustering.  They were important technologies and often used, but received little industry attention as they were an “under the hood” detail of some UNIX systems.  Today, with the rapid uptake of high availability for virtualization, RLS has gone from a niche technology to one of the most key and appropriate technologies for nearly any SMB wishing to achieve high availability for its virtualization platform.


When to Consider a SAN?

Everyone seems to want to jump into purchasing a SAN, sometimes quite passionately.  SANs are, admittedly, pretty cool.  They are one of the more fun and exciting, large scale hardware items that most IT professionals get a chance to have in their own shop.  Often the desire to have a SAN of one’s own is a matter of “keeping up with the Joneses,” as using a SAN has become a bit of a status symbol – one of those last bastions of big business IT that you only see in a dedicated server closet and never in someone’s home (well, almost never.)  SANs are pushed heavily, advertised and sold as amazing boxes with internal redundancy that makes them infallible, speed that defies logic and features that you never knew you needed.  When speaking to IT pros designing new systems, one of the most common design statements that I hear is “well, we don’t know much about our final design, but we know that we need a SAN.”

In the context of this article, I use SAN in its most common context, that is to mean a “block storage device” and not to refer to the entire storage network itself.  A storage network can exist for NAS but not use a SAN block storage device at all. So for this article SAN refers exclusively to SAN as a device, not SAN as a network.  SAN is a soft term used to mean multiple things at different times and can become quite confusing.  A SAN configured without a network becomes DAS.  DAS that is networked becomes SAN.

Let’s stop for a moment.  The SAN is your back end storage.  The need for it is, in all cases, determined by other aspects of your architecture.  If you have not yet decided upon many other pieces, you simply cannot know that a SAN is going to be needed, or even useful, in the final design.  Red flags.  Red flags everywhere.  Imagine a Roman chariot race with the horses pushing the chariots (if you know what I mean.)

It is clear that the drive to implement a SAN is so strong that often entire projects are devised with little purpose except, it would seem, to justify the purchase of the SAN.  As with any project, the first question that one must ask is “What is the business need that we are attempting to fill?” and work from there – not “We want to buy a SAN; where can we use it?”  SANs are complex, and with complexity comes fragility.  Very often SANs carry high cost.  But the scariest aspect of a SAN is the widespread lack of deep industry knowledge concerning them.  SANs pose huge technical and business risks that must be overcome to justify their use.  SANs are, without a doubt, very exciting and quite useful, but that is seldom good enough to warrant the desire for one.

We refer to SANs as “the storage of last resort.”  What this means is, when picking types of storage, you hope that you can use any of the other alternatives such as local drives, DAS (Direct Attach Storage) or NAS (Network Attached Storage) rather than SAN.  Most times, other options work wonderfully.  But there are times when the business needs demand requirements that can only reasonably be met with a SAN.  When those come up, we have no choice and must use a SAN.  But generally it can be avoided in favor of simpler and normally less costly or risky options.

I find that most people looking to implement a SAN are doing so under a number of misconceptions.

The first is that SANs, by their very nature, are highly reliable.  While there are certainly many SAN vendors and specific SAN products that are amazingly reliable, the same could be said about any IT product.  High end servers in the price range of high end SANs are every bit as reliable as SANs.  Since SANs are made from the same hardware components as normal servers, there is no magic to making them more reliable; anything that can be used to make a SAN reliable is a trickle down of server RAS (Reliability, Availability and Serviceability) technologies.  Just like SANs, the alternatives – NAS and DAS, as well as local disks – can be made incredibly reliable.  SAN only refers to a device being used to serve block storage rather than perform some other task; a SAN is just a very simple server.  SANs encompass the entire range of reliability, from mainframe-like reliability at the top end down to devices that are nothing more than external hard drives – among the most unreliable devices on your network – at the bottom end.  So rather than SAN implying reliability, the category actually includes some of the least reliable devices you can imagine.  For all intents and purposes, all servers share roughly equal reliability concerns.  SANs gain a reputation for reliability because businesses often put extreme budgets into their SANs that they do not put into their servers, so the comparison is between a relatively high end SAN and a relatively budget server.

The second is that SAN means “big” and NAS means “small.”  There is no such association.  Both SANs and NASs can be of nearly any scale or quality.  They both run the gamut, and there isn’t the slightest suggestion from the technology chosen as to whether a device is large or not.  Again, as above, a SAN can technically come “smaller” than a NAS solution due to its possible simplicity, but this is a specialty case and mostly theoretical; there are SAN products on the market in this category, but it is very rare to find them in use.

The third is that SAN and NAS are dramatically different inside the chassis.  This is certainly not the case, as the majority of SAN and NAS devices today are what is called “unified storage,” meaning a storage appliance that acts simultaneously as both SAN and NAS.  This highlights that the key difference between the two is not backend technology or hardware or size or reliability; the defining difference is the protocols used to transfer storage.  SANs are block storage, exposing raw block devices onto the network using protocols like Fibre Channel, iSCSI, SAS, ZSAN, ATA over Ethernet (AoE) or Fibre Channel over Ethernet (FCoE.)  NAS, on the other hand, uses a network filesystem, exposing files onto the network using application layer protocols like NFS, SMB, AFP, HTTP and FTP, which then ride over TCP/IP.

The fourth is that SANs are inherently a file sharing technology.  That is NAS.  SAN simply takes your block storage (hard disk subsystem) and makes it remotely available over a network.  The nature of networks suggests that we can attach that storage to multiple devices at once and indeed, physically, we can – just as we used to be able to physically attach multiple controllers to opposite ends of a SCSI ribbon cable with hard drives dangling in the middle.  This will, under normal circumstances, destroy all of the data on the drives, as the controllers, which know nothing about each other, overwrite each other’s data, causing near instant corruption.  There are mechanisms available in special clustered filesystems and their drivers to allow for this, but this requires special knowledge and understanding that is far more technical than many people acquiring SANs are aware that they need for what they often believe is the very purpose of the SAN – a disaster so common that I probably speak to someone who has done just this almost weekly.  That the SAN puts at risk the very use case most people believe it is designed to handle, not only failing to deliver the nearly magic protection sought but being, to the contrary, the very cause of the loss of data, exposes the level of risk that misunderstood storage technology carries with it.

The fifth is that SANs are fast.  SANs can be fast; they can also be horrifically slow.  There is no intrinsic speed boost from the use of SAN technology on its own.  It is actually fairly difficult for SANs to overcome the inherent bottlenecks introduced by the network on which they sit.  Since some other storage options, such as DAS, use all the same technologies as SAN but lack the bottleneck and latency of the actual network, an equivalent DAS will generally be a little faster than its SAN complement.  SANs are generally a little faster than a hardware-identical NAS equivalent, but even this is not guaranteed.  SAN and NAS behave differently, and in different use cases either may be the better performer.  SAN would rarely be chosen as a solution based on performance needs.

The sixth is that the inherent problems associated with storage choices no longer apply once the device is a SAN.  A good example is the use of RAID 5.  This would be considered bad practice in a server, but when working with a SAN (which in theory is far more critical than a standalone server) careful storage subsystem planning is often eschewed based on a belief that the SAN has somehow fixed those issues or that they do not apply.  It is true that some high end SANs have risk mitigation features unlikely to be found elsewhere, but these are rare and exclusively relegated to very high end units where fragile designs would already be uncommon.  It is a dangerous, but very common, practice to take great care and consideration when planning storage for a physical server but, when using a SAN, to skip that same planning and oversight based on the assumption that the SAN handles all of that internally or that it is simply no longer needed.

Having shot down many misconceptions about SAN one may be wondering if SANs are ever appropriate.  They are, of course, quite important and incredibly valuable when used correctly.  The strongest points of SANs come from consolidation and special types of shared storage.

Consolidation was the historical driver bringing customers to SAN solutions.  A SAN allows us to combine many filesystems into a single disk array, allowing far more efficient use of storage resources.  Because SAN is block level, it can do this anywhere a traditional, local disk subsystem could be employed.  In many servers, and even many desktops, storage space is wasted due to the necessities of growth planning and disk capacity granularity.  If we have twenty servers, each with a 300GB drive array but each using only 80GB of that capacity, we have large waste.  With a SAN we could consolidate to just 1.6TB, plus a small amount necessary for overhead, and spend far less on physical disks than if each server maintained its own storage.

Once we begin consolidating storage, we begin to look for advanced consolidation opportunities.  Having consolidated many server filesystems onto a single SAN, we have the chance, if our SAN implementation supports it, to deduplicate and compress that data which, in many cases such as server filesystems, can result in significant utilization reduction.  So our 1.6TB in the example above might actually end up being only 800GB or less.  Suddenly our consolidation numbers are getting better and better.
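The consolidation arithmetic above can be sketched as a quick calculation; the 2:1 deduplication ratio is an assumed illustrative figure, as real ratios vary widely by workload:

```python
# Storage consolidation arithmetic from the twenty-server example.
servers = 20
provisioned_gb = 300   # local array size per server
used_gb = 80           # actual usage per server

local_total = servers * provisioned_gb   # total provisioned locally
san_total = servers * used_gb            # capacity actually needed on a SAN
print(local_total, san_total)            # → 6000 1600

# With deduplication/compression at an assumed 2:1 ratio:
dedup_ratio = 2.0
after_dedup = san_total / dedup_ratio
print(after_dedup)                       # → 800.0
```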

To efficiently leverage consolidation it is necessary to have scale, and this is where SANs really shine – when scale, both in capacity and, more importantly, in the number of attaching nodes, becomes very large.  SANs are best suited to large scale storage consolidation.  This is their sweet spot and what makes them nearly ubiquitous in large enterprises and very rare in small ones.

SANs are also very important for certain types of clustering and shared storage that require single shared filesystem access.  This is actually a pretty rare need outside of one special circumstance: databases.  Most applications are happy to utilize any type of storage provided to them, but databases often require low level block access to be able to manipulate their data most effectively.  Because of this they can rarely be used, or used effectively, on NAS or file servers.  Providing high availability storage environments for database clusters has long been a key use case of SAN storage.

Outside of these two primary use cases, which justify the vast majority of SAN installations, SAN also provides a high level of storage flexibility, making it potentially very simple to move, grow and modify storage in a large environment without needing to deal with physical moves or complicated procurement and provisioning.  Again, like consolidation, this is an artifact of large scale.

In very large environments, a SAN can also provide a point of demarcation between storage and system engineering teams, allowing for a handoff at the network layer, generally fibre channel or iSCSI.  This clear separation of duties can be critical in companies that want highly discrete storage, network and systems teams.  It allows the storage team to focus on nothing but storage, and the systems team on nothing but the systems, without any need for knowledge of the other team’s implementations.

For a long time SANs also presented themselves as a convenient means to improve storage performance.  This is not an intrinsic property of SAN but an outgrowth of their common use for consolidation.  Similarly to virtualization when used for consolidation, shared SANs have a natural advantage in better utilization of available spindles, centralized caches and bigger hardware than the equivalent storage spread out among many individual servers.  Like shared CPU resources, when the SAN is not receiving requests from multiple clients it has the ability to dedicate all of its capacity to servicing the requests of a single client, providing average performance potentially far higher than what an individual server could affordably achieve on its own.

Using SAN for performance is rapidly fading from favor, however, because of the advent of SSD storage becoming very common.  As SSDs with incredibly low latency and high IOPS drop in price to the point where they are being added to standalone servers as local cache, or even being used as mainline storage, the bottleneck of the SAN’s network becomes a larger and larger factor, making it increasingly difficult for the consolidation benefits of a SAN to offset the performance benefits of local SSDs.  SSDs are potentially very disruptive for the shared storage market as they shift the performance advantage back toward local storage – just the latest in the ebb and flow of storage architecture design.

The most important aspect of SAN usage to remember is that SAN should not be a default starting point in storage planning.  It is one of many technology choices and one that often does not fit the bill as intended or does so but at an unnecessarily high price point either in monetary or complexity terms.  Start by defining business goals and needs.  Select SAN when it solves those needs most effectively, but keep an open mind and consider the overall storage needs of the environment.

The Information Technology Resource for Small Business