Software Engineering Blog

Tips and tricks from the life of a system administrator.

Virtualization is not HA

Availability vs Fault Tolerance

I have been asked multiple times whether running a software product or service in a virtual environment, such as a VMware cluster, requires an HA (High Availability) license. To answer that, consider the following statements:

  1. Virtualization and HA are orthogonal concepts
  2. HA is not a binary property (non-HA vs HA); there are several levels or classes
  3. Increased fault tolerance is used to achieve HA

1. Virtualization vs HA

Running a single instance of a piece of software in a multi-node cluster can reduce downtime caused by planned maintenance, because the VM can be live-migrated to another node beforehand. This does increase the availability of the instance. However, it only holds for planned events. If the hardware the VM is running on fails unexpectedly, there is no time to migrate to another node, and the instance simply dies.

Another possible outage is a network partition, in which the VMs are still running and operational but cannot be reached from the other partition.

2. Levels of HA

There are various definitions of HA levels. These can be split into two categories:

Based on the service quality

Based on the share of available time

This is probably the more common definition, in which HA is defined by the share of time that is not lost to outages:

availabilityShare = 1 - outage/(available+outage)

The availability classes are then defined by the number of nines in the availability share:

  • class 2: 99%
  • class 3: 99.9%
  • class n: 1 - 0.1^n (as a fraction, e.g. 0.9999 for n = 4)

For details, see the High_availability article on Wikipedia.
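
As a minimal sketch of the formula and the class definition above, the following Python helpers compute the availability share, the matching class, and the outage budget per year. The function names and the 365-day year are assumptions made for illustration; they do not come from any standard.

```python
import math


def availability_share(available: float, outage: float) -> float:
    """Share of time the service was available (both arguments in the same unit)."""
    total = available + outage
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return 1 - outage / total


def availability_class(share: float) -> int:
    """Number of nines: the largest n for which share >= 1 - 0.1**n."""
    if not 0 <= share < 1:
        raise ValueError("share must be in [0, 1)")
    # The small epsilon guards against floating-point rounding,
    # e.g. -log10(1 - 0.999) evaluating to slightly less than 3.
    return math.floor(-math.log10(1 - share) + 1e-9)


def max_outage_hours_per_year(n: int, hours_per_year: float = 365 * 24) -> float:
    """Outage budget per year for class n (assumes a 365-day year)."""
    return hours_per_year * 0.1 ** n


if __name__ == "__main__":
    # Hypothetical measurement: 10 hours of outage in one year.
    share = availability_share(available=8750, outage=10)
    print(f"share = {share:.5f}, class = {availability_class(share)}")
    for n in range(2, 6):
        print(f"class {n}: at most {max_outage_hours_per_year(n):.3f} h outage per year")
```

With these helpers, class 4 corresponds to an outage budget of roughly 52.6 minutes per year, which shows how little room is left for unplanned hardware failures.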

3. Fault Tolerance

A key aspect of achieving HA is reducing the number of single points of failure. This can be done on both the hardware and the software side. Common examples are:

Hardware

  • storage: use redundant disk arrays (RAID)
  • network: use two network adapters

Software

  • run two instances on different nodes
  • use slightly different implementations for the two instances, so that a single implementation bug does not take down both

Environment

  • use an uninterruptible power supply (UPS)
  • use a redundant network topology
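
To illustrate why redundancy raises availability while every additional single point of failure lowers it, here is a rough sketch using the textbook formulas for parallel and serial components. It assumes that failures are statistically independent, which is optimistic in practice (shared power, a shared network, or the same software bug can take out both instances at once). The function names and the numbers are illustrative only.

```python
def parallel_availability(instance_availability: float, instances: int) -> float:
    """Availability of redundant instances: the service is up while at least
    one instance is up. Assumes independent failures."""
    if instances < 1:
        raise ValueError("at least one instance is required")
    return 1 - (1 - instance_availability) ** instances


def serial_availability(*component_availabilities: float) -> float:
    """Availability of a chain of components that must all be up, i.e. each
    one is a single point of failure. Assumes independent failures."""
    result = 1.0
    for availability in component_availabilities:
        result *= availability
    return result


if __name__ == "__main__":
    single = 0.99  # hypothetical class 2 instance
    print(f"one instance:      {parallel_availability(single, 1):.4f}")
    print(f"two in parallel:   {parallel_availability(single, 2):.4f}")
    print(f"two in series:     {serial_availability(single, single):.4f}")
```

Under these assumptions, two independent 99% instances running in parallel would reach about 99.99%, whereas two 99% components that both have to work drop to about 98.01%.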

Conclusion

Virtualization is not, per se, an HA concept, but it is useful for reducing downtime caused by planned maintenance. To achieve actual HA (>= class 4), the setup has to be designed to reduce single points of failure. When running regular software on commodity or server hardware and standard infrastructure, this also requires running at least two instances.

Disclaimer: The opinion stated above reflects the technical point of view. Some companies might define HA differently. If unsure, ask their sales and legal departments before the implementation.
