Introduction: VMware Clustering and High Availability
Version: vSphere 5.5
The primary role of High Availability (HA) in a vSphere environment is to restart VMs if a vSphere host experiences a catastrophic failure. This could be caused by any number of issues, such as a power outage or the failure of multiple hardware components, such that the operation of the VM is impacted. VMware HA is part of a number of "clustering" technologies, including Distributed Resource Scheduler (DRS) and Distributed Power Management (DPM), that gather the resources of individual physical servers and represent them as a logical pool that can be used to run virtual machines. Once the clustering technologies are enabled, administrators are liberated from the constraints of the physical world: the focus is less on the capabilities of an individual physical server, and more on the capacity and utilisation of the cluster. HA is not the only availability technology available – once it is enabled, administrators have the option to enable "Fault Tolerance" (FT) on selected VMs that would benefit from its features. In order for FT to be enabled, HA must be enabled as well.
In recent versions of HA, more focus has been placed on the availability of the VM generally – so it is now possible to inspect the state of the VM itself, and to restart it, based on monitoring services within the guest operating system. The assumption is that if core VMware services running inside the guest operating system have stopped, this is likely to be a good indication that the VM has a serious issue, and that end-users have already been disconnected.
In terms of configuration, VMware HA shares many of the same prerequisites as vMotion, such as shared storage, access to consistently named networks and so on. As the VM is restarted, there is no specific requirement for matching CPUs, although the reality is that because of vMotion and DRS this is often the case anyway.
Under the covers vSphere HA has a master/slave model where the first vSphere host to join the cluster becomes the "master". If the master becomes unavailable, an election process is used to choose a new master. In a simple configuration vSphere HA uses the concept of the "slot" to calculate the free resources available for new VMs to be created and join the cluster. The "slot" is calculated by working out the VM's size in terms of memory and CPU resources. When all the slots have been used, no more VMs can be powered on. The concept is used to stop a cluster becoming over-saturated with VMs, and stops the failure of one or more hosts from degrading overall performance by allowing too many VMs to run on too few servers.
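The slot arithmetic described above can be sketched roughly as follows. This is an illustrative back-of-the-envelope model with made-up VM and host figures, not the actual HA agent implementation:

```python
# Illustrative sketch of vSphere HA "slot" arithmetic. All values are
# hypothetical; the real agent derives these figures internally.

def slot_size(vms):
    """A slot is sized from the largest CPU and memory reservations seen."""
    cpu_mhz = max(vm["cpu_reservation_mhz"] for vm in vms)
    mem_mb = max(vm["mem_reservation_mb"] for vm in vms)
    return cpu_mhz, mem_mb

def total_slots(hosts, slot):
    """Each host contributes as many slots as both its CPU and memory allow."""
    cpu_mhz, mem_mb = slot
    return sum(min(h["cpu_mhz"] // cpu_mhz, h["mem_mb"] // mem_mb) for h in hosts)

vms = [
    {"cpu_reservation_mhz": 500, "mem_reservation_mb": 1024},
    {"cpu_reservation_mhz": 1000, "mem_reservation_mb": 512},
]
hosts = [{"cpu_mhz": 8000, "mem_mb": 16384}] * 3

slot = slot_size(vms)            # (1000, 1024) – largest of each reservation
print(total_slots(hosts, slot))  # slots available across the whole cluster
```

Once every slot is consumed, a further power-on would be denied by admission control.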
HA and Resource Management
If you lose a vSphere host, the cluster simultaneously loses that host's contribution of CPU/memory resources and, in the case of Virtual SAN, its contribution of storage as well. For this reason planning needs to be conducted to work out what "reserve" of resources the cluster will need to accommodate failures. In more classical designs this can be expressed as N+1 or N+2 redundancy, where the number of hosts required to deliver acceptable performance is N, and additional hosts are factored in for either maintenance windows or failures. Related to this is the concept of "Admission Control", which is the logic that either allows or denies power-on events. As you might gather, it makes no sense in a 32-node cluster to attempt to power on a VM when only one vSphere host is running. Admission control stops failures from generating more failures and decreasing the performance of the cluster, by preventing cascading failures from affecting the whole cluster. For instance, if redundancy was set at +2, VMware HA would allow two vSphere hosts to fail, and would restart VMs on the remaining nodes in the cluster. However, if a third vSphere host failed, the setting of +2 would stop VMs being restarted on the remaining hosts.
VMware HA has a number of ways of expressing this reservation of resources for failover. It is possible to use classical +1, +2, and so on redundancy to indicate the tolerated loss of vSphere hosts and the resources they provide. Additionally, it's possible to break free from the constraints of the physical world and express this reservation as a percentage of CPU/memory resources to be reserved for the failover process. Finally, it's possible to indicate a dedicated host that is used for failover, in a classical active/standby approach.
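The +1/+2 style of admission control described above amounts to a simple threshold check. The sketch below is a hedged illustration of that logic with invented numbers, not VMware's implementation:

```python
# Hedged sketch of host-failure admission control: restarts are allowed only
# while failures stay within the configured tolerance. Names are illustrative.

def restarts_allowed(total_hosts, failed_hosts, tolerated_failures):
    """HA restarts VMs only while failures stay within the configured reserve."""
    return failed_hosts <= tolerated_failures and failed_hosts < total_hosts

# A 6-node cluster configured for +2 redundancy:
assert restarts_allowed(6, 2, 2) is True    # second failure: VMs still restarted
assert restarts_allowed(6, 3, 2) is False   # third failure: restarts denied
```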
Split-Brain and Isolation
Split-brain and isolation are terms that both relate to how clustering systems work out that a failure has occurred. For example, a host could be unreachable merely because the network used for host-to-host communication in the cluster has a failure – typically this is the "Management" network whose address resolves to the vSphere host's FQDN. For this reason it's really a requirement of HA that the network have maximum redundancy, to prevent split-brain from occurring – a situation where the clustering system loses integrity and it becomes impossible to decide which systems are running acceptably or not. There are a couple of different ways of ensuring this, which were covered earlier in the networking segments. For example, a Standard Switch could be configured with two vmnics, and those vmnics (0 and 1) could be patched into different physical switches. This would guarantee that false failovers wouldn't occur simply because of a switch failure or network card failure. As with all redundancy, an ounce of prevention is worth a pound of cure – it's best to configure a HA cluster with maximum network redundancy to stop unwanted failovers occurring due to simple network outages.
With that said, HA does come with "isolation" settings which allow you to control what happens should network isolation take place. The HA agent checks external network devices, such as routers, to work out whether a failure has taken place or merely network isolation. VMware HA also checks to see if access to external storage is still valid. With these many checks the HA agent can correctly work out whether a failure or network isolation has occurred. Finally, VMware HA has per-VM settings that control what happens should network isolation take place. By default network isolation is treated as if the host has physically stopped functioning, and VMs are restarted. However, using per-VM controls it's possible to override this behaviour if necessary. For the most part many customers don't worry about these settings, as they have delivered plenty of network redundancy to the physical host.
Managing VM High-Availability
Creating a vSphere HA Cluster
Enabling VMware HA starts with creating a “cluster” in the datacenter that contains the vSphere hosts.
1. Right-click the Datacenter, and select New Cluster
2. In the name field type the name of the cluster. The name can reflect the purpose of the cluster – for instance, a cluster for virtual desktops. Increasingly, SysAdmins prefer to classify their clusters by their relative capabilities, such as Gold, Silver, Bronze and so on. Additionally, clusters can be created with the sole purpose of running the vSphere infrastructure – companies often refer to these as "Management Clusters". Those with experience generally turn on all the core vSphere clustering features, including DRS and EVC.
3. Enable the option Turn On next to vSphere HA
Note: This dialog box only shows a subset of the options available once the cluster has been created. For instance, the full cluster settings allow for adjustments associated with the "slot" size of a VM, as well as the optional Active/Passive or Active/Standby configuration.
The option to Enable host monitoring is used to allow vSphere hosts to check each other's state. This checks to see if a vSphere host is down or isolated from the network. The option can be temporarily turned off if it's felt that network maintenance may contribute to false and unwanted failovers.

Enable Admission Control can be modified from using a simple count of vSphere hosts to achieve +1, +2 redundancy; incidentally, this spinner can currently only be increased to a maximum of 31. Alternatively, the administrator can switch admission control to use a percentage of CPU/memory to represent the reserve of resources held back to accommodate failover. Finally, Admission Control can be turned off entirely. This will allow failovers to carry on even when there are insufficient resources to power on the VM and achieve acceptable performance. This isn't really recommended, but may be required in circumstances where a business-critical application must be available even if it offers degraded performance. In this situation the business is prepared to accept degraded service levels rather than no service at all. In an ideal world, there should be plenty of resources to accommodate the loss of physical servers.

VM Monitoring can be used to track the state of VMs. It can be turned on at the cluster level with certain VMs excluded as needed, or alternatively it can be enabled on a per-VM basis.
Adding Multiple vSphere hosts to a HA Enabled Cluster
Once the cluster has been created, vSphere hosts can be added by using drag-and-drop. However, you may find it easier to use "Add Host" for new hosts that need to be joined to the cluster, or "Move Hosts" for vSphere hosts that have already been added to vCenter.
If the Move Hosts option is used, then multiple vSphere hosts can be added to the cluster at once. During this time the HA agent is installed and enabled on each host – this can take some time.
Once the cluster has been created the Summary screen will show basic details such as:
- Number of vSphere hosts
- Total/Used CPU/Memory
- Simple HA Configuration
- Cluster Consumers (Resource Pools, vApps and VMs)
Testing vSphere HA
There are a number of different ways to test if vSphere HA is working correctly. By far the most effective and realistic is to induce a failure of a physical vSphere host by powering it off. This can be done physically with the power button, or by using the BMC/DRAC/ILO card. This test requires some powered-on VMs. Powering off a vSphere host does not register immediately in the vCenter/Web Client UI, as the management system has a number of retries to connect to the vSphere host in the event of a temporary network outage. So for tests you may wish to carry out a ping -t of the vSphere host that will be brought down, and of a number of the VMs currently located on that host.
You can find out the IP address of a given VM by viewing its "Summary" page.
In the example below, a ping -t was made of esx03nyc.corp.com and the VM. Using the HP ILO interface, esx03nyc.corp.com was forcibly and unceremoniously powered off. The older vSphere Client does a better job of refreshing the management view to indicate the state of the vSphere host. You may need to refresh the Web Client in order to see these events.
It took about 60 seconds to generate a red alarm on the host, indicating there may be an issue. It was a further 80 seconds before the state of the vSphere host turned to "Not Responding". This can also be an indication of a network disconnect caused by a fault in the network. It was at 90 seconds that the VMs that had been running on the vSphere host were unregistered from it in vCenter, and instead registered to the other hosts in the cluster. Using a ping -t on a Windows 2012 R2 instance, it was 180 seconds before the operating system inside the VM began to respond to pings again. In some cases you might prefer to run esxtop on the hosts remaining in the cluster to watch the process of registering and powering on in a more real-time fashion.
The after-effects of a vSphere HA event are very much dependent on other clustering settings. For instance, if DRS is enabled in a fully-automated mode, then when the lost vSphere host is returned to the cluster, VMs will be automagically vMotion'd back to it. If DRS is not enabled in this fashion, the VMs remain running on the remaining hosts until such time as the SysAdmin moves them manually or accepts a recommendation for them to be moved.
If the vSphere Web Client is refreshed then status information will display as can be seen below:
The health status of the cluster can be viewed from the Monitor tab in the vSphere Web Client. This can be used to monitor the availability of the cluster – whether network isolation has taken place – as well as the total number of "slots" available in the cluster.
Viewing and Modifying vSphere HA Settings
All the settings for vSphere HA can be found under the properties of the cluster >> Manage and the Edit button.
The host monitoring portion of the Edit Cluster Settings options controls whether vSphere HA is turned on or off. As indicated earlier, it is possible to turn off Host Monitoring. This controls whether the vSphere hosts share the network heartbeat that is used to calculate if a host is alive or dead. This check can be temporarily turned off if you know that some network maintenance (such as a physical switch or router upgrade) is likely to cause the network to be down for a period of time.

The virtual machine options control what happens by default if there is a failure or isolation event. Two settings are available here: VM Restart Priority and Host Isolation Response. Restart Priority allows for four options – disabled, low, medium and high. By default medium is selected, and all VMs have the same restart priority. It's then possible under the VM Overrides options to add individual VMs and indicate that some have a low or high priority – that is, they are started after or before any VMs with a medium priority. Alternatively, VMs can be excluded from the restart process altogether by setting their VM Override to disabled. This can be useful if you have non-critical VMs that do not need to be available – and it frees up resources for the more critical VMs.

The Isolation Response controls what happens if a host becomes disconnected from the cluster due to some network outage or configuration error. In this case the isolated host may well be able to communicate with the router, but not with the other hosts. Alternatively, using what are called "datastore heartbeats", vSphere can work out that the host may be disconnected from the network but still connected to shared cluster resources. In such a case the host could still be running, and the VMs are unaffected. Here the default policy would be to "Leave Powered On".
The alternative is to assume a failure has occurred, and either power off the VM and restart it, or shut down the guest operating system and restart the VM on the remaining hosts.
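The restart-priority behaviour described above can be sketched as a simple ordering rule: per-VM overrides beat the cluster default, and "disabled" VMs are skipped. VM names and overrides below are invented for illustration:

```python
# Sketch of HA restart ordering. Priority names mirror the UI options;
# the VM inventory and overrides are hypothetical examples.

PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}

def restart_order(vms, cluster_default="medium", overrides=None):
    """Return VMs in restart order, honouring per-VM overrides."""
    overrides = overrides or {}
    # VMs overridden to "disabled" are excluded from the restart process.
    eligible = [v for v in vms if overrides.get(v, cluster_default) != "disabled"]
    return sorted(eligible, key=lambda v: PRIORITY_ORDER[overrides.get(v, cluster_default)])

vms = ["web01", "db01", "test01", "app01"]
overrides = {"db01": "high", "test01": "disabled", "app01": "low"}
print(restart_order(vms, overrides=overrides))
# db01 restarts first, web01 (cluster default) next, app01 last; test01 is skipped
```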
Admission Control Policy
The Admission Control Policy controls how resources are reserved to the cluster – and whether VMs are powered on or not based on the resources left in the cluster. One policy allows for the definition of capacity by a Static Number of Hosts. The spinner allows the SysAdmin to indicate how many hosts they feel they could comfortably lose while still maintaining a good quality of service. This spinner can be taken as high as 31, because the maximum number of vSphere hosts in a cluster is 32, which would logically allow for 31 failures leaving just one host left over. As you might imagine, it's highly unlikely that one remaining node could take over from the loss of 31 servers. However, it's more reasonable to suggest that in a 32-node cluster that is 50% loaded, a much higher number of physical servers could fail than the default of just 1.
By default the "slot" size is calculated based on the largest reservations used for CPU/memory. A reservation is expressed on a VM or resource pool as a guarantee of the given resources, rather than them being allocated on an on-demand basis. The idea of basing the "slot" size on these values is to try and guarantee that VMs are able to have their reservations allocated during power on. In some cases this dynamically calculated "slot" size isn't appropriate for customers, as it can be skewed by a mix of very large and very small VMs. That can result in either very large slot sizes, which quickly reduce the number of VMs that can be powered on, or very small slot sizes, which are quickly consumed by a series of very large VMs. For this reason it is possible to modify the Static Number of Hosts policy by specifying a Fixed Slot Size, expressed as CPU in MHz and memory in MB. Additionally, the Calculate button can be used to see which VMs could potentially require more than one slot. This can be used to verify if the fixed slot size is appropriately set. Once calculated, the View link will show a list of VMs requiring more than one slot.
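The "more than one slot" check amounts to ceiling division of each VM's reservations by the fixed slot size. The sketch below illustrates this with made-up reservations and a hypothetical fixed slot size:

```python
# Sketch of the "VMs requiring more than one slot" check under a fixed slot
# size. All figures are hypothetical examples, not defaults.

import math

def slots_required(vm, slot_cpu_mhz, slot_mem_mb):
    """A VM needs enough whole slots to cover both its CPU and memory reservations."""
    cpu_slots = math.ceil(vm["cpu_reservation_mhz"] / slot_cpu_mhz)
    mem_slots = math.ceil(vm["mem_reservation_mb"] / slot_mem_mb)
    return max(cpu_slots, mem_slots)

big_vm = {"cpu_reservation_mhz": 2500, "mem_reservation_mb": 4096}
small_vm = {"cpu_reservation_mhz": 500, "mem_reservation_mb": 1024}

print(slots_required(big_vm, 1000, 2048))   # 3 – this VM would appear in the View list
print(slots_required(small_vm, 1000, 2048)) # 1 – fits in a single slot
```

A fixed slot size that leaves many VMs needing several slots is probably set too small for that cluster's VM mix.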
As an alternative to using a static number of hosts together with a slot size, vSphere provides the option to manage admission control by reserving a percentage of cluster resources. As you might gather, this involves reserving an amount of CPU or memory as a proportion of the overall amount provided by the cluster. This can have a very similar effect to using a static number of hosts. For instance, on a three-node cluster, if 33% was reserved for failover, this would be similar (but not identical) to indicating +1 redundancy. This method dispenses with slot sizes altogether, and has proved to be a popular configuration of vSphere clustering.
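The percentage-based policy can be sketched as a single inequality: a power-on is admitted only while current reservations plus the new VM still leave the configured failover percentage free. The figures below are illustrative only:

```python
# Hedged sketch of percentage-based admission control for CPU (memory works
# the same way). Capacity and reservation numbers are invented examples.

def admit(cluster_capacity_mhz, reserved_mhz, new_vm_mhz, failover_pct):
    """Admit the power-on only if the failover percentage stays untouched."""
    usable = cluster_capacity_mhz * (1 - failover_pct / 100)
    return reserved_mhz + new_vm_mhz <= usable

# Three hosts of 8000 MHz each, with 33% held back (roughly +1 redundancy):
assert admit(24000, 14000, 2000, 33) is True    # fits within the 67% usable
assert admit(24000, 15500, 1000, 33) is False   # would eat into the reserve
```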
SysAdmins are able to configure Dedicated Failover Hosts – in this case the specified hosts do not take a VM load, and are held in reserve ready for a vSphere host failure. Whilst this guarantees that the resources will be available, many customers find it an expensive option and would prefer to allow their hosts to take some kind of load, managing the overall load with a reservation of resources instead.
Finally, Admission Control can be turned off by using Do not reserve capacity. This keeps vSphere HA running but doesn't impose any restrictions on whether a VM can be failed over or powered on manually.
VM Monitoring is a sub-component of VMware HA, and is an optional feature. It can be used to inspect the state of virtual machines and, based on the result, reboot them if they appear to have become unresponsive. By default VM Monitoring is disabled, and some customers prefer this because they are anxious about vSphere 'getting it wrong' and unnecessarily rebooting VMs. This is because VM Monitoring inspects the VMware Tools heartbeat and uses a successive lack of responses to determine whether a VM has stalled. Significant work has been undertaken by VMware to lessen this concern – so, in conjunction with the heartbeat, VM Monitoring now inspects IO activity. The assumption is that if the heartbeat returns no answer AND no disk IO activity is taking place, there's a good likelihood that the VM has halted with either a kernel panic in Linux or a Blue Screen of Death (BSOD) in Windows.
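The combined heartbeat-plus-IO check described above reduces to a simple conjunction. The function and its inputs below are an illustrative sketch, not the real agent interface:

```python
# Sketch of the combined check: a VM is only considered failed when the
# VMware Tools heartbeat is silent AND no recent disk IO is observed.
# Function name and inputs are hypothetical.

def vm_appears_failed(heartbeat_ok, recent_disk_io):
    """Both signals must be absent before a reset is considered."""
    return not heartbeat_ok and not recent_disk_io

assert vm_appears_failed(False, False) is True   # likely kernel panic or BSOD
assert vm_appears_failed(False, True) is False   # busy guest, heartbeat stalled
assert vm_appears_failed(True, False) is False   # idle but healthy guest
```

Requiring both signals to be absent is what lessens the "getting it wrong" concern: a busy guest whose heartbeat service has merely stalled still generates disk IO, so it is not reset.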
When VM Monitoring is enabled it comes with two options: VM Monitoring Only and VM and Application Monitoring. The first monitors the VM heartbeat and restarts the VM if no response is given within a specific time. VM and Application Monitoring checks for heartbeat signals from both VMware Tools and applications/services running within the guest operating system. This is called VMware AppHA; it requires a virtual appliance to be configured, leverages VMware's Hyperic software inside the guest operating system, and offers support for a range of applications/services running in enterprise environments. For simplicity we will cover "VM Monitoring" here, and cover VMware AppHA separately.
Monitoring Sensitivity comes with two options: a slider which allows you to indicate a preset level of sensitivity, and a Custom setting. The High preset sets a sensitive policy that resets the VM when there is no response within 30 seconds, allowing up to three failures per 60 minutes. As you move the slider bar from right to left, VM Monitoring becomes increasingly conservative, and restarts are less likely to occur:
High (2) – No response in 30 secs, 3 failures per 60 mins
Medium (1) – No response in 60 secs, 3 failures in 24 hrs
Low (0) – No response in 2 mins, 3 failures in 7 days
If these presets are too aggressive or too conservative, then the Custom setting gives the administrator control over these tolerances.
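The preset table above can be encoded as data, with a helper that decides whether a further reset is still allowed inside the rolling window. The threshold values mirror the presets; the helper itself is an illustrative sketch of the policy, not VMware's code:

```python
# The Monitoring Sensitivity presets as data, plus a sketch of the
# "maximum resets per window" policy. Helper logic is illustrative.

PRESETS = {
    "high":   {"failure_interval_s": 30,  "max_resets": 3, "window_s": 3600},
    "medium": {"failure_interval_s": 60,  "max_resets": 3, "window_s": 24 * 3600},
    "low":    {"failure_interval_s": 120, "max_resets": 3, "window_s": 7 * 24 * 3600},
}

def reset_allowed(preset, reset_times_s, now_s):
    """Allow a reset only if fewer than max_resets occurred inside the window."""
    p = PRESETS[preset]
    recent = [t for t in reset_times_s if now_s - t <= p["window_s"]]
    return len(recent) < p["max_resets"]

# Three resets in the last hour exhausts the High preset:
print(reset_allowed("high", [100, 900, 2400], 3000))  # False
print(reset_allowed("medium", [100], 3000))           # True
```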
Alongside using network heartbeats to evaluate the availability of a vSphere host, vSphere HA can also use storage heartbeats to validate connectivity. The assumption is that if both the network and storage heartbeats are unavailable, then it's highly likely the host has suffered a catastrophic failure. This can be regarded as raising the bar for initiating the restart of VMs on another host, and is another method of reducing the occurrence of split-brain. Datastore Heartbeating requires two or more datastores accessible to all the hosts in the cluster, and is a mandatory feature. Therefore, if the hosts do not share datastores, or if HA is enabled before the storage configuration has been completed, this is likely to generate a warning on the vSphere hosts.
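The failure-versus-isolation decision described above can be sketched as follows. The state names and function are illustrative, but the logic mirrors the text: losing only the network heartbeat suggests isolation, while losing both heartbeats suggests a real host failure:

```python
# Sketch of the failure-vs-isolation decision using the two heartbeat
# channels. States and function name are illustrative.

def host_state(network_heartbeat, datastore_heartbeat):
    if network_heartbeat:
        return "alive"
    if datastore_heartbeat:
        return "isolated"   # apply the isolation response, e.g. leave powered on
    return "failed"         # restart the VMs on the surviving hosts

assert host_state(True, True) == "alive"
assert host_state(False, True) == "isolated"
assert host_state(False, False) == "failed"
```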
By default, vSphere HA will Automatically select datastores accessible from the host(s). In some cases this selection may not reflect your preference – for instance, if you work in a volatile lab environment where storage is temporarily mounted, VMs are created and destroyed, and the storage is then unmounted. You may prefer instead to use datastores which you know will always be mounted to the cluster. For this reason it's possible to Use datastores only from the specified list, or else Use datastores from the specified list and complement automatically if needed. This last option feels like a good compromise between control and protecting the environment from situations where a datastore may be reconfigured or become unavailable.
In this case the policy was changed to ensure that the "Software" and "Current-Templates" datastores were selected as the datastore heartbeat preference.
Advanced Options allows the administrator to supplement the vSphere HA configuration with additional parameters that control its functionality. A complete list of these options is available in VMware KB Article 2033250. Typically, these settings are modified in environments that present unique requirements or demands, generally around networking. These settings were recently updated with the March 2014 release of VMware Virtual SAN; vSphere HA can now leverage aspects of the Virtual SAN networking configuration as part of its internal logic.