https://www.sikich.com

Setting Virtual Machine Start Order with VMware vSphere 6.7 High Availability

Craig Schellenberg

May 11 2018


The Problem

Private cloud adoption has exploded, and it has been the new normal since 2012. Gone are the days when a business had to go through this long cycle to add a server:

Recognizing the need for another server
Getting a quote from multiple providers for that server including the hours needed to install it
Going through the sales process of acquiring the server
Waiting for shipping for it to arrive
Waiting for resources to be available to install it (racking it, and plugging it in)
The work of installing the Operating System and updating it

Through virtualization, that process was changed and shortened considerably to:

Recognizing the need for another server
The work of installing the Operating System and updating it

From a potential 4-6 week window for the first 6 steps, the process has shrunk to just 2 steps that need only 1-3 hours.

With this, companies can see much greater utilization of the server equipment purchased. No longer are servers sitting in racks with 15% or less of their resources being used simply because another server was needed.

However, once you start down this path of consolidating workloads from multiple virtual servers onto one physical server, you should realize that all the virtual eggs are in one basket. If that one physical server were to fail, all the workloads those virtual servers were providing would be inaccessible to the users. This is very different from the previous era of IT, where if one server were to fail, only its own workload would be inaccessible.

The Answer

The solution to this problem is clustering. Industry leaders VMware and Microsoft both offer ways to handle this; this blog post addresses the VMware solution.

At the core, a computer exists as 4 functions:

CPU
Memory
Network
Storage

While a single server can provide all of these functions, the option of High Availability becomes available if the storage is isolated from the single server itself. This storage could exist as a centrally available appliance such as a SAN (Storage Area Network) or a NAS (Network Attached Storage). In this type of correctly sized and configured infrastructure, an entire physical server can fail, and users would only experience a few minutes of an outage versus days or weeks of an outage while the server that failed was repaired or replaced.

VMware pioneered this ability, which is accomplished by turning on and correctly configuring VMware vSphere HA (High Availability). This article makes the following assumptions:

You have at least 2 servers set up in a functional HA cluster
You have N + 1 host server resources available in the HA cluster. For example:
1. If 2 servers, they are running at or below 50% capacity
2. If 3 servers, they are running at or below 66% capacity
3. If 4 servers, they are running at or below 75% capacity
In all 3 examples above, enough resources remain in the cluster that the virtual servers running on one failed host can be powered back on using the CPU, Memory, and Network of the remaining hosts in the cluster.
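The N + 1 figures above follow from a simple ratio: with N hosts, the cluster can absorb one host failure only if average utilization stays at or below (N - 1) / N. Here is a minimal Python sketch of that arithmetic, assuming identically sized hosts and evenly spread load. The function name is illustrative and not part of any VMware tool.

```python
def max_safe_utilization(hosts: int, failures_tolerated: int = 1) -> float:
    """Highest average utilization that still leaves room to restart
    the VMs from the failed host(s) on the surviving hosts.

    Assumes all hosts are the same size and load is spread evenly.
    """
    if hosts <= failures_tolerated:
        raise ValueError("need more hosts than tolerated failures")
    return (hosts - failures_tolerated) / hosts


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(f"{n} hosts: run at or below {max_safe_utilization(n):.1%}")
```

For 3 hosts the ratio is two-thirds, which the list above shows as 66%. Real clusters rarely have perfectly even hosts, so treat the result as a planning ceiling rather than a guarantee.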

With HA turned on, virtual servers will by default automatically power back on using the remaining hosts in the cluster if the host they were running on fails. The event causing the outage of the server is called an HA Event. HA Events trigger procedures on the remaining hosts in the cluster to power the affected VMs back on.

You can do some fine tuning of this process, however, if you want the servers to power back on in a specific order. This is especially helpful if a power outage has caused all hosts and virtual machines to shut down and nothing is powered on. In this example, you would likely want Domain Controllers to power on fully first, then the remaining servers in perhaps a specific order.

VMware just released version 6.7 of its hypervisor operating system (ESXi) and its management software (vCenter), and the remainder of this blog post will show how to set that start order in this version.

The Procedure

Overview:

Verify HA is turned on.
Change the default settings for the restart condition to be “Guest Heartbeats Detected.” This means the machine powering on is in a fully loaded status.
Create groups of VMs that we want powered on together.
Create rules that will allow one group of VMs to be powered on before another group.
Make granular changes as needed.

Verify HA is turned on.

First, log on to your vCenter server, right-click your cluster, and go to Settings. At a minimum, make sure the cluster settings have protection against Host failure enabled. Verify there are no warnings indicating issues that may keep HA from functioning.

VMware High Availability

Change the default setting for the restart condition to “Guest Heartbeats Detected.”

It is a good idea to click the EDIT button in the vSphere Availability section and change the default VM dependency restart condition. By default, the next group starts as soon as the resources for the current group's machines are allocated. That takes only a few moments, so the next group will likely start before you want it to. Ideally, you want machines fully powered on, with the Operating System loaded and operational, before the next series of virtual machines starts.
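To make the difference between the two restart conditions concrete, here is a small Python model. It is only a sketch: the boot times and the one-second allocation delay are made-up numbers, and the function is not a vSphere API. It steps through the groups in order and records when each one begins powering on.

```python
def group_start_times(boot_seconds, wait_for_heartbeat, allocation_seconds=1):
    """Return the second at which each VM group begins powering on.

    boot_seconds: hypothetical time each group needs to boot fully, in order.
    wait_for_heartbeat: True models "Guest Heartbeats detected",
        False models the default "Resources allocated".
    allocation_seconds: assumed time to allocate resources for a group.
    """
    starts = []
    clock = 0
    for boot in boot_seconds:
        starts.append(clock)
        # The next group waits either for a full boot or only for allocation.
        clock += boot if wait_for_heartbeat else allocation_seconds
    return starts


if __name__ == "__main__":
    boots = [120, 180, 90]  # illustrative boot times for three groups
    print(group_start_times(boots, wait_for_heartbeat=False))  # [0, 1, 2]
    print(group_start_times(boots, wait_for_heartbeat=True))   # [0, 120, 300]
```

With the default condition the three groups start almost together. With guest heartbeats as the condition, each group waits for the previous one to finish booting.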

NOTE: This can be accomplished if VMware Tools is installed on the virtual machine and the VM dependency restart condition is set to Guest Heartbeats detected as shown in the following screen shot.

VMware High Availability

Create groups of VMs that we want powered on together.

From this same section, click on “VM/Host Groups” within the Configuration section. In the screenshot below, I’ve opted to create 6 groups and put a “# – ” prefix in front of each name, numbered in the order I would have them start. NOTE: This prefix is optional and purely for management reasons. The VM Group name is not editable after you create it, so make it right the first time!

In this example, I have planned to have the Domain Controllers start first, with the vCenter server next. After this come the File Share Servers, followed by the ERP servers, then the SQL Servers, and finally everything else that remains. NOTE: This will change depending on your environment. Rely on your own experience in determining the startup order for your environment.
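Because group names cannot be edited later, it can help to draft them before touching vCenter. This Python sketch builds numbered names from the startup order described above. The exact group labels and the plain hyphen separator are assumptions for illustration, so adjust them to your own naming.

```python
# Startup order from the example above; labels are illustrative.
STARTUP_ORDER = [
    "Domain Controllers",
    "vCenter",
    "File Share Servers",
    "ERP Servers",
    "SQL Servers",
    "Everything Else",
]


def prefixed_group_names(groups):
    """Prefix each group with its position so the names sort in start order."""
    return [f"{position} - {name}" for position, name in enumerate(groups, start=1)]


if __name__ == "__main__":
    for name in prefixed_group_names(STARTUP_ORDER):
        print(name)
```

With fewer than ten groups, the numeric prefix also keeps the list alphabetically sorted in start order.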

VMware High Availability

Create rules that will allow one group of VMs to be powered on before another group.

Again, in the same section, click on “VM/Host Rules” under Configuration and click Add…

In this example rule, we continue the “# – ” numbering prefix on all names for management's sake. We select the type “Virtual Machines to Virtual Machines”, then select the VM group that must restart first (the group the others depend on, which we want powered on all the way) and then the VM group to power on next (vCenter in this instance).

VMware High Availability

Continue adding all the rules until you have addressed all VM Groups. The individual VMs in each group will show when a rule is selected to give the warm fuzzy that the rule was set up correctly as you wanted it to be.
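The rules form a chain: each group gets a rule placing it after the group before it. This Python sketch derives that chain from an ordered list of groups and checks that no group was missed. The `RestartRule` type, the rule names, and the group labels are hypothetical and only model what is entered in the dialog; they are not vSphere objects.

```python
from typing import List, NamedTuple


class RestartRule(NamedTuple):
    name: str
    first_group: str  # group that must be ready first
    then_group: str   # group restarted afterwards


def chain_rules(ordered_groups: List[str]) -> List[RestartRule]:
    """Create one rule per adjacent pair of groups, in start order."""
    pairs = zip(ordered_groups, ordered_groups[1:])
    return [
        RestartRule(f"{number} - {first} before {then}", first, then)
        for number, (first, then) in enumerate(pairs, start=1)
    ]


def uncovered_groups(ordered_groups: List[str], rules: List[RestartRule]) -> List[str]:
    """Return groups that no rule mentions, to catch a forgotten rule."""
    covered = {group for rule in rules for group in (rule.first_group, rule.then_group)}
    return [group for group in ordered_groups if group not in covered]


if __name__ == "__main__":
    groups = ["Domain Controllers", "vCenter", "File Share Servers",
              "ERP Servers", "SQL Servers", "Everything Else"]
    rules = chain_rules(groups)
    for rule in rules:
        print(rule.name)
    print("Uncovered:", uncovered_groups(groups, rules))
```

Six groups yield five rules, which is a quick sanity check against the number of rules listed in vCenter.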

VMware High Availability

Make granular changes as needed.

Now you may be thinking your environment is different and you don’t need a specific server to be booted up and completely available before starting to boot your next VM Group. This is addressed in the VM Overrides section.

VM Overrides allow more granularity in the startup. This is the configuration screen for adding a VM Override.

VMware High Availability

Note the granularity here: we can set additional startup parameters per virtual machine. One such parameter is VM Restart Priority. vCenter allows for 6 separate groupings of start order via the VM Restart Priority setting. VMs whose priority is overridden from the default setting will start in the following order:

Highest
High
Medium
Low
Lowest
Disabled (will not power on automatically in an HA event)

The default setting is Medium. With no Override defined, all servers start up automatically in no specific order on an HA event, or in a specific order if VM Groups and VM Rules are defined.
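The priority levels can be pictured as ordered batches. This Python sketch is a simplified model rather than vCenter's actual scheduler, and the VM names are hypothetical. It sorts VMs into those batches, gives VMs without an override the Medium default, and leaves Disabled VMs out.

```python
PRIORITY_ORDER = ["Highest", "High", "Medium", "Low", "Lowest"]


def restart_batches(all_vms, overrides, default="Medium"):
    """Group VMs by restart priority, highest first.

    all_vms: VM names in the cluster (hypothetical names in the demo).
    overrides: mapping of VM name to an overridden priority.
    VMs set to "Disabled" are skipped because HA will not restart them.
    """
    batches = {level: [] for level in PRIORITY_ORDER}
    for vm in all_vms:
        level = overrides.get(vm, default)
        if level == "Disabled":
            continue
        if level not in batches:
            raise ValueError(f"unknown restart priority: {level!r}")
        batches[level].append(vm)
    # Drop empty levels so the result lists only batches that will run.
    return [(level, batches[level]) for level in PRIORITY_ORDER if batches[level]]


if __name__ == "__main__":
    vms = ["dc01", "sql01", "print01", "lab01"]
    overrides = {"dc01": "Highest", "print01": "Lowest", "lab01": "Disabled"}
    for level, names in restart_batches(vms, overrides):
        print(level, names)
```

In this demo, `sql01` has no override and lands in the Medium batch, while `lab01` never appears because it is Disabled.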

While not required, it is a good idea to add overrides for the Restart Priority, as this acts as a check and balance against the VM Rules assigned in the previous section. It helps make sure machines start the way you intended.

To add this, click “VM Overrides” in the Configuration section and then click the Add… button.

All available virtual machines in the cluster will show here with the option to add an Override for them.

VMware High Availability

In this environment, one virtual server does not have VMware Tools installed, so I need to override the condition that decides when the next-priority VMs can start. Recall that we changed the default restart condition to wait until VMware guest heartbeats are detected from the previous group. If VMware Tools is not installed, that condition would never be met. Change the override for that VM to “Resources allocated”, and the missing VMware Tools is no longer a concern.
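The same decision can be written as a simple check over an inventory. This Python sketch assumes you already have a mapping of VM names to whether VMware Tools is installed (the names here are hypothetical, and gathering that data is out of scope). It lists the VMs that need a “Resources allocated” override.

```python
CLUSTER_DEFAULT = "Guest Heartbeats detected"
NO_TOOLS_CONDITION = "Resources allocated"


def restart_condition_overrides(tools_installed):
    """Return the per-VM overrides needed when the cluster default waits
    for guest heartbeats.

    tools_installed: mapping of VM name to True/False for VMware Tools.
    VMs with Tools keep the cluster default and need no override.
    """
    return {
        vm: NO_TOOLS_CONDITION
        for vm, has_tools in tools_installed.items()
        if not has_tools
    }


if __name__ == "__main__":
    inventory = {"dc01": True, "appliance01": False, "sql01": True}
    print(restart_condition_overrides(inventory))
```

Only `appliance01` is returned in this demo, matching the single override added in the walkthrough.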

VMware High Availability

Remember, even if none of the above configuration is done in your environment beyond enabling HA, your VMs will still power back on in an HA Event. However, you may then need to remediate issues that arise because servers and their services started before the resources they depend on from other servers were available. VM Groups, VM Rules, and VM Overrides are a way to address that risk. Putting in a little work up front will prevent such race conditions in the event of an HA event.


About the Author

Craig Schellenberg

Craig Schellenberg is a Senior Network Consultant at Sikich who works with businesses to improve their IT. Being detail oriented assists in his ability to design and deploy new solutions as well as troubleshoot complex issues. His primary areas of focus are on-premises virtualization and storage (whether through VMware vSphere or Microsoft Hyper-V), Microsoft Cloud services such as Azure and Office 365, Microsoft SQL design and administration, backup/DR/Business Continuance, and network route/switch/firewalls. Craig holds many certifications, including his MCSE (Microsoft Certified Solutions Expert) in Productivity, Messaging, and Cloud Platform and Infrastructure. Craig also holds multiple VCP (VMware Certified Professional) certifications, including version 3, 4 (Data Center Virtualization), 5 (Data Center Virtualization), 5 (Desktop), Cloud, and 6 (Data Center Virtualization).