Planning & Preparation for Site Protection and Disaster Recovery with VMware Cloud Foundation

It’s been a while, but we are finally welcoming the disaster recovery solution back into the family in the form of the Site Protection and Disaster Recovery Validated Solution for VMware Cloud Foundation.

It reintroduces Site Recovery Manager and vSphere Replication in combination with NSX-T Federation to allow recovery of SDDC Management components such as vRealize Automation, vRealize Operations Manager, vRealize Suite Lifecycle Manager and Clustered Workspace ONE Access to a VMware Cloud Foundation (VCF) instance in another location.

My colleagues and I will be releasing several posts in the coming days (such as Brian O’Connell’s post and Gary Blake’s post) that cover the detail of the solution. To avoid overlap, and in keeping with my last few posts, I’m going to focus on how this solution is addressed in the Planning & Preparation Workbook for VMware Cloud Foundation (P&P).

Enabling Site Protection

It took a whole lot of work, across the component products and by the team that produces the VMware Validated Solutions, to add this simple one-line item to P&P.

Deployment Options in VCF Planning & Preparation Workbook
Site Protection & Disaster Recovery Sub-Options

This new drop-down gives you several options:

  • Exclude: If you don’t want this VCF instance to participate in any Site Protection and Disaster Recovery relationship with another instance.
  • Management Only: If you want this instance to be either the protected or recovery site for management components.
  • VI Workload Only: If you want this instance to be either the protected or recovery site for workloads in a VI Workload Domain.
  • Management & VI Workload: If you want this instance to be either the protected or recovery site for management components as well as workloads in a VI Workload Domain.

Dependencies

Management and VI Workload Domains have different dependencies.

Management Domain Dependencies

  • NSX Federation (which in turn depends on Overlay-Backed NSX Routing)
  • Initial Deployment of one or both of the following:
    • Private Cloud Automation (vRealize Automation)
    • Intelligent Operations Management (vRealize Operations Manager)

Without at least one of these initial deployments there would be no need to protect vRealize Suite Lifecycle Manager or even to deploy Clustered Workspace ONE Access, and therefore there would be nothing to actually protect.

The fact that it is an ‘Initial Deployment’ is also key: if you choose ‘Connect Instance’, you are not actually deploying the components that need to be protected in this VCF instance, but simply connecting to them in an instance where they already exist.

VI Workload Domain Dependencies

  • Just the presence of a VI Workload Domain

There are no further requirements, as we can’t know the profile of those workloads, how you want to network them, and so on.
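As a rough illustration, the two dependency lists above can be read as a simple check. This is only a sketch for an instance that initially deploys the components. `WorkbookChoices`, its field names and the wording of the returned messages are all hypothetical and are not how the workbook itself is implemented.

```python
from dataclasses import dataclass
from typing import List

# Drop-down values taken from the post; grouping them into sets is an assumption.
MGMT_SCOPES = {"Management Only", "Management & VI Workload"}
VI_SCOPES = {"VI Workload Only", "Management & VI Workload"}


@dataclass
class WorkbookChoices:
    """Hypothetical summary of one workbook copy (not a real P&P schema)."""
    site_protection: str          # one of the four drop-down options
    nsx_federation: bool          # True if NSX Federation is selected
    automation: str               # "Exclude", "Initial Deployment" or "Connect Instance"
    operations: str               # same values as automation
    has_vi_workload_domain: bool  # True if a VI Workload Domain is present


def unmet_dependencies(choices: WorkbookChoices) -> List[str]:
    """Return the dependencies from the lists above that are not satisfied."""
    missing = []
    if choices.site_protection in MGMT_SCOPES:
        if not choices.nsx_federation:
            missing.append("NSX Federation")
        if "Initial Deployment" not in (choices.automation, choices.operations):
            missing.append(
                "Initial Deployment of vRealize Automation or vRealize Operations Manager"
            )
    if choices.site_protection in VI_SCOPES and not choices.has_vi_workload_domain:
        missing.append("VI Workload Domain")
    return missing


if __name__ == "__main__":
    example = WorkbookChoices("Management Only", False, "Connect Instance", "Exclude", False)
    for item in unmet_dependencies(example):
        print("Missing:", item)
```

Choosing ‘Exclude’ never reports a missing dependency in this sketch, because nothing is being protected in that case.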

Setting up the Planning & Preparation Workbooks

As you know, there is just one file now for Planning & Preparation since VCF 4.3.0 was introduced (see here). You use a copy of the file for every instance of VCF that you want to deploy. The choices you make inside the workbook determine the profile of the VCF instance you are deploying. This concept is now extended to include the Site Recovery Manager concepts of the Protected and Recovery sites.

Asking you as the end user to specify the Protected site as an explicit choice in the workbook doesn’t make sense, as it’s so dependent on the other options selected in the file. Instead, whether the VCF instance is to be the Protected or Recovery site is a function of the other choices you make within the file. This ensures that all the dots align correctly for your deployment.

So this is how it works:

A VCF instance becomes the ‘Protected Site’ for the Management Domain when you do the following:

  1. Set the ‘Multi-instance integration model’ to be ‘First Domain’
    • This is where the initial deployments of the SDDC management components to be protected are done, and therefore it makes sense for there to be a correlation between First Domain and the Protected site concept.
  2. Choose the ‘Initial Deployment’ deployment option for at least one of the following:
    • Private Cloud Automation for VMware Cloud Foundation
    • Intelligent Operations Management for VMware Cloud Foundation
  3. Choose either ‘Management Only’ or ‘Management & VI Workload’ from the ‘Site Protection and Disaster Recovery for VMware Cloud Foundation’ deployment option

If you do that (and resolve any other dependencies) then you will see something like this:

Protected Site Configured

A VCF instance becomes the ‘Recovery Site’ for the Management Domain when you do the following:

  1. Set the ‘Multi-instance integration model’ to be ‘Join Domain’ or ‘Additional Domain’
    • This indicates that it is not the instance where initial deployments of components typically happen, but that this instance will be connected in some fashion to the First Domain
  2. Choose either ‘Exclude’ or ‘Connect Instance’ for both Private Cloud Automation and Intelligent Operations Management. As long as neither is set to ‘Initial Deployment’ you are good.
  3. Choose either ‘Management Only’ or ‘Management & VI Workload’ from the ‘Site Protection and Disaster Recovery for VMware Cloud Foundation’ deployment option

And that will give you something like this:

Recovery Site Configured

A similar concept applies to the VI Workload Domains i.e.

  • First Domain + VI Workload Domain results in Protected Site
  • Join or Additional Domain + VI Workload Domain results in Recovery Site
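To make the rules above a bit more concrete, here is a minimal Python sketch of how the site role could be derived from the workbook choices. It is my own illustration of the logic described in this post, not the workbook’s actual formulas. The function names and the ‘Invalid combination’ result are assumptions.

```python
from typing import Optional

# Drop-down values taken from the post; grouping them into sets is an assumption.
MGMT_SCOPES = {"Management Only", "Management & VI Workload"}
VI_SCOPES = {"VI Workload Only", "Management & VI Workload"}
SECONDARY_MODELS = {"Join Domain", "Additional Domain"}


def management_role(integration_model: str, automation: str,
                    operations: str, site_protection: str) -> Optional[str]:
    """Derive the Management Domain role; None means it is not protected at all."""
    if site_protection not in MGMT_SCOPES:
        return None
    has_initial = "Initial Deployment" in (automation, operations)
    if integration_model == "First Domain" and has_initial:
        return "Protected Site"
    if integration_model in SECONDARY_MODELS and not has_initial:
        return "Recovery Site"
    return "Invalid combination"


def vi_workload_role(integration_model: str, has_vi_workload_domain: bool,
                     site_protection: str) -> Optional[str]:
    """Derive the VI Workload Domain role using the simpler two-line rule."""
    if site_protection not in VI_SCOPES or not has_vi_workload_domain:
        return None
    if integration_model == "First Domain":
        return "Protected Site"
    if integration_model in SECONDARY_MODELS:
        return "Recovery Site"
    return "Invalid combination"


if __name__ == "__main__":
    print(management_role("First Domain", "Initial Deployment", "Exclude", "Management Only"))
    print(management_role("Join Domain", "Connect Instance", "Exclude", "Management Only"))
    print(vi_workload_role("Additional Domain", True, "VI Workload Only"))
```

The point of the sketch is that the role is never an input. It falls out of the integration model, the deployment options and the Site Protection drop-down, which is exactly why the workbook doesn’t ask you for it.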

What happens if you get the combination wrong? Planning & Prep tells you why:

Wrong Combination Example 1
Wrong Combination Example 2

Using the Workbooks During Deployment

  • The majority of the tasks for configuring the Recovery instance are actually carried out leveraging the Planning & Preparation workbook for the Protected Instance.
    • Planning an SDDC deployment across two instances requires a lot of inputs, and the last thing we want to do is force a customer to enter the same input more than once. Therefore we leverage data already gathered in the P&P for the protected instance, to avoid duplicating data entry into the P&P for the Recovery instance.
    • There are tasks (such as the PowerShell-based Load Balancer Service creation on the Recovery instance) that actively interrogate the Protected instance in order to mirror settings to the Recovery instance, so we need the Protected VCF instance details anyway.
  • The only time the P&P workbook for the Recovery instance is referenced is for the deployment and configuration of Site Recovery Manager and vSphere Replication on that instance as these elements are naturally very intertwined with the network and infrastructure related details that pertain to the Recovery VCF instance.

Here’s a table that shows you the deployment flow (Instance A is assumed to be the Protected Instance and Instance B the Recovery Instance):

Step | Task | VCF Instance | P&P Workbook
1 | Create a Virtual Machine Folder and Move the vRealize Suite Lifecycle Manager Virtual Machine in the Protected VMware Cloud Foundation Instance | A | A
2 | Create Virtual Machine Folders for SDDC Management Components in the Recovery VMware Cloud Foundation Instance | A | A
3 | Prepare Load Balancing Services for the vRealize Suite Components and the Clustered Workspace ONE Access Instance in the Recovery VMware Cloud Foundation Instance | A | A
4 | Reconfigure the DNS, Domain Search, and NTP Settings on Multiple SDDC Management Components | A | A
5 | Implementation of vSphere Replication for the Management Domain | A | A
6 | Implementation of Site Recovery Manager for the Management Domain | A | A
7 | Implementation of vSphere Replication for the Management Domain | B | B
8 | Implementation of Site Recovery Manager for the Management Domain | B | B
9 | Configure Failover of the SDDC Management Components for the Management Domain | A | A

Notice that Steps 5-6 and 7-8 are the same tasks, repeated once per VCF instance. All other tasks are performed just once, and it makes most sense to do them using detail already provided for the initial deployments of the SDDC Management components in Instance A (i.e. the First Domain). Hence the use of the Planning & Prep workbook from Instance A.
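If it helps to see the flow as data, here is a small Python sketch that encodes the table and groups the steps by the workbook copy they are driven from. The task names are shortened by me, and the structure and function name are purely illustrative assumptions.

```python
from typing import List, Tuple

# (step, shortened task name, VCF instance, P&P workbook), copied from the table above.
DEPLOYMENT_FLOW: List[Tuple[int, str, str, str]] = [
    (1, "Create VM folder and move vRealize Suite Lifecycle Manager VM", "A", "A"),
    (2, "Create VM folders for SDDC management components", "A", "A"),
    (3, "Prepare load balancing services", "A", "A"),
    (4, "Reconfigure DNS, domain search and NTP settings", "A", "A"),
    (5, "Implement vSphere Replication", "A", "A"),
    (6, "Implement Site Recovery Manager", "A", "A"),
    (7, "Implement vSphere Replication", "B", "B"),
    (8, "Implement Site Recovery Manager", "B", "B"),
    (9, "Configure failover of SDDC management components", "A", "A"),
]


def steps_for_workbook(workbook: str) -> List[int]:
    """Return the step numbers that are driven from the given workbook copy."""
    return [step for step, _task, _instance, wb in DEPLOYMENT_FLOW if wb == workbook]


if __name__ == "__main__":
    print("Workbook A:", steps_for_workbook("A"))  # [1, 2, 3, 4, 5, 6, 9]
    print("Workbook B:", steps_for_workbook("B"))  # [7, 8]
```

Only steps 7 and 8 come from the Recovery instance’s workbook, which matches the point that it is only needed for Site Recovery Manager and vSphere Replication on that instance.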

Deployment Tasks

In line with the table above, you will see that tasks to be carried out using the P&P for the Protected instance vs the P&P for the Recovery instance are masked/unmasked accordingly.

Sample Task View from P&P for Protected Instance
Sample Task View from P&P for Recovery Instance

Obviously all the red in the top screenshot indicates required inputs that haven’t been provided, which is true as the screenshot is taken from a freshly downloaded copy of the workbook. The key point, though, is that the tasks are only unmasked if they need to be performed from this copy of the workbook.

Summary

That’s all for now. Hope this was useful.

As always please feel free to comment / ask questions!
