
Disaster recovery involves a set of policies, tools, and procedures to enable the recovery or continuation of vital technology infrastructure and systems following a natural or human-induced disaster.
The Recovery Time Objective (RTO) is the targeted duration of time and a service level within which a business process must be restored after a disaster (or disruption) in order to avoid unacceptable consequences associated with a break in business continuity.
I covered using Aviatrix to address the challenges of DR/BC before:
In this post I address a new set of requirements:
- remote branches do not support multiple tunnels
- remote branches have overlapping IP ranges
- applications have hard-coded IP addresses (different application instances must run with the same IP address)
Proposed Design
The proposed solution has the following major points:
- a set of resources exists only in the active region: VNet, gateways, IPsec tunnels, gateway attachments, and route propagation
- a matching set of resources is kept on standby in the non-active region
- Mapped NAT is used to overcome the IP overlap
- Terraform is used to manually switch over between the active and standby regions (thanks to Chris for the idea of using Terraform state to take care of that, and thanks to Dennis for always helping me with Terraform)

Terraform and Aviatrix Provider to the rescue
Site-2-Cloud terraform:
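A minimal sketch of such a resource, assuming the Aviatrix provider's aviatrix_site2cloud resource. Every var.* name, the connection name, and the CIDR variables are hypothetical placeholders, and the argument names should be checked against the provider version in use:

```hcl
# Sketch only: all var.* inputs and literal names are illustrative assumptions.
# var.region_active is assumed to be declared elsewhere and set in terraform.tfvars.
resource "aviatrix_site2cloud" "branch_west" {
  # Create the connection only while the west region is the active one.
  count = var.region_active == "west" ? 1 : 0

  vpc_id                     = var.west_vpn_vnet_id      # VNet hosting the existing AVX gateway
  connection_name            = "branch-to-west"
  connection_type            = "mapped"                  # Mapped NAT hides the overlapping ranges
  remote_gateway_type        = "generic"
  tunnel_type                = "route"
  primary_cloud_gateway_name = var.west_vpn_gateway_name # existing AVX gateway
  remote_gateway_ip          = var.branch_public_ip

  # Real and virtual (NATed) ranges on each side of the tunnel.
  remote_subnet_cidr    = var.branch_real_cidr
  remote_subnet_virtual = var.branch_virtual_cidr
  local_subnet_cidr     = var.workload_real_cidr
  local_subnet_virtual  = var.workload_virtual_cidr
}
```

A mirrored resource for the other region would use the opposite condition, so that at most one of the two connections exists at any time and the branch only ever needs a single tunnel.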
The code above creates a Site-2-Cloud connection to an existing AVX gateway, but only in the active region, using the expression count = var.region_active == "west" ? 1 : 0. The active region is determined by the value of the variable region_active set in terraform.tfvars.
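The variable itself could be declared along these lines. This is a sketch: the value "east" for the second region and the validation block are assumptions added for illustration, since only "west" appears in the expression above:

```hcl
# variables.tf (sketch; "east" is an assumed name for the second region)
variable "region_active" {
  description = "Region that currently owns the active resources."
  type        = string

  validation {
    # Reject anything other than the two known regions before planning.
    condition     = contains(["west", "east"], var.region_active)
    error_message = "The region_active value must be either \"west\" or \"east\"."
  }
}

# terraform.tfvars would then contain a single line such as:
#   region_active = "west"
```

The validation guards against a typo in terraform.tfvars, which would otherwise make every count expression evaluate to 0 and remove the resources from both regions.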
The same principle is used to advertise the remote branch prefixes to the AVX fabric from the proper region using the included_advertised_spoke_routes variable:
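One way this could look, assuming the prefixes are set directly on an aviatrix_spoke_gateway resource. The gateway name, region, size, and all var.* inputs are hypothetical, and the original may pass this value through a module variable instead:

```hcl
# Sketch only: names, sizes, and var.* inputs are illustrative assumptions.
resource "aviatrix_spoke_gateway" "vpn_west" {
  cloud_type   = 8 # Azure
  account_name = var.azure_account_name
  gw_name      = "vpn-spoke-west"
  vpc_id       = var.west_vpn_vnet_id
  vpc_reg      = "West US"
  gw_size      = "Standard_B2ms"
  subnet       = var.west_vpn_gateway_subnet

  # Advertise the branch prefixes (comma-separated CIDRs) only from the
  # active region; null leaves the argument unset while on standby.
  included_advertised_spoke_routes = var.region_active == "west" ? var.branch_advertised_cidrs : null
}
```

Unlike the Site-2-Cloud connection, the gateway itself stays in place in both regions here; only the advertised prefixes follow the value of region_active.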
Because the applications require the same IP addresses, only one VNet is attached to the transit at a time:
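A minimal sketch of the conditional attachment, assuming the aviatrix_spoke_transit_attachment resource and hypothetical gateway-name variables, with "east" as the assumed name of the second region:

```hcl
# Sketch only: gateway names come from hypothetical variables.
resource "aviatrix_spoke_transit_attachment" "workload_west" {
  # Attach the west workload VNet only while west is active.
  count = var.region_active == "west" ? 1 : 0

  spoke_gw_name   = var.west_workload_spoke_gw_name
  transit_gw_name = var.west_transit_gw_name
}

resource "aviatrix_spoke_transit_attachment" "workload_east" {
  # Mirror image for the standby region ("east" is an assumed region name).
  count = var.region_active == "east" ? 1 : 0

  spoke_gw_name   = var.east_workload_spoke_gw_name
  transit_gw_name = var.east_transit_gw_name
}
```

Because the two conditions are mutually exclusive, the two workload VNets with identical address space are never attached to the transit at the same time.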
If there is a need to switch over from one region to the other, failing over is as simple as changing the value of region_active in terraform.tfvars and running terraform apply. Terraform will destroy the Site-2-Cloud connection in the previously active region, detach the workload VNet from the transit, and withdraw the remote branch prefixes from the VPN spoke gateway. Terraform will also create the corresponding objects in the newly active region.