5 min RTO with Aviatrix and Terraform

Disaster recovery involves a set of policies, tools, and procedures to enable the recovery or continuation of vital technology infrastructure and systems following a natural or human-induced disaster.

The Recovery Time Objective (RTO) is the targeted duration of time and a service level within which a business process must be restored after a disaster (or disruption) in order to avoid unacceptable consequences associated with a break in business continuity.

I have covered using Aviatrix to address the challenges of DR/BC before.

In this new post, I address a new set of requirements:

  • remote branches do not support multiple tunnels
  • remote branches have overlapping IP ranges
  • applications use hard-coded IP addresses (different app instances must run with the same IP address)

Proposed Design

The proposed solution has the following major points:

  • a set of resources exists only in the active region: VNet, gateways, IPsec tunnels, gateway attachments, and route propagation
  • a set of resources stays on standby in the non-active region
  • Mapped NAT is used to overcome the IP overlap
  • Terraform is used to manually switch over the active and standby regions (thanks to Chris for the idea of using Terraform state to take care of that, and thanks to Dennis for always helping me with Terraform)
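
To make the Mapped NAT point concrete, here is a minimal sketch of the variable values involved. The variable names are the ones referenced by the Terraform code in this post; every address is a made-up placeholder, and the matching prefix lengths reflect my understanding of one-to-one mapped connections rather than values from the actual environment.

```hcl
# terraform.tfvars (illustrative values only; all addresses are placeholders)

# Real CIDR used inside the remote branch; assumed to overlap with other branches.
remote_subnet_cidr = "10.10.0.0/24"

# Virtual CIDR that represents this branch inside the cloud.
# Assumption: same prefix length as the real CIDR, so hosts map one-to-one.
remote_virtual = "172.31.10.0/24"

# Public IP of the branch VPN device (documentation range, not a real peer).
remote_gateway_ip = "203.0.113.10"
```

With values like these, cloud workloads would reach the branch through the virtual range, while the branch keeps using its real, overlapping addresses.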

Terraform and Aviatrix Provider to the rescue

Site-2-Cloud terraform:

resource "aviatrix_site2cloud" "site2cloud_connection-east" {
  depends_on = [
    aviatrix_gateway.aviatrix_gateway_standalone-east
  ]
  # Create the connection only while "east" is the active region.
  count = var.region_active == "east" ? 1 : 0
  vpc_id = aviatrix_gateway.aviatrix_gateway_standalone-east.vpc_id
  connection_name = "${aviatrix_gateway.aviatrix_gateway_standalone-east.id}-${var.region_active}-${replace(var.remote_gateway_ip, ".", "-")}"
  connection_type = "mapped"
  remote_gateway_type = "generic"
  tunnel_type = "route"
  pre_shared_key = var.pre_shared_key
  enable_ikev2 = true
  primary_cloud_gateway_name = aviatrix_gateway.aviatrix_gateway_standalone-east.gw_name
  remote_gateway_ip = var.remote_gateway_ip
  custom_mapped = false
  # Mapped NAT: the branch's real CIDR and the virtual CIDR that represents it.
  remote_subnet_cidr = var.remote_subnet_cidr
  remote_subnet_virtual = var.remote_virtual
  local_subnet_cidr = aviatrix_vpc.azure_vnet_user-spoke-east-2.cidr
  #local_subnet_virtual = var.cloud_virtual
  # Single remote IP: the HA gateway builds its tunnel to the same branch device.
  enable_single_ip_ha = true
  backup_gateway_name = aviatrix_gateway.aviatrix_gateway_standalone-east.peering_ha_gw_name
  ha_enabled = true
  backup_remote_gateway_ip = var.remote_gateway_ip
  backup_pre_shared_key = var.pre_shared_key
}
resource "aviatrix_site2cloud" "site2cloud_connection-west" {
  depends_on = [
    aviatrix_gateway.aviatrix_gateway_standalone-west
  ]
  # Create the connection only while "west" is the active region.
  count = var.region_active == "west" ? 1 : 0
  vpc_id = aviatrix_gateway.aviatrix_gateway_standalone-west.vpc_id
  connection_name = "${aviatrix_gateway.aviatrix_gateway_standalone-west.id}-${var.region_active}-${replace(var.remote_gateway_ip, ".", "-")}"
  connection_type = "unmapped"
  remote_gateway_type = "generic"
  tunnel_type = "route"
  pre_shared_key = var.pre_shared_key
  enable_ikev2 = true
  primary_cloud_gateway_name = aviatrix_gateway.aviatrix_gateway_standalone-west.gw_name
  remote_gateway_ip = var.remote_gateway_ip
  custom_mapped = false
  remote_subnet_cidr = var.remote_subnet_cidr
  #remote_subnet_virtual = var.remote_virtual
  local_subnet_cidr = aviatrix_vpc.azure_vnet_user-spoke-west-2.cidr
  #local_subnet_virtual = var.cloud_virtual
  # Single remote IP: the HA gateway builds its tunnel to the same branch device.
  enable_single_ip_ha = true
  backup_gateway_name = aviatrix_gateway.aviatrix_gateway_standalone-west.peering_ha_gw_name
  ha_enabled = true
  backup_remote_gateway_ip = var.remote_gateway_ip
  backup_pre_shared_key = var.pre_shared_key
}

The code above creates a Site-2-Cloud connection on an existing AVX gateway, but only in the active region. Each resource carries a count expression such as count = var.region_active == "west" ? 1 : 0, which evaluates to 1 when that region is active and to 0 otherwise. The active region is determined by the value of the variable region_active declared in terraform.tfvars.
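
Because a single string drives every count expression, a typo such as "West" would make both expressions evaluate to 0 and remove the connection from both regions. A minimal sketch of how the variable could be declared to guard against that is shown below; the description text and the validation block are my additions, not part of the original configuration.

```hcl
# variables.tf (sketch): restrict region_active to the two regions used in this post.
variable "region_active" {
  description = "Region that currently owns the tunnels, attachments, and route advertisements."
  type        = string

  validation {
    # Reject anything other than the two expected values before planning any change.
    condition     = contains(["east", "west"], var.region_active)
    error_message = "The region_active value must be \"east\" or \"west\"."
  }
}

# terraform.tfvars then holds the single switch, for example:
# region_active = "east"
```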

The same principle is used to advertise the remote branch prefixes to the AVX fabric from the proper region, using the included_advertised_spoke_routes argument of the spoke module:

module "vpn-spoke-west-2" {
  source = "terraform-aviatrix-modules/mc-spoke/aviatrix"
  version = "1.3.0"
  account = var.account
  cloud = var.cloud
  region = var.region-a
  cidr = cidrsubnet("${trimsuffix(var.cidr-region-a-1, "23")}16", 8, 2)
  inspection = true
  transit_gw = module.corp-west-2-transit.transit_gateway.gw_name
  ha_gw = true
  instance_size = var.instance_size
  single_az_ha = false
  az_support = false
  name = "vpn-spoke-west-2-poc"
  gw_name = "vpn-spoke-west-2-poc"
  # Advertise the branch's virtual prefix only while "west" is the active region.
  included_advertised_spoke_routes = var.region_active == "west" ? var.remote_subnet_virtual : null
}

Because the applications require the same IP addresses in both regions, only one VNet is attached to the transit at a time:

resource "aviatrix_azure_spoke_native_peering" "user-spoke-west-2" {
  # Attach the workload VNet to the transit only while "west" is the active region.
  count = var.region_active == "west" ? 1 : 0
  transit_gateway_name = module.user-west-2-transit.transit_gateway.gw_name
  spoke_account_name = var.account
  spoke_region = var.region-a
  spoke_vpc_id = aviatrix_vpc.azure_vnet_user-spoke-west-2.vpc_id
}

If a need exists to switch over from one region to the other, the failover is as simple as changing the value of region_active in terraform.tfvars and running terraform apply. Terraform will “destroy” the Site-2-Cloud connection in the previously active region, detach the workload VNet from the transit, and withdraw the remote branch prefix from the VPN spoke gateway. Terraform will also create the new objects in the now-active region.
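
As a rough sketch of how the switchover could be checked, the outputs below report which region currently holds the Site-2-Cloud connection. The output names are hypothetical and the block assumes it lives in the same configuration as the resources shown earlier. Since those resources use count, they are lists of zero or one element, so the built-in one() function is used to read them safely.

```hcl
# outputs.tf (sketch): show which region currently holds the Site-2-Cloud connection.
# one() returns the single element of a 0-or-1 element list, or null when the list is empty.
output "active_region" {
  value = var.region_active
}

output "site2cloud_connection_east" {
  value = one(aviatrix_site2cloud.site2cloud_connection-east[*].connection_name)
}

output "site2cloud_connection_west" {
  value = one(aviatrix_site2cloud.site2cloud_connection-west[*].connection_name)
}

# Failover workflow, assuming region_active is currently "east":
#   1. edit terraform.tfvars and set region_active = "west"
#   2. run terraform plan and review the destroy/create actions
#   3. run terraform apply
```

Reviewing the plan before applying is worthwhile here, since the same run both removes the objects in the old region and creates them in the new one.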

References

https://en.wikipedia.org/wiki/Disaster_recovery
