A Day in the Life of a Multi-Site Packet in a Cisco ACI Hybrid Multi-Cloud Environment (Azure)
The diagram below shows the overall high-level architecture of Cloud ACI, with Nexus Dashboard Orchestrator (NDO) acting as a central policy controller that manages policies across multiple on-premises ACI data centers as well as hybrid environments, each cloud site being abstracted by its own Cloud APIC.
For more details on the Cisco ACI multi-site solution, refer to the following white paper: https://www.cisco.com/c/en/us/solutions/collateral/data-center-virtualization/application-centric-infrastructure/white-paper-c11-739609.pdf
The on-prem ACI fabric, Azure Cloud ACI, Nexus Dashboard and Nexus Dashboard Orchestrator deployment and configuration are not discussed in this document.
A Cisco ACI fabric, composed of two gen-2 leaves and one gen-3 spine and running version 5.2, is deployed in a data center and connects through a Layer 3 inter-site network to Azure, where Cisco Cloud ACI version 25.0(1c) is deployed in the westus region. A VM is attached to the fabric using VMM (<on-prem VM IP address>), and an instance is deployed in a tenant VNet on Azure (10.10.1.4) for end-to-end data-plane connectivity tests. NDO version 3.4 is used to stretch the tenant, VRF, and EPGs across sites.
A tenant called spoke01 is created and associated with site2 and the Azure site. Site1 and AWS are out of scope for this document.
A single schema with a single template is created containing a single VRF, ANP, and EPG:
Site-local configurations are properly configured. Epg01 has an EP selector matching a subnet (10.10.1.0/24):
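The endpoint-selector logic amounts to a subnet membership test; a minimal Python sketch (the function name and this standalone form are illustrative, not Cloud APIC code):

```python
import ipaddress

def matches_ep_selector(endpoint_ip: str, selector_subnet: str) -> bool:
    """Return True if an endpoint IP falls inside a cloud EPG
    endpoint-selector subnet."""
    return ipaddress.ip_address(endpoint_ip) in ipaddress.ip_network(selector_subnet)

# The Azure instance 10.10.1.4 is classified into epg01 by the
# 10.10.1.0/24 selector:
print(matches_ep_selector("10.10.1.4", "10.10.1.0/24"))  # True
print(matches_ep_selector("10.10.2.4", "10.10.1.0/24"))  # False
```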
A gateway is created for the on-prem BD (<on-prem BD gateway IP>/24):
A VMM domain is associated with epg01 on the on-prem site:
The configuration is saved and then deployed. The green check marks next to the site locals show the configuration was applied successfully.
The on-prem VM (<on-prem VM IP address>) should be able to reach the VM running in Azure (10.10.1.4):
The on-premises ACI sites and the ACI cloud site in Azure are connected through an IP network that can be IPsec VPN over the internet, or through Azure ExpressRoute. In case of IPsec VPN, the CSR 1000V routers in the Infra VNet need to be programmed to establish IPsec peering with the IPsec device located on premises (at the physical ACI site). This underlay network provides IP reachability for the overlay control plane and data plane between the two sites.
IPsec is optional if ExpressRoute is used with Cloud ACI 25.0 or later and NDO 3.4.
Figure 5 The underlay network between on-premises and cloud sites
Leaf (on-prem -> cloud)
ACI learns the MAC (and IP) address as a local endpoint when a packet comes into a leaf switch from its front-panel ports.
leafXXX# show system internal epm endpoint ip <on-prem vm ip address>
This output shows the BD VNID, the VRF VNID (2686983), and the pcTag/sclass, among other useful information.
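To pull those values out programmatically, a small parser can be run against the CLI output. The sample below is a hypothetical, abbreviated version of the EPM output (the real format varies by software release):

```python
import re

# Hypothetical, abbreviated `show system internal epm endpoint` output;
# the real format varies by release.
SAMPLE = """\
MAC : a1b2.c3d4.e5f6 ::: Num IPs : 1
BD vnid : 15826915 ::: VRF vnid : 2686983
Flags : 0x80004c04 ::: sclass : 16391 ::: Ref count : 5
"""

def extract_epm_fields(output: str) -> dict:
    """Pull the BD VNID, VRF VNID, and sclass (pcTag) out of EPM output."""
    fields = {}
    for key, pattern in {
        "bd_vnid": r"BD vnid\s*:\s*(\d+)",
        "vrf_vnid": r"VRF vnid\s*:\s*(\d+)",
        "sclass": r"sclass\s*:\s*(\d+)",
    }.items():
        m = re.search(pattern, output)
        if m:
            fields[key] = int(m.group(1))
    return fields

print(extract_epm_fields(SAMPLE))
```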
The leaf is responsible for reporting its local endpoints to the Council of Oracle Protocol (COOP) database, located on each spine switch:
spineXXX# show coop internal info ip-db key <VRF ID> <on-prem vm ip address>
Leaf (cloud -> on-prem)
On the leaf, a route entry pointing to the VNet CIDR is created, with the CSR TEP address leaked from overlay-1 as its next hop:
- Interface GigE 4 private ip address:
X.X.X.52 for ct_routerp_<region>_0:0
X.X.X.116 for ct_routerp_<region>_1:0
leafXXX# show ip route <azure vm ip address> vrf <user vrf name>
If a route is learned through the internal fabric iBGP process and running vsh -c "show ip route x.x.x.x detail vrf <name>" shows a non-zero rw-vnid value, then the route is being learned from an L3Out in another VRF:
leafXXX# vsh -c "show ip route <azure vm ip address> vrf <vrf name>"
BGP origin AS 65534 and VNID 0x270001 (2555905) belong to the Azure CSRs. On the leaf, the CSR TEPs are learned from the spines through IS-IS.
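The hex/decimal equivalence of the VNID is easy to verify (the helper below is purely illustrative):

```python
def vnid_hex_to_dec(vnid_hex: str) -> int:
    """Convert a VNID printed in hex (e.g. "0x270001") to the decimal
    segment ID shown in other outputs."""
    return int(vnid_hex, 16)

print(vnid_hex_to_dec("0x270001"))  # 2555905
```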
leafXXX# show ip route <csr tep address> vrf overlay-1
leafXXX# show isis route vrf overlay-1
The next hop is the spine TEP address:
Spine (cloud -> on-prem)
On the spine, CSR TEP addresses are learned from the IPN (OSPF) and then redistributed into IS-IS. The leaked prefixes are controlled by the Fabric External Routing Profile (l3extSubnet) under the Fabric External Connection Policy (l3extFabricExtRoutingP).
spineXXX# vsh -c "show isis vrf overlay-1"
The redistribution is controlled by the static policy (interleak_rtmap_infra_prefix_ext_static_routes):
The prefixes permitted are the TEP addresses of the remote sites:
spineXXX# show ip route vrf overlay-1
The next hop is the ISN:
Spine (on-prem -> cloud)
The endpoint <on-prem VM IP address> learned on leafXXX is published to spineXXX via COOP by the EPM (Endpoint Manager). The COOP process on the spine then injects this EP into L2VPN EVPN via the bgp.PfxLeakP object, which has the "skip-rib-check" flag:
spineXXX# moquery -c bgp.PfxLeakP
The command below checks whether the VM IP was properly injected into the L2VPN EVPN table:
spineXXX# show bgp l2vpn evpn vrf overlay-1 | egrep "Route Dis|<on-prem vm ip address>\]"
The different APIC domains are interconnected through a Layer 3 infrastructure, generically called the Intersite Network (ISN). The ISN requires plain IP-routing support to allow the establishment of site-to-site VXLAN tunnels. This requirement means that the ISN can be built in an arbitrary way, ranging from a simple, single router device (two are always recommended, for redundancy) to a more complex network infrastructure spanning the world.
The ISN could be a network infrastructure dedicated to multisite, but it is quite common in many real-life scenarios to use, for this purpose, a network that is also providing other connectivity services. In this latter case, it is strongly recommended to deploy a dedicated VRF routed domain to be used for the forwarding of multisite control plane and data plane traffic.
The IPN routers learn TEP addresses from the fabric through OSPF and distribute the CSR TEP addresses to the fabric:
ipn# show ip route <vrf>
CSRs learn the on-prem infra TEP from OSPF when running IPsec (pay attention to the interface used for the OSPF adjacency):
csr# show ip route ospf
A static route to the user VNet CIDR is automatically added, pointing to the subnet gateway of the CSR interface facing Azure (GigE 2):
The user VNet is connected to the infra VNet using VNet peering. Default system routes to the peered VNet CIDR are created in both routing tables, with "Virtual Network" as the next hop.
The overlay network between the on-premises and cloud sites runs BGP EVPN as its control plane and uses VXLAN encapsulation and tunneling as its data plane. VXLAN is used to identify the right routing domain for the VRF stretched across the on-premises ACI fabric and the clouds.
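The VRF's VNID travels in the 24-bit VNI field of the VXLAN header; a minimal sketch of the 8-byte header defined in RFC 7348 (the helper names are illustrative):

```python
import struct

def build_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header (RFC 7348): flags byte 0x08
    (VNI valid), 24-bit VNI, reserved bits zero."""
    return struct.pack("!II", 0x08 << 24, vni << 8)

def parse_vni(header: bytes) -> int:
    """Recover the 24-bit VNI from a VXLAN header."""
    _, word2 = struct.unpack("!II", header)
    return word2 >> 8

# Round-trip the stretched VRF's VNID seen earlier in the EPM output:
hdr = build_vxlan_header(2686983)
print(parse_vni(hdr))  # 2686983
```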
BGP EVPN sessions are established between the on-premises ACI spine switches and the CSR 1000V Series cloud routers in the Infra VNet of the cloud site. Tenant host routes and prefix routes are exchanged between the two sites as BGP EVPN route type-2 (host) and type-5 (prefix). The provisioning of this overlay network connectivity is automated by NDO.
BGP l2vpn evpn is used to exchange routes between the on-prem spines and the cloud routers.
csr# show bgp l2vpn evpn summary
The CSR receives the on-prem VM route and advertises the user VNet CIDR to the spines in pod-1 and pod-2:
The next hop 10.1.0.34 is the spine dtep address and 172.29.32.1 is the azure subnet gateway address of the interface GigE 2.
A User Defined Route (UDR) is added to the tenant route table pointing to the LB that front ends the CSR:
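Route resolution in an Azure route table boils down to longest-prefix match (with UDRs preferred over system routes at equal prefix length); a simplified sketch, using a made-up on-prem subnet and next-hop labels:

```python
import ipaddress

def select_next_hop(dest_ip, routes):
    """Longest-prefix-match next-hop selection, the core of how an
    Azure route table resolves a destination (simplified: ignores the
    UDR-vs-system tie-break at equal prefix lengths)."""
    dest = ipaddress.ip_address(dest_ip)
    candidates = [
        (ipaddress.ip_network(cidr), next_hop)
        for cidr, next_hop in routes
        if dest in ipaddress.ip_network(cidr)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[0].prefixlen)[1]

# Hypothetical tenant route table: the system route covers the local
# VNet, while a UDR steers the (made-up) on-prem BD subnet
# 192.168.10.0/24 to the internal load balancer fronting the CSRs.
routes = [
    ("0.0.0.0/0", "Internet"),
    ("10.10.1.0/24", "Virtual Network"),
    ("192.168.10.0/24", "ILB front-end (CSRs)"),
]
print(select_next_hop("192.168.10.25", routes))  # ILB front-end (CSRs)
print(select_next_hop("10.10.1.4", routes))      # Virtual Network
```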
Spine receives the vnet tenant CIDR route through bgp l2vpn:
spineXXX# show bgp l2vpn evpn <azure vm ip address> vrf overlay-1
Path-id 1 is not advertised to any peers because it is re-originated into vpnv4 and advertised to internal leaves:
spineXXX# vsh -c "show bgp vpnv4 unicast <azure vm ip address> vrf overlay-1"
10.1.104.64 and 10.1.104.65 are the leaves to which the on-prem VM is attached through a vPC. Notice that the route-target has been re-originated in the format "ASN:VRF Seg_ID". This is the route-target that the leaf will import.
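Composing that route-target is straightforward; in the sketch below the ASN (65001) is a made-up example, while 2686983 is the VRF segment ID seen earlier:

```python
def reoriginated_route_target(asn: int, vrf_seg_id: int) -> str:
    """Compose the re-originated route-target in "ASN:VRF Seg_ID" form."""
    return f"{asn}:{vrf_seg_id}"

# Hypothetical fabric ASN plus the VRF VNID from the EPM output:
print(reoriginated_route_target(65001, 2686983))  # 65001:2686983
```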
Leaves are route-reflector clients and learn the route to the Azure VM through BGP VPNv4 from the spines on overlay-1:
leafXXX# show bgp vpnv4 unicast vrf <user vrf>
The data plane uses vxlan to carry packets across sites:
- On-prem outbound connections are encapsulated at the leaf and de-encapsulated at the CSR.
- On-prem inbound connections are encapsulated at the CSR and de-encapsulated at the spine and then re-encapsulated at the spine and finally de-encapsulated at the leaf.
Tunnels are type fabric-ext, learn-disabled, non-fabric-gold, non-fabric-untrusted, and physical. They can be queried using the MO tunnelIf.
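The same objects can be pulled over the APIC REST API with a class query; a sketch of the URL construction (the hostname is hypothetical, and wcard() is one of the standard APIC query filter operators):

```python
def tunnel_query_url(apic_host: str, tunnel_type: str = "fabric-ext") -> str:
    """Build an APIC REST class query for tunnelIf objects filtered by
    type -- roughly the API equivalent of `moquery -c tunnelIf`."""
    return (
        f"https://{apic_host}/api/class/tunnelIf.json"
        f'?query-target-filter=wcard(tunnelIf.type,"{tunnel_type}")'
    )

# apic1.example.com is a placeholder hostname:
print(tunnel_query_url("apic1.example.com"))
```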
Shadow contracts are created and applied as provider and consumer to the on-prem EPG and to a dummy user-tenant L3Out. The prefix "msc" is used for that purpose.
An inbound rule is created to allow all traffic from the on-prem EPG subnet (<on-prem BD subnet>/24):
An outbound rule is created to allow all traffic from the application security group that represents the cloud epg to on-prem:
Security rule entry names are prefixed with "msc".
csr# show crypto ikev2 sa
csr# debug crypto ikev2
csr# show crypto ipsec sa
csr# debug crypto ipsec
Routes received from a neighbor
spineXXX# show bgp l2vpn evpn neighbors <neighbor ip address> routes vrf overlay-1
Routes advertised to a neighbor
spineXXX# show bgp l2vpn evpn neighbors <neighbor ip address> advertised-routes vrf overlay-1
spineXXX# show bgp internal limited-vrf
VRF, EVI, and route-map
spineXXX# show bgp internal network vrf <user vrf>
linux# tcpdump host <on-prem VM ip or cloud instance ip>
csr# show debugging
csr# debug platform condition ipv4 <on-prem VM ip> both
csr# debug platform condition start
csr# debug platform packet-trace packet 1024
csr# show platform packet-trace summary
csr# show platform packet-trace packet <packet number>
Enter this command to clear the trace buffer and reset packet-trace:
csr# clear platform packet-trace statistics
The command to clear both platform conditions and the packet trace configuration is:
csr# clear platform conditions all
Leaf and Spine
Embedded Logic Analyzer Module (ELAM) can be used to capture packets on the leaves or on the spines. A user-friendly app (ELAM Assistant) is also available.