Home

  • Carlos, The Cloud Architect

    Carlos

    Overview

    Carlos the Architect implements a multi-agent Software Development Lifecycle (SDLC) for cloud infrastructure design. The system uses 11 specialized AI agents orchestrated through LangGraph to automate the complete journey from requirements gathering to production-ready Terraform code, with historical learning from past deployment feedback.

    ┌───────────────────────────────────────────────────────────────────────────────────┐
    │ AGENTIC SDLC PIPELINE │
    ├───────────────────────────────────────────────────────────────────────────────────┤
    │ │
    │ REQUIREMENTS ──► LEARNING ──► DESIGN ──► ANALYSIS ──► REVIEW ──► DECISION ──► CODE │
    │ │ │ │ │ │ │ │ │
    │ [Gathering] [Historical] [Carlos] [Security] [Auditor] [Recommender] [TF] │
    │ │ [Learning] [Ronei] [Cost] │ │ │ │
    │ │ │ ║ [SRE] │ │ │ │
    │ ▼ ▼ ▼ ▼ ▼ ▼ ▼ │
    │ Questions Context 2 Designs 3 Reports Approval Selection IaC │
    │ from feedback │
    └───────────────────────────────────────────────────────────────────────────────────┘

    SDLC Phases Mapped to Agents

    SDLC Phase | Agent(s) | Output | Purpose
    1. Requirements | Requirements Gathering | Clarifying questions | Understand user needs
    2. Learning | Historical Learning | Context from past designs | Learn from deployment feedback
    3. Design | Carlos + Ronei (parallel) | 2 architecture designs | Competitive design generation
    4. Analysis | Security, Cost, SRE (parallel) | 3 specialist reports | Multi-dimensional review
    5. Review | Chief Auditor | Approval decision | Quality gate
    6. Decision | Design Recommender | Final recommendation | Select best design
    7. Implementation | Terraform Coder | Infrastructure-as-Code | Production-ready output
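
    The shared state that flows through these phases can be modeled as one typed dictionary. Here is a minimal sketch; the field names are illustrative assumptions, not the repository's actual schema:

    from typing import List, TypedDict

    class SDLCState(TypedDict, total=False):
        # Phases 1-2: refined requirements plus historical context
        requirements: str
        clarifying_questions: List[str]
        historical_context: str
        # Phase 3: the two competing designs
        carlos_design: str
        ronei_design: str
        # Phase 4: specialist reports (security, cost, SRE)
        analysis_reports: List[str]
        # Phases 5-7: audit verdict, chosen design, generated Terraform
        audit_decision: str        # "APPROVED" or "NEEDS REVISION"
        recommended_design: str    # "carlos" or "ronei"
        terraform_code: str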

    Agent Architecture

    The 11 Agents

    ┌─────────────────────────────────────────────────────────────────┐
    │ AGENT HIERARCHY │
    ├─────────────────────────────────────────────────────────────────┤
    │ │
    │ TIER 1: PRIMARY ARCHITECTS (GPT-4o) │
    │ ┌─────────────────┐ ┌─────────────────┐ │
    │ │ CARLOS │ │ RONEI │ │
    │ │ Conservative │ vs │ Innovative │ │
    │ │ AWS-native │ │ Kubernetes │ │
    │ │ temp: 0.7 │ │ temp: 0.9 │ │
    │ └─────────────────┘ └─────────────────┘ │
    │ ▲ ▲ │
    │ └───────┬─────────────┘ │
    │ │ │
    │ TIER 0.5: HISTORICAL LEARNING (No LLM - Data Query) │
    │ ┌─────────────────────────────────────────┐ │
    │ │ Historical Learning │ │
    │ │ (Queries Cosmos DB for past feedback) │ │
    │ └─────────────────────────────────────────┘ │
    │ │
    │ TIER 2: SPECIALIST ANALYSTS (GPT-4o-mini) │
    │ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
    │ │ Security │ │ Cost │ │ SRE │ │
    │ │ Analyst │ │ Analyst │ │ Engineer │ │
    │ └──────────┘ └──────────┘ └──────────┘ │
    │ │
    │ TIER 3: DECISION MAKERS (GPT-4o) │
    │ ┌──────────┐ ┌───────────┐ ┌───────────┐ │
    │ │ Auditor │ │Recommender│ │ Terraform │ │
    │ │ Chief │ │ Design │ │ Coder │ │
    │ └──────────┘ └───────────┘ └───────────┘ │
    │ │
    │ TIER 0: REQUIREMENTS (GPT-4o-mini) │
    │ ┌───────────────────┐ │
    │ │ Requirements │ │
    │ │ Gathering │ │
    │ └───────────────────┘ │
    │ │
    └─────────────────────────────────────────────────────────────────┘

    Agent Details

    1. Requirements Gathering Agent

    • Model: GPT-4o-mini (cost-optimized)
    • Role: Initial clarification of user needs
    • Output: 3-5 clarifying questions about:
      • Workload characteristics (traffic, data volume, users)
      • Performance requirements (latency, throughput, SLAs)
      • Security & compliance needs
      • Budget constraints
      • Deployment preferences

    1.5 Historical Learning Node

    • Model: None (data query only)
    • Role: Learn from past deployment feedback
    • Data Source: Azure Cosmos DB (deployment feedback)
    • Process (sketched in code below):
      1. Extract keywords from refined requirements
      2. Query similar past designs from feedback store
      3. Categorize feedback by success (4-5 stars) vs problems (1-2 stars)
      4. Extract patterns that worked well
      5. Extract warnings from problematic deployments
    • Output: Formatted context injected into design prompts
    • Graceful Degradation: Returns empty context on failure (5s timeout)
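
    A minimal sketch of this process, assuming the azure-cosmos SDK and a hypothetical deployment_feedback container with summary and rating fields (the real schema and query logic may differ):

    # Sketch of the historical-learning step (container and field names are assumptions).
    from azure.cosmos import CosmosClient

    def load_historical_context(client: CosmosClient, requirements: str) -> str:
        container = client.get_database_client("carlos").get_container_client("deployment_feedback")
        keywords = [w.lower() for w in requirements.split() if len(w) > 4][:10]
        try:
            items = list(container.query_items(
                query="SELECT TOP 20 c.summary, c.rating FROM c",
                enable_cross_partition_query=True,
            ))
        except Exception:
            return ""  # graceful degradation: empty context on failure

        relevant = [i for i in items if any(k in i.get("summary", "").lower() for k in keywords)]
        wins = [i["summary"] for i in relevant if i.get("rating", 0) >= 4]
        warnings = [i["summary"] for i in relevant if i.get("rating", 0) <= 2]
        context = ""
        if wins:
            context += "Patterns that worked well:\n- " + "\n- ".join(wins) + "\n"
        if warnings:
            context += "Warnings from problematic deployments:\n- " + "\n- ".join(warnings) + "\n"
        return context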

    2. Carlos (Lead Cloud Architect)

    • Model: GPT-4o (main pool)
    • Temperature: 0.7 (balanced)
    • Personality: Pragmatic, conservative, dog-themed
    • Focus: AWS-native managed services, proven patterns, simplicity
    • Output: Complete architecture design with Mermaid diagram
    • Philosophy: “If it ain’t broke, don’t fix it”

    3. Ronei (Rival Architect – “The Cat”)

    • Model: GPT-4o (ronei pool)
    • Temperature: 0.9 (more creative)
    • Personality: Cutting-edge, competitive, cat-themed
    • Focus: Kubernetes, microservices, serverless, service mesh
    • Output: Alternative architecture design with Mermaid diagram
    • Philosophy: “Innovation drives excellence”
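
    As a rough illustration of how the two designers might be instantiated (assuming the langchain_openai package; the prompts are abbreviated paraphrases of the personas above, not the repository's actual prompts):

    # Sketch: the two designers could be the same model with different temperatures and personas.
    from langchain_openai import ChatOpenAI

    carlos_llm = ChatOpenAI(model="gpt-4o", temperature=0.7)  # pragmatic, AWS-native
    ronei_llm = ChatOpenAI(model="gpt-4o", temperature=0.9)   # creative, Kubernetes-first

    CARLOS_PERSONA = "You are Carlos, a pragmatic lead cloud architect. Prefer proven AWS managed services."
    RONEI_PERSONA = "You are Ronei, a cutting-edge rival architect. Prefer Kubernetes, serverless, and service mesh."

    def design(llm: ChatOpenAI, persona: str, requirements: str, historical_context: str) -> str:
        messages = [
            ("system", persona),
            ("human", f"Requirements:\n{requirements}\n\nLessons from past deployments:\n{historical_context}\n"
                      "Produce a complete architecture design with a Mermaid diagram."),
        ]
        return llm.invoke(messages).content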

    4. Security Analyst

    • Model: GPT-4o-mini
    • Focus Areas:
      • Network exposure & segmentation
      • Identity & access management
      • Data encryption (transit + rest)
      • Logging & monitoring
      • Incident response readiness

    5. Cost Optimization Specialist

    • Model: GPT-4o-mini
    • Focus Areas:
      • Major cost drivers identification
      • Reserved instances / savings plans
      • Spot/preemptible instance opportunities
      • Storage lifecycle & archival
      • FinOps best practices

    6. Site Reliability Engineer (SRE)

    • Model: GPT-4o-mini
    • Focus Areas:
      • Failure scenarios & blast radius
      • Capacity planning & auto-scaling
      • Observability (metrics, logs, traces)
      • Health checks & alerting
      • Operational runbooks

    7. Chief Architecture Auditor

    • Model: GPT-4o (main pool)
    • Role: Final quality gate
    • Decision: APPROVED or NEEDS REVISION
    • Output: Executive summary with strengths and required changes

    8. Design Recommender

    • Model: GPT-4o (main pool)
    • Role: Select the winning design
    • Decision: Must choose exactly one (Carlos OR Ronei)
    • Output: Recommendation with justification and tradeoffs

    9. Terraform Coder

    • Model: GPT-4o (main pool)
    • Role: Generate production-ready infrastructure-as-code
    • Output:
      • main.tf – Resource definitions
      • variables.tf – Input variables
      • outputs.tf – Output values
      • versions.tf – Provider configuration
      • Deployment instructions

    Workflow Graph

    LangGraph State Machine

                                  START
                                    │
                                    ▼
                        ┌───────────────────────┐
                        │  Has User Answers?    │
                        └───────────────────────┘
                               │         │
                              NO        YES
                               │         │
                               ▼         │
                  ┌────────────────────┐ │
                  │   Requirements     │ │
                  │    Gathering       │ │
                  └────────────────────┘ │
                               │         │
                               ▼         │
                  ┌────────────────────┐ │
                  │ Clarification      │ │
                  │ Needed?            │ │
                  └────────────────────┘ │
                        │         │      │
                       YES       NO      │
                        │         │      │
                        ▼         ▼      ▼
                      END    ┌─────────────────┐
                (wait for    │     Refine      │
                 answers)    │  Requirements   │
                             └─────────────────┘
                                      │
                                      ▼
                             ┌─────────────────┐
                             │   HISTORICAL    │
                             │    LEARNING     │
                             │ (query feedback)│
                             └─────────────────┘
                                      │
                        ┌─────────────┴─────────────┐
                        │                           │
                        ▼                           ▼
               ┌──────────────┐            ┌──────────────┐
               │    CARLOS    │            │    RONEI     │
               │   (design)   │  PARALLEL  │   (design)   │
               │ +historical  │            │ +historical  │
               │   context    │            │   context    │
               └──────────────┘            └──────────────┘
                        │                           │
                        └─────────────┬─────────────┘
                                      │
                  ┌───────────────────┼───────────────────┐
                  │                   │                   │
                  ▼                   ▼                   ▼
           ┌────────────┐      ┌────────────┐      ┌────────────┐
           │  SECURITY  │      │    COST    │      │    SRE     │
           │  ANALYST   │      │  ANALYST   │      │  ENGINEER  │
           └────────────┘      └────────────┘      └────────────┘
                  │                   │                   │
                  └───────────────────┼───────────────────┘
                                      │
                                      ▼
                             ┌──────────────┐
                             │   AUDITOR    │
                             │   (review)   │
                             └──────────────┘
                                      │
                        ┌─────────────┴─────────────┐
                        │                           │
                   APPROVED                   NEEDS REVISION
                        │                           │
                        ▼                           │
               ┌──────────────┐                     │
               │ RECOMMENDER  │                     │
               │  (decision)  │                     │
               └──────────────┘                     │
                        │                           │
                        ▼                           │
               ┌──────────────┐                     │
               │  TERRAFORM   │◄────────────────────┘
               │    CODER     │      (revision loop)
               └──────────────┘
                        │
                        ▼
                       END
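
    For readers who want to see how this state machine might be wired up, here is a minimal LangGraph sketch. Node names follow the diagram, the node bodies are placeholders, and the routing keys (user_answers, needs_clarification, audit_decision) are assumptions rather than the repository's actual field names:

    # Sketch of the state machine above in LangGraph (node bodies are placeholders).
    from typing import TypedDict
    from langgraph.graph import StateGraph, START, END

    class State(TypedDict, total=False):
        user_answers: str
        needs_clarification: bool
        audit_decision: str

    def stub(name):
        # Placeholder node; a real node would invoke the corresponding agent and update the state.
        return lambda state: {}

    g = StateGraph(State)
    for name in ["requirements", "refine", "historical", "carlos", "ronei",
                 "security", "cost", "sre", "auditor", "recommender", "terraform"]:
        g.add_node(name, stub(name))

    # Entry: skip requirements gathering when the user has already answered.
    g.add_conditional_edges(START, lambda s: "refine" if s.get("user_answers") else "requirements")
    # Pause (END) while clarifying questions still need answers.
    g.add_conditional_edges("requirements", lambda s: END if s.get("needs_clarification") else "refine")

    g.add_edge("refine", "historical")
    g.add_edge("historical", "carlos")            # parallel fan-out to both designers
    g.add_edge("historical", "ronei")
    for analyst in ("security", "cost", "sre"):   # fan-out to the analysts, fan-in at the auditor
        g.add_edge("carlos", analyst)
        g.add_edge("ronei", analyst)
        g.add_edge(analyst, "auditor")
    # Approved designs go to the recommender; the revision path follows the diagram above.
    g.add_conditional_edges("auditor",
        lambda s: "recommender" if s.get("audit_decision") == "APPROVED" else "terraform")
    g.add_edge("recommender", "terraform")
    g.add_edge("terraform", END)

    app = g.compile()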
    
    

    https://github.com/rtrentin73/carlos-the-architect

  • Pyr Edge: Anomaly Detection and AI-assisted operations

    Pyr Edge: Anomaly Detection and AI-assisted operations

    Pyr-Edge

    Pyr-Edge ingests VPC flow logs from AWS, Azure, and GCP, providing real-time analysis, anomaly detection, and natural language querying capabilities through an intuitive web interface.

    https://github.com/rtrentin73/pyr-edge

  • kubectl-ai

    kubectl-ai

    What it is

    kubectl-ai acts as an intelligent interface, translating user intent into precise Kubernetes operations, making Kubernetes management more accessible and efficient.

    How to install

    curl -sSL https://raw.githubusercontent.com/GoogleCloudPlatform/kubectl-ai/main/install.sh | bash

    Gemini API Key

    Go to https://aistudio.google.com/ and then click Get API Key:

    Depending on the tier, you will need to import a Google Cloud project for billing purposes.

    Testing

    A simple test to validate the configuration: I asked kubectl-ai to list the k8s clusters I have access to:

    Costs

    https://ai.google.dev/gemini-api/docs/pricing

    References

    https://github.com/GoogleCloudPlatform/kubectl-ai?tab=readme-ov-file

  • Deploying and Operating a (GKE) K8S using GitOps (Flux)

    Deploying and Operating a (GKE) K8S using GitOps (Flux)

    Summary

    k8sfluxops is a GitOps repository that manages a complete Kubernetes infrastructure on GKE using Flux v2.

    https://github.com/rtrentinavx/k8sfluxops

    It demonstrates a production-grade setup with:

    🎯 Core Purpose

    Declarative, Git-driven management of Kubernetes infrastructure where all changes are version-controlled and automatically reconciled by Flux.

    📦 What It Deploys

    Category | Components
    Ingress | Traefik (routes / → nginx, /boutique/ → Online Boutique)
    Observability | Grafana, Jaeger, OpenTelemetry Collector, Hubble UI, Kube-ops-view
    Policy/Security | OPA Gatekeeper with 4 constraint templates, Policy Manager UI
    Cost Management | Kubecost
    Backup | Velero with GCS backend + UI
    Cluster Mgmt | Rancher, Weave GitOps dashboard
    Demo Apps | Online Boutique (10 microservices with OTel tracing), Nginx
    Autoscaling | HPA for Online Boutique & Nginx, VPA recommendations

    🔄 GitOps Flow

    Git Push → Flux detects change → Reconciles to cluster → Apps/Infra updated

    🏗️ GKE Features Used

    Dataplane V2 (Cilium), Node Auto-Provisioning, VPA, Workload Identity, gVNIC, Managed Prometheus.

    This repo essentially serves as a reference architecture for running a fully-featured, observable, and policy-enforced Kubernetes platform using GitOps principles.

    References

    https://github.com/rtrentinavx/k8sfluxops

  • FastConnect Tip

    Using AS_PATH to Prefer Routes from Oracle to the On-premises Network

    Oracle uses the shortest AS path when sending traffic to the on-premises network, regardless of which path was used to start the connection to Oracle. Therefore, asymmetric routing is allowed. Asymmetric routing here means that Oracle’s response to a request can follow a different path than the request.

    Oracle implements AS path prepending to establish preference on which path to use if the edge device advertises the same route and routing attributes over several different connection types between the on-premises network and VCN.

    Oracle honors the complete AS path you send.

    Reference

    https://docs.oracle.com/en-us/iaas/Content/Network/Concepts/fastconnectoverview.htm

  • Building a Cloud Backbone

    Building a Cloud Backbone

    This architecture establishes a cloud backbone connecting AWS, Azure, and GCP, with AWS Transit Gateway (TGW), Azure Virtual WAN (vWAN), and GCP Network Connectivity Center (NCC) serving as northbound components to manage connectivity within each cloud, while Aviatrix Transit Gateways form the backbone for inter-cloud connectivity, ensuring seamless traffic flow across the clouds. Southbound connectivity links on-premises environments to each cloud using dedicated circuits, specifically AWS Direct Connect, Azure ExpressRoute, and GCP Cloud Interconnect, enabling secure and high-performance access to cloud resources.

    AWS Transit Gateway

    • What it is: It provides a single gateway to interconnect VPCs within the same AWS account, across different accounts, and with on-premises networks via VPN or AWS Direct Connect. This reduces the need for complex peering relationships (e.g., VPC peering in a mesh topology) and simplifies network management.
    • How It Works: Transit Gateway acts as a regional router. You attach VPCs, VPNs, or Direct Connect gateways to the Transit Gateway, and it routes traffic between them based on a centralized route table. Each attachment (e.g., a VPC or VPN) can be associated with a specific route table to control traffic flow.
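
    As a small illustration of that attachment model, here is a hedged boto3 sketch that creates a Transit Gateway and attaches a VPC to it (the region, IDs, and options are placeholders):

    # Sketch: creating a Transit Gateway and attaching a VPC with boto3 (IDs are placeholders).
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    tgw = ec2.create_transit_gateway(
        Description="regional hub",
        Options={"AmazonSideAsn": 64512, "DefaultRouteTableAssociation": "enable"},
    )["TransitGateway"]

    # Each attachment is associated with a route table that controls how traffic is forwarded.
    ec2.create_transit_gateway_vpc_attachment(
        TransitGatewayId=tgw["TransitGatewayId"],
        VpcId="vpc-0123456789abcdef0",
        SubnetIds=["subnet-0123456789abcdef0"],
    )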

    Azure Virtual WAN (vWAN)

    • What It Is: Azure Virtual WAN (vWAN) is a managed networking service that provides a centralized hub-and-spoke architecture to connect Azure Virtual Networks (VNets), on-premises networks, branch offices, and remote users. It simplifies large-scale network management by offering a unified solution for connectivity, routing, and security across Azure regions and hybrid environments.
    • How It Works: vWAN creates virtual hubs in Azure regions, each acting as a central point for connectivity. VNets, VPNs (site-to-site, point-to-site), and ExpressRoute circuits are attached to these hubs. vWAN integrates with Azure Route Server to enable dynamic routing via Border Gateway Protocol (BGP). Azure Route Server, deployed in a dedicated subnet, peers with network virtual appliances (NVAs) like firewalls, ExpressRoute, and VPN gateways, learning and propagating routes to VNets and VMs. vWAN supports any-to-any connectivity in its Standard tier, allowing traffic to flow between VNets, branches, and on-premises networks through the hub, with options for security and traffic optimization using Azure Firewall or third-party NVAs.

    GCP Network Connectivity Center (NCC)

    • What It Is: Google Cloud’s Network Connectivity Center (NCC) is a hub-and-spoke networking service that simplifies connectivity between Google Cloud Virtual Private Cloud (VPC) networks, on-premises networks, and other cloud providers. It provides a centralized way to manage hybrid and multi-cloud connectivity, reducing the complexity of manual route configuration.
    • How It Works: NCC operates as a global hub with spokes representing different network resources, such as VPCs, Cloud VPN tunnels, Cloud Interconnect attachments, or third-party networks (e.g., via Megaport Virtual Edge). It uses Google Cloud Router for dynamic routing via BGP. Cloud Router, a regional service within a VPC, establishes BGP sessions with external routers (e.g., on-premises routers or other cloud routers), learns routes, and programs them into the VPC’s routing table.

    AWS Transit Gateway, Azure Virtual WAN, and Google Cloud Network Connectivity Center Comparison

    Feature/Aspect | AWS Transit Gateway | Azure Virtual WAN | Google Cloud Network Connectivity Center (NCC)
    Purpose | Central hub for connecting VPCs, on-premises networks, and AWS services in a hub-and-spoke model. | Enables global transit network architecture with hub-and-spoke connectivity for VNets, branches, and users. | Centralized hub for hybrid and multi-cloud connectivity, connecting VPCs, on-premises networks, and other clouds.
    Scope | Broad hub-and-spoke solution for VPC-to-VPC, hybrid, and inter-region connectivity. | Hub-and-spoke model for VNet-to-VNet, hybrid, and global connectivity using Microsoft’s backbone. | Hub-and-spoke model focused on hybrid and multi-cloud connectivity, less on intra-VPC routing.
    Layer of Operation | Layer 3 (Network Layer) | Layer 3 (Network Layer) | Layer 3 (Network Layer)
    Dynamic Routing | Supports BGP for dynamic routing between VPCs, VPNs, and Direct Connect. | Supports BGP for dynamic routing with ExpressRoute, VPNs, and SD-WAN devices. | Supports BGP for dynamic routing with Interconnect, VPNs, and third-party routers.
    Hybrid Connectivity | Integrates with AWS Direct Connect and Site-to-Site VPN for on-premises connectivity. | Integrates with ExpressRoute and VPN; supports branch-to-branch and VNet-to-VNet transit. | Integrates with Cloud Interconnect and Cloud VPN; supports hybrid and multi-cloud setups.
    Inter-Region Support | Native inter-region peering between Transit Gateways for global connectivity. | Hub-to-hub connectivity in a full mesh for global transit across regions. | Uses a global spoke-hub-spoke model; supports cross-region connectivity via hubs.
    Scalability | Supports thousands of VPCs; up to 50 Gbps throughput per Transit Gateway. | Scales with hub infrastructure units (up to 2000 VMs per hub); hub-to-hub full mesh. | Scales with hub-and-spoke model; no specific throughput limit, but depends on attachments.
    High Availability | Built-in redundancy within a region; supports multiple Transit Gateways for failover. | Full mesh hub-to-hub connectivity; supports ExpressRoute Global Reach for redundancy. | Managed service with regional redundancy; supports HA with multiple attachments.
    NVA/SD-WAN Integration | Supports Transit Gateway Connect for SD-WAN (GRE tunnels, up to 20 Gbps). | Native integration with SD-WAN vendors (e.g., Cisco, Aruba); supports NVAs in hubs. | Supports third-party routers and SD-WAN via Router Appliance spokes; less native integration.
    IPv6 Support | Supports IPv6 for VPCs, VPNs, and Direct Connect. | Supports IPv6 for ExpressRoute and VPN, but not in all configurations. | Supports IPv6 for Interconnect and VPN, but limited in some multi-cloud scenarios.
    Use Cases | Multi-VPC connectivity; hybrid cloud setups; centralized security VPCs; global applications | Global transit for branches and VNets; hybrid connectivity with ExpressRoute/VPN; SD-WAN integration; multi-region hub-and-spoke | Hybrid and multi-cloud connectivity; centralized management of external connections; multi-region VPC connectivity
    Limitations | Regional service (inter-region peering adds latency/cost); data transfer fees ($0.02/GB) | Hub-to-hub latency for cross-region traffic; limited to 2000 VMs per hub without scaling units | Complex routing configuration; less mature than TGW/vWAN; limited intra-VPC routing features; egress costs for Interconnect/VPN
    Cost | $0.02/GB for data processed; additional fees for attachments and inter-region peering. | Costs for hub deployment, data processing, and ExpressRoute/VPN usage (not specified). | Free for NCC; costs for Interconnect ($0.02–$0.10/GB egress) and VPN data transfer.

    AWS Deployment

    The Terraform configuration sets up a multi-cloud transit network in AWS using Aviatrix modules, integrating Aviatrix Transit Gateways with AWS Transit Gateways (TGW) via GRE tunnels and BGP. It can also deploy and bootstrap PAN firewalls.

    https://github.com/rtrentinavx/bb/tree/main/control/8/aws2.1

    Azure Deployment

    The provided Terraform code deploys an Aviatrix transit architecture on Azure. It includes data sources, local variables, and resources to manage transit gateways, BGP over LAN (bgpolan), vWAN creation and integration, and spoke gateways. It can also deploy and bootstrap PAN firewalls.

    https://github.com/rtrentinavx/bb/tree/main/control/8/azure2.1

    GCP Deployment

    The provided Terraform code deploys an Aviatrix transit architecture on Google Cloud Platform (GCP). It includes data sources, local variables, and resources to manage transit gateways, BGP over LAN (bgpolan), Network Connectivity Center (NCC) creation and integration, and spoke gateways. It can also deploy and bootstrap PAN firewalls.

    https://github.com/rtrentinavx/bb/tree/main/control/8/gcp2.1

    References

    https://docs.aws.amazon.com/vpc/latest/tgw/what-is-transit-gateway.html

    https://learn.microsoft.com/en-us/azure/virtual-wan/virtual-wan-about

    https://cloud.google.com/network-connectivity-center

  • “Mastering” K8S

    “Mastering” K8S

    The repository contains Terraform scripts designed to create Google Kubernetes Engine (GKE) and Azure Kubernetes Service (AKS) clusters. These setups are fully customizable through input parameter files. Additionally, the scripts provision the necessary network infrastructure and bastions, ensuring secure access to the clusters.

    https://github.com/rtrentinavx/kubernetes

    References

    https://cloud.google.com/kubernetes-engine

    https://azure.microsoft.com/en-us/products/kubernetes-service

  • Cisco C8000v Autonomous IPSEC Configuration

    Recommended

    crypto ikev2 proposal IKEV2-PROP 
     encryption aes-gcm-256
     prf sha512
     group 19
    !
    crypto ikev2 policy IKEV2-POLICY 
     proposal IKEV2-PROP
    !
    crypto ikev2 keyring IKEV2-KEYRING
     peer SITEB
      address 162.43.158.29
      pre-shared-key local Str0ngSecret!
      pre-shared-key remote Str0ngSecret!
     !
    !
    crypto ikev2 profile IKEV2-PROFILE
     match identity remote address 162.43.158.29 255.255.255.255 
     identity local address 162.43.158.41
     authentication remote pre-share
     authentication local pre-share
     keyring local IKEV2-KEYRING
     dpd 10 5 on-demand
    !
    crypto ipsec transform-set TS-ESP-GCM esp-gcm 256 
     mode tunnel
    !
    crypto ipsec profile IPSEC-PROFILE
     set transform-set TS-ESP-GCM 
     set ikev2-profile IKEV2-PROFILE
    !         
    interface Tunnel10
     description IKEv2 VTI to 162.43.158.29
     ip address 10.10.10.1 255.255.255.252
     tunnel source GigabitEthernet1
     tunnel mode ipsec ipv4
     tunnel destination 162.43.158.29
     tunnel protection ipsec profile IPSEC-PROFILE
    !

    For environments where GCM is not supported:

    
    crypto ikev2 proposal oracle_v2_proposal 
     encryption aes-cbc-256
     integrity sha384
     group 14
    !
    crypto ikev2 policy oracle_v2_policy 
     proposal oracle_v2_proposal
    !
    crypto ikev2 keyring oracle_keyring_tunnel1
     peer oracle_vpn
      address 129.146.159.116
      pre-shared-key local <pre-shared key>
      pre-shared-key remote <pre-shared key>
     !      
    !
    crypto ikev2 profile oracle_ike_profile_tunnel1
     match identity remote address 129.146.159.116 255.255.255.255 
     identity local address 162.43.155.57
     authentication remote pre-share
     authentication local pre-share
     keyring local oracle_keyring_tunnel1
    !
    crypto ipsec transform-set oracle-vpn-transform esp-aes 256 esp-sha-hmac 
     mode tunnel
    !
    crypto ipsec profile oracle_ipsec_profile_tunnel1
     set transform-set oracle-vpn-transform 
     set pfs group14
     set ikev2-profile oracle_ike_profile_tunnel1
    !
    interface Tunnel100
     ip address 169.254.0.1 255.255.255.252
     tunnel source 162.43.155.57
     tunnel mode ipsec ipv4
     tunnel destination 129.146.159.116
     tunnel protection ipsec profile oracle_ipsec_profile_tunnel1
    !

    IKEv2/IPSec Algorithm Cheat Sheet

    Phase 1 – IKEv2 (Control Channel)

    Purpose: Establish a secure, authenticated channel for negotiating IPsec.

    Category | Algorithm Options | Explanation
    Encryption | AES-CBC-128 / AES-CBC-256 | AES in CBC mode; strong encryption but needs separate integrity (HMAC).
    Encryption | AES-GCM-128 / AES-GCM-256 | AES in Galois/Counter Mode; provides encryption + integrity (AEAD).
    PRF | SHA1 | Legacy; avoid for new deployments.
    PRF | SHA256 | Recommended minimum; widely supported.
    PRF | SHA384 / SHA512 | Stronger hash for high-security environments; more CPU cost.
    Diffie-Hellman | Group 14 (MODP 2048-bit) | Classic DH; secure but slower than elliptic curve.
    Diffie-Hellman | Group 19 (ECDH P-256) | Elliptic Curve DH; fast and secure; best practice for modern VPNs.
    Diffie-Hellman | Group 20 (ECDH P-384) | Higher-security elliptic curve; more CPU cost.

    Phase 2 – IPsec (Data Channel)

    Purpose: Encrypt and protect actual traffic between sites.

    Category | Algorithm Options | Explanation
    Encryption (ESP) | AES-CBC-128 / AES-CBC-256 | Encrypts payload; requires separate integrity algorithm (HMAC).
    Encryption (ESP) | AES-GCM-128 / AES-GCM-256 | Encrypts + authenticates in one step; preferred for performance/security.
    Integrity (ESP) | HMAC-SHA1 | Adds integrity/authentication; legacy, avoid for new deployments.
    Integrity (ESP) | HMAC-SHA256 / SHA384 / SHA512 | Strong integrity checks; used with AES-CBC (not needed with GCM).
    Mode | Tunnel | Encrypts entire original IP packet; standard for site-to-site VPNs.
    Mode | Transport | Encrypts only payload; used for host-to-host or GRE over IPsec.
    PFS | Same DH group as Phase 1 | Adds extra DH exchange for forward secrecy; recommended for high security.

    Why GCM is Faster

    • AES-GCM uses Counter mode (CTR) for encryption and Galois field multiplication for authentication, which can be parallelized.
    • AES-CBC encrypts blocks sequentially (each block depends on the previous), so it cannot be parallelized for encryption.
    • GCM also eliminates the need for a separate integrity algorithm (HMAC), reducing overhead.
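
    A quick Python illustration of that AEAD property, using the third-party cryptography package: a single call both encrypts and authenticates, so no separate HMAC pass is needed.

    # AEAD in one call: AES-GCM returns ciphertext plus an authentication tag, no separate HMAC needed.
    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    key = AESGCM.generate_key(bit_length=256)
    nonce = os.urandom(12)                      # 96-bit nonce, never reused with the same key
    aesgcm = AESGCM(key)

    ciphertext = aesgcm.encrypt(nonce, b"payload", b"associated data")  # encrypt + authenticate
    plaintext = aesgcm.decrypt(nonce, ciphertext, b"associated data")   # verifies the tag, then decrypts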

    Why ECDH is Faster

    1. Smaller Key Sizes for Same Security
      • Classic DH (MODP) needs very large primes (e.g., 2048-bit for Group 14) to achieve strong security.
      • ECDH achieves equivalent security with much smaller keys (e.g., 256-bit for Group 19).
      • Smaller keys mean less data to exchange and fewer operations.
    2. Efficient Mathematical Operations
      • MODP DH uses modular exponentiation, which is computationally expensive and grows with key size.
      • ECDH uses elliptic curve point multiplication, which is much more efficient for the same security level.
    3. Lower CPU Cycles
      • Modular exponentiation involves repeated multiplications and reductions on large integers.
      • Elliptic curve operations are optimized and require fewer CPU cycles, especially with hardware acceleration.
    4. Better Hardware Support
      • Modern CPUs and crypto libraries often include optimized routines for elliptic curve math.
      • ECDH benefits from acceleration (e.g., Intel AES-NI and ECC instructions), while MODP DH gains less.
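
    A short sketch of an ECDH exchange on P-256 (Group 19) with the same cryptography package; the curve keys are roughly 256 bits, compared with the 2048-bit modulus a Group 14 MODP exchange needs for comparable strength.

    # ECDH (Group 19 / P-256) key agreement sketch.
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import ec
    from cryptography.hazmat.primitives.kdf.hkdf import HKDF

    alice = ec.generate_private_key(ec.SECP256R1())
    bob = ec.generate_private_key(ec.SECP256R1())

    shared_alice = alice.exchange(ec.ECDH(), bob.public_key())
    shared_bob = bob.exchange(ec.ECDH(), alice.public_key())
    assert shared_alice == shared_bob           # both sides derive the same shared secret

    # Derive a symmetric key from the shared secret (loosely analogous to IKEv2's PRF step).
    session_key = HKDF(algorithm=hashes.SHA256(), length=32, salt=None, info=b"ikev2-demo").derive(shared_alice)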

  • Centralized Ingress using FortiGate NGFWs and Aviatrix Transit Gateways with FireNet enabled

    Centralized Ingress using FortiGate NGFWs and Aviatrix Transit Gateways with FireNet enabled

    High Level Design

    Ingress Design using the Aviatrix FireNet FortiGates:

    All FortiGates receive sessions via the load balancer as long as they pass the health checks. While an Active-Passive (A-P) cluster behind the load balancer is an option, it is generally more effective to use standalone FGT units behind the load balancer in multiple Availability Zones (AZs). This configuration provides a robust mechanism to withstand the complete failure of an AZ.

    For reference, I have attached the Aviatrix Transit Firewall Network design for FortiGate firewalls below:

    • Port 1 is the port/interface facing the internet (untrusted)
    • Port 2 is the port/interface facing the Aviatrix Transit Gateways (trusted)

    The application flow is shown below:

    Aviatrix Transit Configuration

    Enable Firenet

    Navigate to CoPilot -> Security -> Firenet and click the +Firenet button. Select the transit gateway on which you want to enable the feature:

    Click Add.

    FortiGate Deployment

    In the AWS Marketplace portal, search for Fortinet FortiGate Next-Generation Firewall and accept the terms:

    Go back to CoPilot and deploy a pair of firewalls:

    Web Server

    For testing purposes, I created a spoke where I deployed an EC2 instance running NGINX on port 80 (10.208.132.45):

    Load Balancer

    Create a target group to expose the web server running on port 80:

    VPC and Health checks:

    Select the firewalls, choose port 80, and register them as pending:

    Health checks will fail for now as we still need to configure the firewall.

    Create an Application Load Balancer (ALB) to expose HTTP- and HTTPS-based applications. The ALB offers advanced features such as WAF and other integrations. If you have non-HTTP/HTTPS applications, deploy a Network Load Balancer (NLB) instead.

    Possible ALB integrations:

    The ALB/NLB is Internet Facing:

    We will deploy it in the same VPC where the Aviatrix Transit Gateways were deployed:

    We will drop the load balancer interfaces into the “-Public-FW-ingress-egress-” subnets across two different zones.

    FortiGate Configuration

    Check if Source/Destination is disabled on the EC2 Instance:

    Access the management GUI and create a VIP for the web server:

    • External IP address is the FortiGate private interface IP address facing the Load Balancers (port 1)
    • Map to IPv4 address/range is the web server private ip address

    Create a firewall policy to allow the traffic:

    • Incoming Interface: port 1
    • Outgoing Interface: port 2
    • Source: all (it can be fine-tuned to the ALB private addresses, which is the recommended approach)
    • Destination: VIP address
    • Service: HTTP

    Testing

    Identify the ALB/NLB DNS name from the ALB/NLB properties:

    Using a browser, access it:

    Curl it from a terminal:

    Exposing More Applications

    The key to exposing more applications is being able to properly redirect traffic to the backend by doing DNAT on the firewall:

    There are multiple ways of accomplishing this, and I cover the options below.

    Host-Based Routing

    • Add target groups for each application (e.g., TargetGroup-App1, TargetGroup-App2).
    • Configure listener rules to route traffic based on the host header:
      • Example: Host: app1.example.com → TargetGroup-App1.
      • Example: Host: app2.example.com → TargetGroup-App2.
    • Ensure DNS records (e.g., CNAME) point app1.example.com and app2.example.com to the ALB’s DNS name.

    Path-Based Routing

    Route traffic to different applications based on the URL path (e.g., example.com/app1, example.com/app2).

    • Create separate target groups for each application.
    • Configure listener rules to route traffic based on the path pattern:
      • Example: Path: /app1/* → TargetGroup-App1.
      • Example: Path: /app2/* → TargetGroup-App2.

    Combination of Host- and Path-Based Routing

    Combine host and path rules for more complex routing (e.g., app1.example.com/api → App1, app2.example.com/api → App2).

    • Use listener rules that combine host header and path pattern conditions.
    • Example: Host: app1.example.com AND Path: /api/* → TargetGroup-App1.
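
    A hedged boto3 sketch of such a combined rule (the listener and target group ARNs are placeholders):

    # Sketch: an ALB listener rule combining host- and path-based conditions (ARNs are placeholders).
    import boto3

    elbv2 = boto3.client("elbv2", region_name="us-east-1")

    elbv2.create_rule(
        ListenerArn="arn:aws:elasticloadbalancing:us-east-1:123456789012:listener/app/ingress/abc/def",
        Priority=10,
        Conditions=[
            {"Field": "host-header", "HostHeaderConfig": {"Values": ["app1.example.com"]}},
            {"Field": "path-pattern", "PathPatternConfig": {"Values": ["/api/*"]}},
        ],
        Actions=[{
            "Type": "forward",
            "TargetGroupArn": "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/TargetGroup-App1/0123456789abcdef",
        }],
    )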

    Multiple Ports or Protocols

    • Route traffic based on different listener ports when the services are different.

    Using Multiple Load Balancers

    • Deploy separate ALBs or NLBs for each application when the applications require distinct source IPs.

    Packet Capture on FortiGate NGFWs

    diagnose sniffer packet port1 "port 80"

    Bucket Policy

    If you want to enable logging for the ALB/NLBs, you will need to create an S3 bucket and a bucket policy that allows the load balancers to write into it:

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Service": "logdelivery.elasticloadbalancing.amazonaws.com"
                },
                "Action": "s3:PutObject",
                "Resource": "arn:aws:s3:::<bucket name>/AWSLogs/<account number>/*",
                "Condition": {
                    "StringEquals": {
                        "s3:x-amz-acl": "bucket-owner-full-control"
                    }
                }
            }
        ]
    }

    Reference

    https://docs.aviatrix.com/documentation/latest/security/fortigate-ingress-protection-firenet.html

  • Using Cloud Interconnect with Aviatrix

    Using Cloud Interconnect with Aviatrix

    Google Cloud Interconnect is a service provided by Google Cloud Platform (GCP) that enables customers to establish private, high-performance connections between their on-premises infrastructure and Google Cloud. It offers low-latency, secure connectivity by bypassing the public internet, making it ideal for scenarios like data migration, replication, disaster recovery, or hybrid cloud deployments. There are three main options:

    1. Dedicated Interconnect: Provides a direct physical connection between your on-premises network and Google’s network, offering high bandwidth (10 Gbps to 200 Gbps) for large-scale data transfers.
    2. Partner Interconnect: Connects your on-premises network to Google Cloud through a supported service provider, suitable for lower bandwidth needs (50 Mbps to 10 Gbps) or when a direct connection isn’t feasible.
    3. Cross-Cloud Interconnect: Links your network in another cloud provider (e.g., AWS, Azure) directly to Google’s network.

    Key benefits include reduced latency, enhanced security (traffic stays off the public internet), cost savings on egress traffic, and direct access to Google Cloud’s internal IP addresses without needing VPNs or NAT devices. It’s widely used by enterprises in industries like media, healthcare, and global operations for reliable, scalable cloud connectivity.

    Prerequisites

    • Ensure you have a Megaport account and a physical Port (e.g., 1 Gbps, 10 Gbps, or 100 Gbps) provisioned in a Megaport-enabled data center. If not, order one via the Megaport Portal.
    • Confirm you have a Google Cloud project with a Virtual Private Cloud (VPC) network set up. In this design, we are using the Aviatrix “transit” VPC.
    • Create at least one Cloud Router per Aviatrix Transit VPC
    • Identify the Google Cloud region where you want to connect (must align with a Megaport Point of Presence).
    • Decide on the bandwidth for your Virtual Cross Connect (VXC)—Megaport supports 50 Mbps to 10 Gbps for Partner Interconnect.

    Create a Partner Interconnect Attachment in Google Cloud

    Navigate to Network Connectivity > Interconnect from the main menu:

    Click Create VLAN Attachments:

    Select Partner Interconnect, then click Continue.

    Choose I already have a service provider (Megaport in this case).

    Configure the Attachment:

    • Resiliency (single or redundant VLANs)
    • Network (Aviatrix Transit VPC)
    • MTU (it must align with the MTU of the Aviatrix Transit VPC created before)
    • VLAN A Cloud Router
    • VLAN B Cloud Router

    Generate Pairing Key:

    • After creating the attachment, Google will provide a pairing key (a UUID-like string, e.g., 123e4567-e89b-12d3-a456-426614174000/us-central1/1).
    • Copy this key—you’ll need it in the Megaport Portal.

      Provision a Virtual Cross Connect (VXC) in Megaport

      Go to the Megaport Portal (portal.megaport.com):

        In this example, we will create an MVE (Cisco Catalyst 8000v) and connect it to the Aviatrix Gateway using BGP over IPsec across the Cloud Partner Interconnect.

        In the Megaport Portal, click Services -> Create MVE:

        Select the region and then the Vendor/Product:

        Add additional interfaces to the MVE to align to the design:

        Click + Connection and choose Cloud as the connection type.

        Select Google Cloud as the Provider:

        Paste the pairing key from Google Cloud into the provided field. Megaport will automatically populate the target Google Cloud location based on the key:

        Configure VXC Details:

          1. Name: Give your VXC a name (e.g., gcp-vxc-1).
          2. Rate Limit: Set the bandwidth to match the capacity chosen in Google Cloud (e.g., 1000 Mbps for 1 Gbps).
          3. A-End VNIC: This is the interface from your VM where you are attaching the connection.
          4. Preferred A-End VLAN: Specify a VLAN ID if required, or let Megaport auto-assign it.

          Deploy the VXC:

          • Add the VXC to your cart, proceed to checkout, and deploy it.
          • Deployment typically takes a few minutes.

            A second connection is required for the redundant VLAN attachment. The steps are exactly the same.

            Activate the Attachment in Google Cloud

            Return to Google Cloud Console and check the attachment status:

            Activate the Attachment:

              Configure BGP

              Set Up BGP in Google Cloud. In the attachment details, click Edit BGP Session:

                Peer ASN: Enter your on-premises router’s ASN (private ASN, e.g., 64512–65534). Google’s ASN is always 16550.

                Note the BGP IP addresses assigned by Google (e.g., 169.254.1.1/29 for Google, 169.254.1.2/29 for your side).

                Configure the Megaport MVE with the information generated above.

                Verify Connectivity

                1. Check BGP Status:
                  • In Google Cloud Console, under the attachment details, confirm the BGP session is Established.

                This connection is what we call the underlay, and the only prefixes exchanged should be the Megaport C8000v and Aviatrix Transit Gateway IPs.

                The MVE router configuration is under the Cisco C8000v Configuration section.

                Configure Aviatrix

                We create an External Connection attaching over Private Network from CoPilot:

                The connectivity diagram for this solution looks like the following mermaid diagram:

                The IPsec tunnel on the right actually runs on top of the Cloud Interconnect, but my Mermaid skills are not up to the task :).

                Checking the status and prefixes exchanged:

                From the Megaport MVE, we see 3 BGP neighbors: 1 x underlay (Cloud Partner Interconnect VLAN attachment) and 2 x overlay (Aviatrix):

                megaport-mve-103456#show ip bgp summary 
                BGP router identifier 169.254.214.2, local AS number 65501
                BGP table version is 32, main routing table version 32
                15 network entries using 3720 bytes of memory
                23 path entries using 3128 bytes of memory
                8 multipath network entries and 16 multipath paths
                4/4 BGP path/bestpath attribute entries using 1184 bytes of memory
                2 BGP AS-PATH entries using 64 bytes of memory
                0 BGP route-map cache entries using 0 bytes of memory
                0 BGP filter-list cache entries using 0 bytes of memory
                BGP using 8096 total bytes of memory
                BGP activity 17/2 prefixes, 27/4 paths, scan interval 60 secs
                16 networks peaked at 17:48:39 Apr 2 2025 UTC (4d20h ago)
                
                Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
                169.254.10.1    4        65502    7220    7953       32    0    0 5d00h           8
                169.254.10.5    4        65502    7220    7935       32    0    0 5d00h           8
                169.254.214.1   4        16550   24152   26085       32    0    0 5d14h           1

                The output below shows a few routes learned from the overlay:

                megaport-mve-103456#show ip bgp 
                BGP table version is 32, local router ID is 169.254.214.2
                Status codes: s suppressed, d damped, h history, * valid, > best, i - internal, 
                              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter, 
                              x best-external, a additional-path, c RIB-compressed, 
                              t secondary path, L long-lived-stale,
                Origin codes: i - IGP, e - EGP, ? - incomplete
                RPKI validation codes: V valid, I invalid, N Not found
                
                     Network          Next Hop            Metric LocPrf Weight Path
                 *m   10.0.2.0/24      169.254.10.1                           0 65502 64512 ?
                 *>                    169.254.10.5                           0 65502 64512 ?
                 *m   100.64.0.0/21    169.254.10.1                           0 65502 64512 ?
                 *>                    169.254.10.5                           0 65502 64512 ?
                 *m   100.64.8.0/21    169.254.10.1                           0 65502 64512 ?
                 *>                    169.254.10.5                           0 65502 64512 ?

                Cisco C8000v Configuration

                The default username for the MVE admin is mveadmin.

                interface GigabitEthernet2
                 ip address 169.254.214.2 255.255.255.248
                 mtu 1460
                 no shutdown
                !
                interface Loopback0
                 ip address 192.168.255.1 255.255.255.255
                 no shutdown
                !
                interface Tunnel11
                 ip address 169.254.10.2 255.255.255.252
                 ip mtu 1436
                 ip tcp adjust-mss 1387
                 tunnel source Loopback0
                 tunnel destination 192.168.5.3
                 tunnel mode ipsec ipv4
                 tunnel protection ipsec profile AVX-IPSEC-5.3
                 no shutdown
                !
                interface Tunnel12
                 ip address 169.254.10.6 255.255.255.252
                 ip mtu 1436
                 ip tcp adjust-mss 1387
                 tunnel source Loopback0
                 tunnel destination 192.168.5.4
                 tunnel mode ipsec ipv4
                 tunnel protection ipsec profile AVX-IPSEC-5.4
                 no shutdown
                !
                crypto ikev2 proposal AVX-PROPOSAL
                 encryption aes-cbc-256
                 integrity sha256
                 group 14
                !
                crypto ikev2 policy AVX-POLICY
                 proposal AVX-PROPOSAL
                !
                crypto ikev2 keyring AVX-KEYRING-5.3
                 peer AVX-PEER-5.3
                  address 192.168.5.3
                  pre-shared-key Avtx2019!
                 !
                !
                crypto ikev2 keyring AVX-KEYRING-5.4
                 peer AVX-PEER-5.4
                  address 192.168.5.4
                  pre-shared-key Avtx2019!
                 !
                !
                crypto ikev2 profile AVX-PROFILE-5.3
                 match identity remote address 192.168.5.3 255.255.255.255
                 identity local address 192.168.255.1
                 authentication local pre-share
                 authentication remote pre-share
                 keyring local AVX-KEYRING-5.3
                 lifetime 28800
                 dpd 10 3 periodic
                !
                crypto ikev2 profile AVX-PROFILE-5.4
                 match identity remote address 192.168.5.4 255.255.255.255
                 identity local address 192.168.255.1
                 authentication local pre-share
                 authentication remote pre-share
                 keyring local AVX-KEYRING-5.4
                 lifetime 28800
                 dpd 10 3 periodic
                !
                crypto ipsec transform-set AVX-TS-5.3 esp-aes 256 esp-sha256-hmac
                 mode tunnel
                !
                crypto ipsec transform-set AVX-TS-5.4 esp-aes 256 esp-sha256-hmac
                 mode tunnel
                !
                crypto ipsec profile AVX-IPSEC-5.3
                 set security-association lifetime seconds 3600
                 set transform-set AVX-TS-5.3
                 set pfs group14
                 set ikev2-profile AVX-PROFILE-5.3
                !
                crypto ipsec profile AVX-IPSEC-5.4
                 set security-association lifetime seconds 3600
                 set transform-set AVX-TS-5.4
                 set pfs group14
                 set ikev2-profile AVX-PROFILE-5.4
                !
                router bgp 65501
                 bgp log-neighbor-changes
                 neighbor 169.254.214.1 remote-as 16550
                 neighbor 169.254.214.1 update-source GigabitEthernet2
                 neighbor 169.254.214.1 timers 20 60
                 neighbor 169.254.10.1 remote-as 65502
                 neighbor 169.254.10.1 update-source Tunnel11
                 neighbor 169.254.10.1 timers 60 180
                 neighbor 169.254.10.5 remote-as 65502
                 neighbor 169.254.10.5 update-source Tunnel12
                 neighbor 169.254.10.5 timers 60 180
                 address-family ipv4
                  network 172.16.5.0 mask 255.255.255.0
                  network 192.168.255.1 mask 255.255.255.255
                  redistribute connected
                  neighbor 169.254.214.1 activate
                  neighbor 169.254.10.1 activate
                  neighbor 169.254.10.1 soft-reconfiguration inbound
                  neighbor 169.254.10.5 activate
                  neighbor 169.254.10.5 soft-reconfiguration inbound
                  maximum-paths 4
                 exit-address-family

                How to generate traffic

                configure terminal
                ip sla 10
                 tcp-connect 192.168.40.2 443
                 ! timeout is in milliseconds (5000 ms = 5 seconds to connect)
                 timeout 5000
                exit
                
                ip sla schedule 10 life 10 start-time now
                
                show ip sla statistics 10

                Reference

                https://cloud.google.com/network-connectivity/docs/interconnect/concepts/overview

                https://docs.megaport.com/mve/cisco/creating-mve-autonomous
