VXLAN, MBGP EVPN with ingress replication – Part 2 – Configure VXLAN L2 VNI on a single POD

This is Part 2; Part 1 is here

The example configurations here are based on the Cisco Nexus 9K.

The configurations are straightforward and simple. As I said earlier – if you have ever configured MP-BGP address families, this will be super easy for you. This doco describes L2 VNI only – a separate doco will cover L3 VNI.

L2 VNI uses the Type-2 route within EVPN VXLAN – the “MAC with IP advertisement route”.

L3 VNI uses the Type-5 route within EVPN VXLAN – the “IP prefix route”.

The following is the network topology design diagram used in this reference doco.

Notes regarding the network topology-

  • 2 x SPINE switches (IP 172.16.0.1 and 172.16.0.2)
  • 2 x LEAF switches (IP 172.16.0.3 and 172.16.0.4)
  • IP address “unnumbered” configured on the interfaces connected between SPINE and LEAF
  • 2 x VXLAN NVE VTEP interfaces on LEAF switches (source IP 172.16.0.5 and 172.16.0.6)
  • SPINE switches are on the OSPF Area 0
  • LEAF switches are on the OSPF Area 1
  • iBGP peering between “SPINE to LEAF” – mesh iBGP peering topology
  • Both SPINE switches are “route reflector” to the LEAF switches
  • Physical connectivity is “SPINE to LEAF” only; NO “LEAF to LEAF” or “SPINE to SPINE” links are required. However, “SPINE to SPINE” links are optional in a single POD and can be leveraged later in a multi-POD or multi-site EVPN design.
  • VNI ID for this POD-1 is 1xxxx; VNI ID for POD-2 will be 2xxxx. VNI IDs are mapped to VLAN IDs. Example – VLAN ID 10 mapped to VNI 10010, VLAN ID 20 mapped to VNI 10020.
  • “route-distinguisher (RD)” ID number for this POD-1 is 100; RD ID for POD-2 will be 200.

(Diagram – single-POD underlay and overlay topology)

Before we move to the “step by step” configuration – enable the following features on the Cisco Nexus switches.

Features to be enabled on the SPINE switches –

!
nv overlay evpn
feature ospf
feature bgp
!

Features to be enabled on the LEAF switches –

!
nv overlay evpn
feature ospf
feature bgp
feature interface-vlan
feature vn-segment-vlan-based
feature nv overlay
!

Step 1: Setup Loopback IPs on all the SPINE and LEAF switches

We need a “loopback 0” IP address as the router ID for both OSPF and BGP. We will also use this loopback IP address as the “ip unnumbered” source address for the SPINE-LEAF connections.

Note: the OSPF process must exist before the “ip router ospf” statements below can reference it.
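A minimal sketch – create the OSPF process itself first (the full OSPF configuration, including interface membership, is in Step 2):

!
router ospf VXLAN-UNDERLAY
!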

SPINE switches-

!---SPINE-01
!
interface loopback 0
description "Underlay - Interfaces and Router ID"
ip address 172.16.0.1/32
ip router ospf VXLAN-UNDERLAY area 0.0.0.0
!
!

!---SPINE-02
!
interface loopback 0
description "Underlay - Interfaces and Router ID"
ip address 172.16.0.2/32
ip router ospf VXLAN-UNDERLAY area 0.0.0.0
!

LEAF switches-

!---LEAF-01
!
interface loopback 0
description "Underlay - Interfaces and Router ID"
ip address 172.16.0.3/32
ip router ospf VXLAN-UNDERLAY area 1.1.1.1
!
!

!---LEAF-02
!
interface loopback 0
description "Underlay - Interfaces and Router ID"
ip address 172.16.0.4/32
ip router ospf VXLAN-UNDERLAY area 1.1.1.1
!

Step 2: Setup OSPF routing and Ethernet interface IP addressing

SPINE-LEAF VXLAN is a “full mesh” network – it is hard to track interface IP addresses if each link is configured with a unique IP address; that would be too many IPs and too many IP subnets!! IP “unnumbered” is a nice way to avoid all those IPs and subnets.

I will be using the same “loopback 0” IP address on all the interconnect interfaces between the SPINE & LEAF switches via the “ip unnumbered” feature.

Interfaces E1/1 & E1/2 connect the SPINE and LEAF switches; the SPINE-to-SPINE connection uses interfaces E1/10 & E1/11 on both switches (as per the network design diagram above).

!---SPINE-01 and SPINE-02
!
interface e1/1,e1/2
no switchport
description "connected to LEAF switches - IP Fabric"
mtu 9216
medium p2p
ip unnumbered loopback0
ip ospf authentication message-digest
ip ospf message-digest-key 0 md5 3 yourOSPFsecret
ip ospf network point-to-point
no ip ospf passive-interface
ip router ospf VXLAN-UNDERLAY area 1.1.1.1
no shutdown
!
!---LEAF-01 and LEAF-02
!
interface e1/1,e1/2
no switchport
description "connected to SPINE switches - IP Fabric"
mtu 9216
medium p2p
ip unnumbered loopback0
ip ospf authentication message-digest
ip ospf message-digest-key 0 md5 3 yourOSPFsecret
ip ospf network point-to-point
no ip ospf passive-interface
ip router ospf VXLAN-UNDERLAY area 1.1.1.1
no shutdown
!
!---SPINE-01 and SPINE-02
!---This is NOT required on a single-POD only solution
!
interface e1/10,e1/11
no switchport
description "connected to SPINE switches - back-to-back IP Fabric"
mtu 9216
medium p2p
ip unnumbered loopback0
ip ospf authentication message-digest
ip ospf message-digest-key 0 md5 3 yourOSPFsecret
ip ospf network point-to-point
no ip ospf passive-interface
ip router ospf VXLAN-UNDERLAY area 0.0.0.0
no shutdown
!
!---all the SPINE & LEAF switches; use each switch's own loopback0 IP as the router-id
!
router ospf VXLAN-UNDERLAY
router-id 172.16.0.LOOPBACK0-IP
passive-interface default
!

At this stage, OSPF adjacencies should be formed between the SPINE and LEAF switches (and SPINE to SPINE, if connected).

Verify your OSPF configuration and connectivity between switches-

show ip ospf neighbors
show ip ospf route
show ip ospf database
show ip ospf interface

Make sure you are able to ping between all the SPINE and LEAF switches.
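For example, from SPINE-01 you can ping LEAF-02’s loopback, sourcing the ping from loopback0 (addresses per the topology above):

ping 172.16.0.4 source 172.16.0.1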

(Screenshot – “show ip ospf neighbors” from SPINE-01)

Step 3: Setup MP-BGP peering across all the SPINE and LEAF switches

I will now configure MP-BGP with the “l2vpn evpn” address family on all the switches; we have to enable “send-community extended” on all the BGP peers to allow exchange of the extended communities (route targets) carried with the EVPN routes.

  • iBGP peering between SPINE-01 and SPINE-02;
  • iBGP peering from all SPINE to LEAF – SPINE switches are route reflector servers to the LEAF switches
  • NO LEAF to LEAF BGP peering
!
!---SPINE switches MP-BGP config SPINE-01
!---adjust the RID IP address for SPINE-02
!---iBGP to LEAF switches
!
router bgp 65501
  router-id 172.16.0.1
  log-neighbor-changes
  address-family l2vpn evpn
    retain route-target all
  !
  neighbor 172.16.0.3
    remote-as 65501
    description "LEAF-01 - iBGP peer - RR client"
    password 3 ef6a8875f8447eac
    update-source loopback0
    address-family l2vpn evpn
      send-community extended
      route-reflector-client
  !
  neighbor 172.16.0.4
    remote-as 65501
    description "LEAF-02 - iBGP peer - RR client"
    password 3 ef6a8875f8447eac
    update-source loopback0
    address-family l2vpn evpn
      send-community extended
      route-reflector-client
!
!---LEAF switch LEAF-01
!---adjust the RID IP address for LEAF-02
!---iBGP peering to SPINEs in the same POD only
!
router bgp 65501
  router-id 172.16.0.3
  address-family l2vpn evpn
    retain route-target all
  !
  neighbor 172.16.0.1
    remote-as 65501
    description "SPINE-01 - iBGP peer - RR server"
    password 3 ef6a8875f8447eac
    update-source loopback0
    address-family l2vpn evpn
      send-community extended
  !
  neighbor 172.16.0.2
    remote-as 65501
    description "SPINE-02 - iBGP peer - RR server"
    password 3 ef6a8875f8447eac
    update-source loopback0
    address-family l2vpn evpn
      send-community extended
!

BGP Verification –

show bgp all summary
show bgp all neighbors 172.16.0.1
show bgp all neighbors 172.16.0.2
show bgp all neighbors 172.16.0.3
show bgp all neighbors 172.16.0.4

Make sure BGP peering status is “Established” for all.
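You can also check the l2vpn evpn address family specifically:

show bgp l2vpn evpn summary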

(Screenshot – “show bgp all summary” from SPINE-01)

Up to this point, everything above is typical routing & interface configuration. The next sections describe the VTEP, NVE, VNI & EVPN configurations.

Step 4: Setup NVE interface on the LEAF switches only

The NVE interface is the VXLAN VTEP. It only needs to be configured on the LEAF switches. VXLAN encapsulation happens on the NVE interface.

The VXLAN encapsulation/decapsulation concept is very similar to MPLS “PE” and “P” routers: source “PE” routers encapsulate packets with MPLS labels and transport them over the “P” routers to the destination “PE” routers. Likewise, the source LEAF switch NVE VTEP interface encapsulates packets into VXLAN and transports them over the SPINE switches to the destination LEAF switch NVE VTEP interface.

The NVE interface requires a dedicated loopback interface; we will set up “loopback 1” on both LEAF switches and bind it to the “nve1” interface.

!---LEAF-01
!
interface loopback 1
description "Underlay - NVE VTEP source IP"
ip address 172.16.0.5/32
ip router ospf VXLAN-UNDERLAY area 1.1.1.1
!

!---LEAF-02
!
interface loopback 1
description "Underlay - NVE VTEP source IP"
ip address 172.16.0.6/32
ip router ospf VXLAN-UNDERLAY area 1.1.1.1
!

!---both LEAF-01 and LEAF-02
!
interface nve1
  no shutdown
  description "VXLAN - VTEP interface"
  host-reachability protocol bgp
  source-interface loopback1
!

(Screenshot – “show interface nve1” output)

Verification –

Ping the loopback1 IP addresses between LEAF-01 and LEAF-02; make sure they are reachable.

show interface nve1 – make sure the nve1 interface is UP
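For example, from LEAF-01 (loopback1 addresses per the topology above):

ping 172.16.0.6 source 172.16.0.5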

Step 5: Create VLAN, VNI and configure EVPN

We will create traditional VLAN IDs and names, then associate a unique VNI ID with each VLAN ID; finally, we will configure the VNI IDs under the NVE interface and EVPN.

!---on both the LEAF-01 and LEAF-02
!
vlan 10
name VLAN10-TEST
vn-segment 10010
!
vlan 20
name VLAN20-TEST
vn-segment 10020
!

Now add the VNIs for the above VLAN10 and VLAN20 under nve1 and EVPN on both switches –

!---on both the LEAF-01 and LEAF-02
!
interface nve1
  member vni 10010
    ingress-replication protocol bgp
  member vni 10020
    ingress-replication protocol bgp
!

!---both the LEAF-01 and LEAF-02
!
evpn
  !
  vni 10010 l2
    rd auto
    route-target import 100:10010
    route-target export 100:10010
  !
  vni 10020 l2
    rd auto
    route-target import 100:10020
    route-target export 100:10020
!

At this stage, MP-BGP EVPN starts exchanging EVPN routes, the LEAF switches discover each other as NVE VTEP peers, and VXLAN-encapsulated traffic can flow between them.

Verification –

show nve peers – make sure it shows the remote peer IP & the status is Up
show nve vni – make sure the VNI status is Up

show nve interface nve1
show nve vxlan-params

(Screenshot – “show nve peers” / “show nve vni” output)

The following screenshot shows the NVE port number on Cisco Nexus – UDP/4789:

(Screenshot – VXLAN UDP port number)

Step 6: Verification – MAC Learning, VLAN, VNI ID, VXLAN, BGP EVPN

This is the final step.

As part of the end-to-end L2 VNI VLAN test & verification, we configured interface “E1/15” on both LEAF switches as a “trunk”, connected a router to each, and configured VLAN10 on them; a sketch of the trunk config follows.
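A minimal sketch of that access-side trunk config (an assumption for illustration – only VLAN10 shown; adjust the allowed VLANs to your environment):

!---both LEAF-01 and LEAF-02 (assumed access-side config)
!
interface e1/15
  switchport
  switchport mode trunk
  switchport trunk allowed vlan 10
  no shutdown
!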

Router-01 interface MAC address is “50:00:00:05:00:00”; this is connected to LEAF-01; this MAC is local to LEAF-01; IP address configured here is 192.168.10.1/24.

Router-02 interface MAC address is “50:00:00:06:00:00”; this is connected to LEAF-02; this MAC is local to LEAF-02; the IP address configured here is 192.168.10.2/24.

Let’s verify MAC address learning on LEAF-01 and LEAF-02 for VLAN10 –

Ping 192.168.10.2 from Router-01; likewise, ping 192.168.10.1 from Router-02.

show l2route mac all

(Screenshot – “show l2route mac all” from LEAF-01)

show mac address-table (on the NX-OS 9000v, this command is “show system internal l2fwder mac”)

(Screenshot – “show mac address-table” from LEAF-01)

The above commands on LEAF-01 return the following output –

This shows the local MAC “50:00:00:05:00:00” as “Local” on port “E1/15”.

This shows the remote MAC “50:00:00:06:00:00” as “BGP” learned, with the port being “nve-peer1”.

The same commands on LEAF-02 return the following output –

(Screenshot – “show l2route mac all” from LEAF-02)

(Screenshot – “show mac address-table” from LEAF-02)

This shows the local MAC “50:00:00:06:00:00” as “Local” on port “E1/15”.

This shows the remote MAC “50:00:00:05:00:00” as “BGP” learned, with the port being “nve-peer1”.

The following BGP l2vpn command shows the EVPN MAC, VNI ID & route type –

show bgp l2vpn evpn vni-id 10010

(Screenshot – “show bgp l2vpn evpn vni-id 10010” from LEAF-01)

The above command returns the BGP EVPN details for VNI 10010 (VLAN 10); note the “*>l[2]” at the start of each entry – the “[2]” specifies the route type, which is Type-2 for L2 VNI.
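On NX-OS you can also filter EVPN routes by route type directly (assuming your release supports the route-type filter):

show bgp l2vpn evpn route-type 2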

The above verification clearly shows remote “Layer 2 MAC address” learning over BGP – MAC addresses learned via a Layer 3 routing protocol!!

That’s all.

VXLAN, MBGP EVPN with Ingress Replication – Part 1 – Basic Facts, Design Considerations and Security

There are plenty of reference docs on VXLAN, but most of them cover early solutions that do not use MP-BGP EVPN and instead handle BUM traffic (broadcast, unknown unicast and multicast) via multicast. I know people who do not want to run mcast in their network!

Here, I focus on VXLAN with MP-BGP EVPN with ingress replication to manage BUM traffic (VXLAN + MPBGP EVPN + Ingress Replication).

So, What Is “Ingress Replication” Compared To “Multicast” Based VXLAN Solution?

The answer is – ingress replication, also called head-end replication, performs unicast delivery of VXLAN-encapsulated packets to remote VTEPs. Unicast replication requires the source VTEP to deliver the same data to every single remote VTEP in “one-to-one” fashion – whereas with multicast, a rendezvous point (preferably a PIM-SM RP) is defined, and all the VTEPs join it to receive VXLAN-encapsulated data in “one-to-many” fashion. Multicast has lower overhead and can provide faster delivery compared to unicast; however, multicast is less secure.
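For context, here is how the two BUM-handling options look under the NVE interface on Cisco NX-OS; the multicast group address below is an arbitrary example, not part of this doco’s configs –

!---ingress replication (the approach used in this series)
interface nve1
  member vni 10010
    ingress-replication protocol bgp
!
!---multicast-based alternative (requires a PIM-enabled underlay; example group)
interface nve1
  member vni 10010
    mcast-group 239.1.1.1
!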

MP-BGP EVPN is the next-generation solution, becoming widely popular in data center networks (VXLAN EVPN) and service provider networks (MPLS PBB-EVPN).

My plan is to create the following step-by-step reference documents for VXLAN EVPN with ingress replication.

  • VXLAN, MBGP EVPN with ingress replication – Part 1 – Basic Facts, Design Considerations and Security
  • VXLAN, MBGP EVPN with ingress replication – Part 2 – Configure VXLAN on a single POD – L2 VNI – here
  • VXLAN, MBGP EVPN with ingress replication – Part 3 – Configure VXLAN on multi PODs – L2 VNI
  • VXLAN, MBGP EVPN with ingress replication – Part 4 – Configure VXLAN on multi PODs – L3 VNI
  • VXLAN, MBGP EVPN with ingress replication – Part 5 – Configure VXLAN on multi PODs including a collapsed POD besides Spine and Leaf PODs

This is Part 1.

Let’s Get Some Basic Facts About VXLAN

  • the initial specification of VXLAN is described in RFC 7348; it describes the need for overlay networks within virtualized data centers accommodating high-density tenancy (4096++), as traditional VLAN-based segmentation maxes out at 4096 segments
  • so, based on RFC 7348, VXLAN is the solution to get rid of classical ethernet (CE) in a data center and extend the VLAN boundary beyond 4096; ah! NO more VLAN and STP!
  • similar alternatives are TRILL, NVGRE and Cisco OTV; however, none is as widely accepted as VXLAN
  • the VXLAN header is 8 bytes; this includes a layer 2 virtual network identifier (VNI), which is 24 bits long
  • a VNI represents a broadcast domain; traditional VLANs are associated with unique VNI numbers; you often see the term “bridge-domain”, which is in a sense similar to a VLAN (or multiple VLANs/subnets of a tenant); a VNI can represent a bridge-domain
  • since the maximum number of IEEE 802.1Q VLANs is 4096 – you can have max 4096 VNIs per POD (point of delivery); yes – you still use VLANs for segregation!
  • in a large DC environment, you have multiple PODs inter-connected per data center or across multiple data centers; thus you can have a high-density multi-tenancy network that goes beyond 4096! Here you need VXLAN VNIs, whose 24-bit space gives you up to 16,777,216 unique network segments
  • so, VLANs are still there! ah! YES, they are! but VLANs are now local to a POD and/or a switch only; VLAN extension across switches and across PODs is done via VXLAN VNIs; switch-to-switch connections are L3 for VXLAN instead of L2 as in a typical VLAN-based network
  • VXLAN uses UDP instead of TCP, with destination port number 4789
  • L3 connectivity leverages equal-cost multipath (ECMP) and uses all inter-connect links, providing maximum throughput and redundancy compared to L2 STP, which does not forward traffic over all links because of STP-blocked ports
  • VXLAN encapsulation adds extra overhead on top of the traditional 1500-byte MTU; on top of the original frame it adds a 14-byte outer Ethernet header + 20-byte outer IP header + 8-byte UDP header + 8-byte VXLAN header = 50 bytes of overhead
  • in the real world, VXLAN deployments are done with a 9K MTU end-to-end; nobody uses a 1500 MTU
  • VXLAN requires end-to-end L3 reachability in the underlay network; underlay reachability is provided by an IGP, in most cases OSPF or IS-IS
  • VXLAN encapsulates MAC frames into IP packets and transports them over the L3 network
  • VXLAN is the “overlay network” that runs on top of the underlay; it uses local “VXLAN Tunnel End Point (VTEP)” interfaces to encapsulate packets into VXLAN
  • the VTEP is the interface where VXLAN encapsulation and de-encapsulation happen (origination and termination of VXLAN traffic)
  • a VTEP can be hardware-based – a dedicated network device capable of encapsulating and de-encapsulating VXLAN packets; Cisco Nexus/Juniper QFX/Arista are good examples
  • a VTEP can be software-based – VXLAN encapsulation and de-encapsulation happen on a software-based virtual network appliance within a “hypervisor” server; the underlying physical network is totally unaware of VXLAN; VMware NSX is an example of a software-based VXLAN VTEP
  • a software-based VTEP only handles traffic that traverses its hypervisor host machine, whereas a hardware-based VTEP can handle VXLAN traffic in a much broader space
  • VXLAN EVPN involves a “control plane” that handles MAC address learning, reducing BUM traffic
  • VXLAN EVPN supports “ARP suppression”, which can reduce ARP flooding for “silent” hosts/clients (most hosts send GARP/RARP to the network when they come online; silent hosts don’t)
  • VXLAN L3 VNI requires an “anycast gateway” on the Leaf switches, which has a shared IP address across all the participating Leaf switches; very similar to other FHRPs (VRRP/HSRP/etc…) – see the sketch after this list
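A minimal sketch of the anycast gateway pieces on Cisco NX-OS – the MAC and gateway IP below are assumed example values (L3 VNI configuration is covered in Part 4):

!---Leaf switches (assumed example)
!
feature fabric forwarding
fabric forwarding anycast-gateway-mac 0000.2222.3333
!
interface vlan 10
  ip address 192.168.10.254/24
  fabric forwarding mode anycast-gateway
!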

(VXLAN header details – picture copied from cisco.com)

Security in VXLAN MP-BGP EVPN based VTEP

  • previous multicast-based VTEP peer discovery didn’t have a mechanism for authenticating VTEP peers; in plain English, there was “NO” whitelist for VTEP peers!
  • this limitation presents a major security risk in real-world VXLAN deployments because it allows insertion of a rogue VTEP into a VNI segment!
  • if a rogue VTEP is inserted into the segment, it can send and receive VXLAN traffic! ah! gotcha!
  • MP-BGP EVPN-based VTEP peers are pre-authenticated and whitelisted by BGP; a BGP session must be established first for a VTEP device to discover remote VTEP peers
  • in addition to the established-BGP-session requirement – BGP session authentication can be added to BGP peers (MD5; on NX-OS the password is stored 3DES-encrypted, hence “password 3” in the configs above)
  • in addition to BGP session security – IGP security (i.e. authentication) can be added to the “underlay” routing protocols

Few Quick Notes on VXLAN Network Design

  • VXLAN network design doesn’t follow the traditional “three-layer” network design approach (core – dist – access)
  • VXLAN network design typically has two tiers – Spine and Leaf; this design can grow horizontally, “pay as you go”; you can add more Spine and Leaf switches anytime! No more fixed number of switch ports per POD!
  • you can have a “super spine” on top of the Spine switches
  • Leaf switches are connected to Spine switches within the same POD
  • Spine-to-Spine direct network connections are “not” necessary, but they “can be” connected
  • the underlay IGP ensures end-to-end L3 connectivity between the Leaf and Spine switches
  • clients are connected to the Leaf switches (servers, hypervisors, routers etc…)
  • in a multi-POD DC scenario – Spine switches need to be inter-connected (same EVPN control plane across the PODs); this is intra-site DCI
  • in a multi-site data centre inter-connect (DCI) scenario, a “segmented” VXLAN “control plane” is deployed to minimise BUM per data center; inter-DC traffic is handled by VXLAN Border Gateway (BGW) routers
  • in a multi-site DCI scenario, the Border Gateway (BGW) can be configured on the Spine switches (there are many other connectivity models/scenarios available for BGW); in this case, Spine DC-x to Spine DC-y are connected back-to-back over an L3 link, which is very similar to multi-POD Spine-to-Spine connectivity

(Diagram – typical VXLAN Spine-Leaf design)

VXLAN traffic flow diagram – inter-switch VLAN traffic follows the L3 path.

(Diagram – VXLAN VLAN traffic path)

Few Notes While Configuring VXLAN on Cisco Nexus NXOS

  • VXLAN EVPN is based on MP-BGP; it is just an extension to MP-BGP, very similar to MPLS VPNv4 or VPLS L2VPN
  • if you have configured MP-BGP MPLS before – you will find the VXLAN EVPN configuration super easy
  • VXLAN VTEP switches are much like “PE” routers in a typical MPLS network