VXLAN, MBGP EVPN with Ingress Replication – Part 1 – Basic Facts, Design Considerations and Security

There are plenty of reference docs on VXLAN, but most of them cover the early solutions that do not use MP-BGP EVPN and rely on multicast in the underlay to handle BUM traffic (broadcast, unknown unicast and multicast). I know people who do not want to run multicast in their network!

Here, I focus on VXLAN with MP-BGP EVPN with ingress replication to manage BUM traffic (VXLAN + MPBGP EVPN + Ingress Replication).

MP-BGP EVPN is the next generation solution becoming widely popular in Data Center networks (VXLAN EVPN) and Service Provider networks (MPLS PBB-EVPN).

My plan is to create the following step-by-step reference documents for VXLAN EVPN with ingress replication.

  • VXLAN, MBGP EVPN with ingress replication – Part 1 – Basic Facts, Design Considerations and Security
  • VXLAN, MBGP EVPN with ingress replication – Part 2 – Configure VXLAN on a single POD – L2VNI
  • VXLAN, MBGP EVPN with ingress replication – Part 3 – Configure VXLAN on multi PODs – L2VNI
  • VXLAN, MBGP EVPN with ingress replication – Part 4 – Configure VXLAN on multi PODs – L3VNI
  • VXLAN, MBGP EVPN with ingress replication – Part 5 – Configure VXLAN on multi PODs including a collapsed POD besides Spine and Leaf PODs

This is Part 1.


Let’s Get Some Basic Facts About VXLAN –

  • the initial specification of VXLAN is RFC 7348; it describes the need for overlay networks within virtualized data centers that host more tenants (4096++) than traditional VLAN-based segmentation allows, since a 12-bit VLAN ID tops out at 4096 segments
  • so based on RFC 7348, VXLAN is the solution to get rid of classical Ethernet (CE) in a data center and extend VLAN boundaries well beyond 4096; ah! NO more VLAN and STP!
  • similar alternatives are TRILL, NVGRE and Cisco OTV; however, none of them is as widely adopted as VXLAN
  • the VXLAN header is 8 bytes long; this includes the VXLAN network identifier (VNI), which is 24 bits long
  • a VNI represents a broadcast domain; each traditional VLAN is mapped to a unique VNI number; you often see the term “bridge-domain”, which is in a sense similar to a VLAN (or multiple VLANs/subnets of a tenant); a VNI can represent a bridge-domain
  • since the maximum number of IEEE 802.1Q VLANs is 4096 – you can have max 4096 VLAN-mapped VNIs per POD (point of delivery); yes – you still use VLANs for segregation!
  • in a large DC environment, you have multiple PODs inter-connected within a data center or across multiple data centers; thus you can have a high-density multi-tenant network that goes beyond 4096 segments! Here you need VXLAN VNIs, whose 24-bit ID gives you 16,777,216 unique network segments
  • so, VLANs are still there! ah! YES, they are! but VLANs are now local to a POD and/or a switch only; VLAN extension between switches and between PODs is done via VXLAN VNIs; switch-to-switch connections are L3 for VXLAN instead of L2 as in a typical VLAN-based network
  • VXLAN is carried over UDP, not TCP; the IANA-assigned destination port is 4789
  • L3 connectivity leverages equal-cost multi-path (ECMP) and uses all inter-connect links, which gives more throughput and redundancy than an L2 STP design that cannot forward traffic over all links because STP blocks ports
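  • VXLAN encapsulation adds overhead on top of the traditional 1500 MTU; the encapsulated packet size is “1500-byte payload with original IP header + 14-byte inner Ethernet header + 8-byte VXLAN header + 8-byte UDP header + 20-byte outer IPv4 header” = 1550 bytes, i.e. 50 bytes of overhead (the outer Ethernet header comes on top of that on the wire)
  • in the real world – VXLAN deployments are usually done with a 9K (jumbo) MTU end-to-end in the underlay; running the underlay at 1500 MTU is rare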
  • VXLAN requires end-to-end L3 reachability in the underlay network; underlay reachability is provided by an IGP, in most cases OSPF or IS-IS
  • VXLAN encapsulates the original Ethernet (MAC) frame into a UDP/IP packet and transports it over the L3 network
  • VXLAN is the “overlay network” that runs on top of the underlay; it uses local “VXLAN Tunnel End Point – VTEP” interfaces to encapsulate packets into VXLAN
  • VTEP is the interface where VXLAN traffic encapsulation and de-encapsulation happen (origination and termination of VXLAN traffic)
  • a VTEP can be hardware based – a dedicated network device capable of encapsulating and de-encapsulating VXLAN packets; Cisco Nexus/Juniper QFX/Arista switches are good examples
  • a VTEP can be software based – VXLAN encapsulation and de-encapsulation happen on a software-based virtual network appliance within the virtualization “hypervisor servers”; the underlying physical network is totally unaware of VXLAN; VMware NSX is an example of a software-based VXLAN VTEP
  • a software-based VTEP handles only the traffic that traverses its hypervisor host machine, whereas a hardware-based VTEP can handle VXLAN traffic in a much broader space
  • VXLAN EVPN adds a “control plane” that handles MAC address learning: MAC/IP reachability is advertised via BGP instead of being learned purely by flooding BUM traffic
  • VXLAN EVPN supports “ARP suppression”, which reduces ARP flooding because the local Leaf can answer ARP requests for hosts it already knows from EVPN; ARP for “silent” hosts/clients still has to be flooded (most hosts send GARP/RARP to the network when they come online; silent hosts don’t do that)
  • VXLAN L3VNI routing is typically deployed with an “anycast gateway” on the Leaf switches, which has a shared IP address across all the participating Leaf switches; its purpose is very similar to an FHRP (VRRP/HSRP/etc.)
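
To make the numbers in the list above concrete, here is a small Python sketch of the MTU and segment-count arithmetic. Python is only used for illustration; the function and constant names are my own and not from any library. It assumes an IPv4 underlay and, unless told otherwise, an untagged inner frame.

```python
# Header sizes in bytes, per RFC 7348 with an IPv4 underlay (assumption).
INNER_ETHERNET = 14   # Ethernet header of the original (inner) frame
VXLAN_HEADER = 8
UDP_HEADER = 8
OUTER_IPV4 = 20


def underlay_ip_mtu(tenant_mtu: int = 1500, inner_dot1q: bool = False) -> int:
    """Smallest underlay IP MTU that carries a tenant packet unfragmented."""
    inner_frame = tenant_mtu + INNER_ETHERNET + (4 if inner_dot1q else 0)
    return inner_frame + VXLAN_HEADER + UDP_HEADER + OUTER_IPV4


VLAN_IDS = 2 ** 12   # 12-bit 802.1Q VLAN ID field
VNI_IDS = 2 ** 24    # 24-bit VXLAN VNI field

print(underlay_ip_mtu())        # 1550
print(underlay_ip_mtu(9000))    # 9050
print(VLAN_IDS, VNI_IDS)        # 4096 16777216
```

This is why a 9K underlay MTU is so common: it leaves plenty of room for the 50 bytes of encapsulation even when the hosts themselves run jumbo frames below that size.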


(VXLAN header details – picture copied from cisco.com)
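
For a feel of what those 8 bytes look like on the wire, the sketch below packs and unpacks a VXLAN header following the RFC 7348 layout (8 flag bits with the "I" flag set, 24 reserved bits, the 24-bit VNI, then 8 reserved bits). The helper names and the VNI value 10100 are only illustrative assumptions.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag from RFC 7348


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a given VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) + reserved (24 bits)
    # Word 2: VNI (24 bits) + reserved (8 bits)
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Extract the VNI from the first 8 bytes of a VXLAN header."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("I flag not set; VNI is not valid")
    return vni_word >> 8


header = pack_vxlan_header(10100)
print(len(header), header.hex(), unpack_vni(header))
# 8 0800000000277400 10100
```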


Security in VXLAN MP-BGP EVPN based VTEP

  • the earlier multicast-based VTEP peer discovery had no mechanism for authenticating VTEP peers; in plain English there was “NO” whitelist for VTEP peers!
  • this limitation presents a major security risk in real-world VXLAN deployments because it allows insertion of a rogue VTEP into a VNI segment!
  • if a rogue VTEP has been inserted into the segment, it can send and receive VXLAN traffic! ah! gotcha!
  • MP-BGP EVPN based VTEP peers are effectively pre-authenticated and whitelisted by BGP; a BGP session must be established first before a VTEP device can discover remote VTEP peers
  • in addition to the established BGP session requirement – BGP session authentication can be added between BGP peers (TCP MD5; on NX-OS the configured key can be stored 3DES-encrypted)
  • in addition to BGP session security – IGP security (aka auth) can be added to the “underlay” routing protocols


Few Quick Notes on VXLAN Network Design

  • VXLAN network design doesn’t follow the traditional “three-layer” network design approach (core – dist – access)
  • VXLAN network design typically has two tiers – Spine and Leaf; this design can grow horizontally, “pay as you go”; you can add more Spine and Leaf switches anytime! No more fixed number of switch ports per POD!
  • you can have a “super spine” tier on top of the Spine switches
  • Leaf switches are connected to the Spine switches within the same POD
  • direct Spine-to-Spine connections are “not” necessary, but they can be connected
  • the underlay IGP ensures end-to-end L3 connectivity between Leaf and Spine switches
  • clients are connected to the Leaf switches (servers, hypervisors, routers etc.)
  • in a multi-POD DC scenario – Spine switches need to be inter-connected (same EVPN control plane across the PODs); intra-site DCI
  • in a multi-site data centre inter-connect (DCI) scenario, a “segmented” VXLAN “control plane” is deployed to keep BUM traffic local to each data center; inter-DC traffic is handled by VXLAN Border Gateway (BGW) routers
  • in a multi-site DCI scenario, the Border Gateway (BGW) role can be configured on the Spine switches (many other connectivity models/scenarios are available for BGW); in this case Spine DC-x and Spine DC-y are connected back-to-back over an L3 link, which is very similar to multi-POD Spine-to-Spine connectivity




VXLAN traffic flow diagram – inter-switch VLAN traffic follows the L3 path.



Few Notes While Configuring VXLAN on Cisco Nexus NXOS

  • VXLAN EVPN is based on MP-BGP; EVPN is just another MP-BGP address family (L2VPN EVPN), very similar to VPNv4 for MPLS L3VPN or L2VPN VPLS
  • if you have configured MP-BGP for MPLS VPNs before – you will find VXLAN EVPN configuration super easy
  • VXLAN VTEP switches are much like the “PE” routers in a typical MPLS network



Where do I start with network automation using Ansible as a “Network Engineer”?

Companies like Cisco/Juniper expect every network engineer to have working knowledge of writing “Infrastructure as Code (IaC)” for network devices, aka NetworkOps (NetOps), by 2022 or earlier if possible! SD-WAN/SD-Access/ACI/Firewall/NGFW/Cloud Networking – whatever your networking career track is – you need to know network automation and orchestration.

Over the last few months I have been asked many times where to start with Ansible as a network engineer. More precisely, the question was always – I am a network engineer, now I want to learn network automation and orchestration with Ansible – can you please tell me “where to start”?

I will try to shed some light on this. The following is my step-by-step guide on where to start with Ansible for network engineers. It covers the following:

  • Ansible installation on Linux (Ubuntu).
  • Playing with Ansible playbooks, including three (03) tasks:
    • returning “show run” from Cisco IOS devices
    • configuring a network interface with an IP address and “OSPF 100” on Cisco IOS devices
    • capturing “show run” output and saving it to a file as a backup for Cisco IOS devices
  • Where to start with a Linux system and a text editor on Day 1 if you do not have any prior Linux knowledge.

Ansible IaC code is written in YAML; YAML is the easiest one to start with if you have no prior coding skills.

Part 1: Learn Linux and at least one CLI text file editor (optional)

If you already have some Linux experience – you can start from Part 2.

Why Linux? Linux is the preferred OS platform for most automation and orchestration tools – that’s why (I can give you another 101 reasons why you should learn Linux as a network engineer).

Download a copy of an Ubuntu/Debian/CentOS ISO image. Install the OS on VMware or VirtualBox. To start with – follow the “default” next -> next -> finish installation instructions. If the installer asks you to create a user account – create it. Aim to learn customised advanced installation later on, once you have some confidence with Linux; Google it for details.

Ok, your installation is done; you have logged into your newly installed Linux instance – what is next on Day 1?

Well, let’s start with the Linux file system.

If you look at a Windows file system, you already know what it looks like! You understand the meaning of C:\ drive, D:\ drive, C:\Program Files, C:\Windows, C:\Windows\System32 etc. In the same way you need to know the Linux file system; everything within Linux is a file or a directory – let’s have a look at the “key” Linux directories in the following table –

| Linux File System Name | What it does | Similar Windows File System |
| “/” | The Linux file system is hierarchical; “/” is the root of the whole Linux file system. | No visible equivalent in Windows. |
| “/etc” | Host-specific system configuration directory | C:\Windows\System32 |
| “/lib” | Shared library files | C:\Windows\System32 – DLL files |
| “/var/log” | System log files | Event Viewer events |
| “/home/**” | User home directories | C:\Users\** |
| “/usr” | User utilities and applications | C:\Program Files\** |
| “/boot” | Boot partition or boot file system | Boot drive on Windows; in most cases it is “C:\” |
| “/tmp” | Temporary files | C:\Temp |
| “/bin” | Essential user binaries/command files | C:\Windows\System32 – EXE files |
| “/opt” | Add-on application software package directory | D:\Program Files – if you have any |
| “/mnt” | Temporary mount directory – CD/DVD | CD drive/DVD drive |

How do you browse the Linux file system on the CLI? Very simple, as follows –

$cd /name_of_directory ;example - $cd /var/log; $cd /etc/network
$pwd ;this command will show where you are – within which directory
$ls ;this command will list the files within a directory
$ls -la ;this command will list the files in a directory with details

A few useful Linux CLI commands to start with on Day 1 –

| Command Name | What it does |
| who | Show which users are logged in at the moment |
| whoami | Show the username of the current working session |
| uname -a | Show the Linux kernel version |
| ifconfig | Show NIC config with IP address (part of the net-tools package; “ip addr” is the newer equivalent) |
| netstat -na | Show all open and connected TCP/UDP sockets |
| netstat -nr | Show the routing table |
| cat /dir/filename | Show the content of a file |
| tail /var/log/filename | Show the last few lines of a file |

It’s time for you to start Googling for more useful Linux commands; search for “20 useful Linux commands” and practice them immediately.

Let’s set up networking on the new Linux – I mean setting up a network interface with IPAddr/SubnetMask/DefaultGW/DNS etc.

You need to edit the network interface configuration file and add your network-specific details. Well – you probably don’t know how to edit a file on the Linux CLI yet? There are so many text file editors available for Linux/Unix – my favourite one is “vi”; let’s have a look at how to start with “vi” on Day 1:

| “vi” commands/options | What it does |
| $sudo vi newfilename | This command creates a new file if the file does not already exist. Example: $vi /tmp/myfirstfile.txt |
| $sudo vi /etc/hosts | This command opens the file “/etc/hosts” – this file already exists. |
| Press “i” | “i” means insert/edit mode; you can now start typing in the file. Make sure to open the file first using “vi” as mentioned above. |
| Press “esc” | This puts the editor back into command mode from any other mode such as insert/append/search. In this mode you can read, scroll and enter commands, but not type text. |
| Press “a” | “a” is append; move the cursor to a character > press a > then start writing after it |
| Press “r” | “r” replaces a single character; move the cursor to a character > press r > then press the new character |
| Type in “:10” | This takes you to line number 10 |
| Type in “/typesomething” | Search; for example “/hostname” searches for “hostname” in the file |
| Type in “:w” | write/save; make sure to press “esc” first, then “:w” |
| Type in “:wq” | write/save and quit/close the file |
| Type in “:wq!” | write/save and quit/close the file in forced mode |
| Type in “:q!” | force quit/close without saving |

Network configuration files are stored in the “/etc/netplan/” directory on the latest Ubuntu Linux; the file name is something like “50-cloud-init.yaml” – make sure it is the “.yaml” extension file.

To configure the new Linux NIC – you need to do the following –

Step1: get the NIC name and number (ensXX)

$sudo ifconfig

This will display the network interface card details – let’s say the name and number is “ens33”:


Step2: configure the NIC with IP address details by editing the YAML config file

Let’s say we configure “ens33” with a DHCP-assigned IP; the config is as follows –

$sudo vi /etc/netplan/50-cloud-init.yaml

Enter the following details:

network:
  version: 2
  renderer: networkd
  ethernets:
    ens33:
      dhcp4: true

If we want to add a “static IP address” – then do the following –

$sudo vi /etc/netplan/50-cloud-init.yaml

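Enter the following details, along with your own interface “addresses” and gateway entries under “ens33” –

network:
  version: 2
  renderer: networkd
  ethernets:
    ens33:
      nameservers:
        search: [test.local, otherdomain]
        addresses: [,]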

Now you need to apply the new configurations; enter the following commands –

$sudo netplan try ;this command should return “Configuration accepted.”
$sudo netplan apply ;this command will apply the new setting based on the file
$sudo ifconfig ;this command will show you the IP address details

Do a ping to a remote machine.

Part 2: Get Ansible installed on your new Linux system

As a part of learning Linux sysadmin tasks – you will now learn package management! This is basically how to install/uninstall/update new software packages on a Linux system. Different Linux distributions (Redhat/Debian/Ubuntu/CentOS/…) use different tools for package management. I will show you how to install Ansible on Ubuntu/Debian based system.

There are two parts of Ansible –

  • Ansible control machine; this is where you store your Ansible configuration files and from where you control the target systems
  • Ansible target machines or target nodes; Ansible supports a wide range of targets including Linux, Windows, Cisco, Juniper, Palo Alto, F5, AWS, GCP, Azure, etc.

You need to install Ansible “only” on the control machine; there is NO need to install Ansible on the target nodes. Ansible is agentless – so there is no Ansible client software to install on a target node.

Ansible installation details are available here on the official document site – https://docs.ansible.com/ansible/latest/installation_guide/intro_installation.html

Package management on Ubuntu is done via “apt” utility; we will install Ansible using “apt” on our new Linux machine; following are the commands.

$sudo apt-get update ;this will update the package list on the Linux
$sudo apt-get install software-properties-common ; this will install software name “software-properties-common”
$sudo apt-add-repository --yes --update ppa:ansible/ansible ;this will add new software repository
$sudo apt-get install ansible ;this installs the ansible packages along with its dependencies

Enter command “$sudo ansible --version” to verify your installation.


Part 3: Let’s play with Ansible – Final Part

In this part I am covering how to manage remote networking devices using Ansible.

There are two key files when playing with Ansible:

  • Inventory file ; this file contains list of remote devices
  • YAML playbook file/files; these files contain list of “tasks” to be applied onto the remote devices listed on the inventory file

YAML playbook “tasks” are carried out by Ansible “modules”; let’s see the following examples:

  • you want to send “show” commands to Cisco IOS devices, so you need the “ios_command” module installed in your Ansible
  • you want to send configuration commands to Cisco IOS devices to configure interfaces/routing, so you need the “ios_config” module installed in your Ansible

If you don’t have the “ios_config” module installed – then you will not be able to configure Cisco IOS devices using Ansible! The good news is that a default Ansible installation comes with hundreds of pre-installed modules including Cisco IOS, Cisco ASA, Juniper JUNOS, Palo Alto PAN-OS, AWS, GCP, Linux, Windows etc. Ansible also gives you the option to create your own module in case you can’t find a module for a new type of device/software/system.

The following commands list the installed modules available in Ansible:

$ansible-doc -l | grep ios ;this will list all Cisco IOS modules
$ansible-doc -l | grep nxos ;this will list all Cisco NXOS modules
$ansible-doc -l | grep panos ;this will list all Palo PANOS modules
$ansible-doc -l | grep junos ;this will list all JUNOS modules

The Ansible website has comprehensive details on every module; have a look at the official “ios_config” module page at https://docs.ansible.com/ansible/latest/modules/ios_config_module.html?highlight=ios_config

Apart from the above inventory and YAML playbook file types – the main Ansible system parameters configuration file is “/etc/ansible/ansible.cfg”.

Step1: Ansible inventory file

Let’s have a look at what we can enter into the “inventory” file; the file looks like the following –

#Note – My list of Network Devices – Cisco/Juniper/Palo

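csr-1000v-01   ansible_host=
csr-1000v-02   ansible_host=
sw-tst-01      ansible_host=
sw-tst-02      ansible_host=
asa-tst-01     ansible_host=

[csr-routers]
csr-1000v-01
csr-1000v-02
[switches]
sw-tst-01
sw-tst-02
[asa-fws]
asa-tst-01
[routernswitches:children]
csr-routers
switches
asa-fws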

Based on the above –

  • The first part is the name/identification of the remote device; “ansible_host” is the variable name, whose value is the IP address that connection requests to the remote device are sent to (this could be an FQDN or hostname instead of an IP address).
  • Secondly, we have created three groups – csr-routers/switches/asa-fws.
  • Lastly, we put all three groups into a new, bigger group, “routernswitches”.

By default, there is another group there called “all” – this includes ALL the devices in the inventory list; you don’t need to define “all” separately.
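
If the grouping feels abstract, the toy Python model below shows how a playbook “hosts:” value ends up as a list of devices. It is only a sketch of the idea, not how Ansible resolves patterns internally; the “resolve” helper is my own, and the group membership is assumed from the device names in the inventory.

```python
# Group membership below is assumed from the device names in the inventory.
groups = {
    "csr-routers": ["csr-1000v-01", "csr-1000v-02"],
    "switches": ["sw-tst-01", "sw-tst-02"],
    "asa-fws": ["asa-tst-01"],
}
# A parent group only lists child groups, like an INI [name:children] section.
children = {"routernswitches": ["csr-routers", "switches", "asa-fws"]}


def resolve(pattern):
    """Return the hosts a 'hosts:' value would target in this toy model."""
    if pattern == "all":
        return sorted({h for hosts in groups.values() for h in hosts})
    if pattern in children:
        return sorted({h for c in children[pattern] for h in resolve(c)})
    if pattern in groups:
        return list(groups[pattern])
    return [pattern]  # otherwise treat it as a single host name


print(resolve("csr-routers"))
print(len(resolve("routernswitches")))
print(resolve("all") == resolve("routernswitches"))
```

In this inventory “all” and “routernswitches” happen to resolve to the same five devices, simply because every device belongs to one of the three child groups.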

Cool – now we have got our inventory file ready!

Important Note: Ansible is agentless – then how is it going to talk to the remote systems? The answer is – for Linux/Cisco IOS/NXOS/JUNOS and similar, Ansible uses an SSH connection. For Windows-based targets Ansible uses Windows Remote Management (WinRM, the transport behind PowerShell remoting)!

Step2: Ansible YAML playbook file/files

Once the inventory file is ready – the next step is to create an Ansible playbook. The question is: what is a playbook, and what is its role? Well – a playbook contains a list of actions/tasks to be performed on the remote devices; in this guide I label each of these actions a “play” (strictly speaking, Ansible calls each action a task, and a play is the block that maps a group of hosts to its list of tasks). While creating a playbook you need to define the workflow of tasks to be automated in the “correct order”, then convert them into “plays” within an Ansible playbook file. Playbooks are written in YAML syntax.

As a network engineer you are already familiar with what steps it takes to setup two routers with OSPF; let’s break it down into multiple tasks (aka plays) –

  • Task 1: setup common parameters such as hostname/dns server/etc (Play 1)
  • Task 2: setup network interfaces with IP address on both the router (Play 2)
  • Task 3: setup ospf parameters on both the routers (Play 3)
  • Task 4: return/display “show run” on both (Play 4)
  • Task 5: return/display “show ip interface brief” (Play 5)

[Note: make sure to setup SSH and a network interface on both the routers so that Ansible controller can connect to both the above routers]

We can put together all the above plays (play 1 to 5) onto Ansible YAML playbook file and push to both the routers at the same time!

Before proceeding to YAML syntaxes – you “should” learn the following YAML items:

  • YAML “dictionary” items
  • YAML “list” items

Let’s create our first YAML script now!

Task A: Display “show run” and “show ip interface brief”

We want to display the output of the above show commands from “csr-1000v-01” and “csr-1000v-02”, which are listed in the inventory file we created earlier.

---
- name: Test playbook to return “show” command from Cisco CSR 1000v
  hosts: csr-routers
  connection: local
  vars:
    cli:
      username: cisco
      password: cisco
  tasks:
    - name: Print running config (play1)
      ios_command:
        provider: "{{ cli }}"
        commands:
          - show run
      register: show_run
    - debug:
        var: show_run

    - name: Print ip interfaces (play2)
      ios_command:
        provider: "{{ cli }}"
        commands:
          - show ip interface brief
      register: show_int
    - debug:
        var: show_int

Let’s do a debrief of the above YAML syntax:

| YAML item name | What it does |
| name: Test playbook to return… | this is just a name to identify the play in the YAML playbook file |
| hosts: csr-routers | this is the host group we configured in the inventory file; it can also be a single device, or “all” for all the devices in the inventory file |
| connection: local | this tells Ansible where to execute the playbook tasks; in this example they run on the local Ansible controller running Linux, and the module itself connects to the remote device |
| vars: cli: … | we define this variable with the username and password used to connect to the remote CSR routers; otherwise we would need to define them in every play |
| tasks: | the list of plays starts from here |
| ios_command: | this is a pre-defined Ansible module written in Python; it sends Cisco IOS show commands to the device and returns their output |
| provider: | this tells Ansible to use the pre-defined variable (cli with username/password) when connecting to the remote device |
| commands: | the actual Cisco IOS command(s) |
| register: | this tells Ansible to capture the command output in the variable named here |
| debug: var: | this tells Ansible to display the value stored in the register variable |

Let’s create the Linux files and execute the above playbook with the inventory list –

$sudo mkdir /opt/ansible-cisco-practice
$cd /opt/ansible-cisco-practice
$sudo vi my-first-playbook-cisco.yaml ;copy and paste the above YAML contents
$sudo vi inventory.txt ;copy and paste the inventory file contents here

$sudo ansible-playbook my-first-playbook-cisco.yaml -i inventory.txt

Based on the above, “ansible-playbook” is the command that executes a YAML playbook file; the option “-i” specifies the location of the inventory file.

You might end up seeing an Ansible error message showing “unable to connect” to remote devices using SSH due to an “SSH host fingerprint” issue – by default SSH validates the host fingerprint for security. There are two options to get this fixed:

  • manually send a connection request to the Cisco IOS device from the Ansible controller Linux via SSH – this will add the IOS device SSH fingerprint to the Linux known hosts on the very first connection request.
  • Or update the Ansible system parameter configuration file to tell Ansible “not to validate” the SSH host fingerprint (fine for a lab, not recommended for production); edit the “/etc/ansible/ansible.cfg” file, uncomment the following parameter and then re-run the “ansible-playbook” command with the inventory file:
$sudo vi /etc/ansible/ansible.cfg

#host_key_checking = False  ;remove the hash # from this line
host_key_checking = False

You should now see Ansible connecting to both the Cisco CSR routers and executing Play1 and Play2, with the “show” command outputs!

You are nearly there! Now you know how to setup Ansible and get commands executed on remote Cisco IOS devices! Congratulations!

Task B: Configure a Cisco interface and a few OSPF parameters

You want to send the following configurations to both the CSR routers:

On “csr-1000v-01”:

interface G2
  description “connected to Router XX”
  ip address
  no shut
router ospf 100
  passive-interface G1
  network area 0

On “csr-1000v-02”:

interface G2
  description “connected to Router XX”
  ip address
  no shut
router ospf 100
  passive-interface G1
  network area 0

On both “csr-1000v-01” and “csr-1000v-02”:

ip name-server
ip name-server
ip domain-name test.local

Let’s create the Ansible YAML playbook file with the above:

---
- name: Test playbook – IOS Configs for Cisco CSR 1000v
  hosts: csr-routers
  connection: local
  vars:
    cli:
      username: cisco
      password: cisco
  tasks:
    - name: Configure TopLevel IOS configs DNS/DomainName on Both (Play1)
      ios_config:
        provider: "{{ cli }}"
        lines:
          - ip name-server
          - ip name-server
          - ip domain-name test.local

    - name: Configure GE2 interface on csr-1000v-01 only (Play2)
      when: ansible_host == ""
      ios_config:
        provider: "{{ cli }}"
        lines:
          - description "Connected to Router XX"
          - ip address
          - no shutdown
        parents: interface G2

    - name: Configure OSPF 100 on csr-1000v-01 only (Play3)
      when: ansible_host == ""
      ios_config:
        provider: "{{ cli }}"
        lines:
          - network area 0
          - passive-interface G1
        parents: router ospf 100

    - name: Configure GE2 interface on csr-1000v-02 only (Play4)
      when: ansible_host == ""
      ios_config:
        provider: "{{ cli }}"
        lines:
          - description "Connected to Router XX"
          - ip address
          - no shutdown
        parents: interface G2

    - name: Configure OSPF 100 on csr-1000v-02 only (Play5)
      when: ansible_host == ""
      ios_config:
        provider: "{{ cli }}"
        lines:
          - network area 0
          - passive-interface G1
        parents: router ospf 100

    - name: Save configs on both CSRs (Play6)
      ios_config:
        provider: "{{ cli }}"
        lines:
          - do write

    - name: Lastly display “show run” on both CSRs (Play7)
      ios_command:
        provider: "{{ cli }}"
        commands:
          - show run
      register: show_run
    - debug:
        var: show_run

Let’s demystify the above YAML script –

| Play Name | Ansible Module Used | Target CSR Router |
| Play1: TopLevel IOS configs DNS/DomainName | ios_config | Both CSR routers |
| Play2: Configure GE2 interface with IP | ios_config | csr-1000v-01 only |
| Play3: OSPF 100 Configuration | ios_config | csr-1000v-01 only |
| Play4: Configure GE2 interface with IP | ios_config | csr-1000v-02 only |
| Play5: OSPF 100 Configuration | ios_config | csr-1000v-02 only |
| Play6: Save configs | ios_config | Both CSR routers |
| Play7: display “show run” | ios_command | Both CSR routers |

We have used the “when” condition here to push configurations to specific routers only.

Let’s say – we save the above YAML syntaxes on a file called “my-second-playbook-cisco.yaml”; we can now execute this playbook –

$cd /opt/ansible-cisco-practice
$sudo vi my-second-playbook-cisco.yaml ;copy and paste the above YAML contents

$sudo ansible-playbook my-second-playbook-cisco.yaml -i inventory.txt

You should see the playbook execution results and “show run”. Screenshot following:


See the “changed” and “skipping” in the output above; this is because we used condition “when”.

Task C: Let’s take a backup of IOS device configs from “show run”

This is very simple; most of the YAML syntax for this backup job is based on the first YAML script, “my-first-playbook-cisco.yaml”. In this example – we will tell Ansible to save the registered “show run” output to a TXT file for each of the CSR devices.

---
- name: Playbook to backup configs of Cisco CSRs
  hosts: csr-routers
  connection: local
  vars:
    cli:
      username: cisco
      password: cisco
  tasks:
    # register names use underscores; hyphens are not valid in variable names
    - name: Print running config – csr-1000v-01 (play1)
      when: ansible_host == ""
      ios_command:
        provider: "{{ cli }}"
        commands:
          - show run
      register: show_run_csr_1000v_01

    - name: save output to /opt/ios-backups - csr-1000v-01 (play2)
      when: ansible_host == ""
      copy:
        content: "{{ show_run_csr_1000v_01.stdout[0] }}"
        dest: "/opt/ios-backups/backup_{{ ansible_host }}.txt"

    - name: Print running config – csr-1000v-02 (play3)
      when: ansible_host == ""
      ios_command:
        provider: "{{ cli }}"
        commands:
          - show run
      register: show_run_csr_1000v_02

    - name: save output to /opt/ios-backups - csr-1000v-02 (play4)
      when: ansible_host == ""
      copy:
        content: "{{ show_run_csr_1000v_02.stdout[0] }}"
        dest: "/opt/ios-backups/backup_{{ ansible_host }}.txt"

Now execute the above playbook; our YAML file name for the Cisco IOS backup is “cisco-ios-backup-playbook.yaml” –

$sudo mkdir /opt/ios-backups
$cd /opt/ansible-cisco-practice
$sudo vi cisco-ios-backup-playbook.yaml       ;copy and paste the above YAML contents
$sudo ansible-playbook cisco-ios-backup-playbook.yaml -i inventory.txt
$cd /opt/ios-backups
$ls -la

You should be able to see two backup files in the directory “/opt/ios-backups”; run the “cat” command to see the contents of these files. Your backups are done!

What is NEXT?

Once you are familiar with Linux and Ansible YAML – your next steps on NetOps should be the following:

  • Learn GIT and GitHub. GIT is a source code version control system for your YAML scripts. I will write a separate post on this later.
  • Learn Ansible Tower or something similar such as Foreman or Ansible Semaphore UI; this will give you centralised Ansible orchestration with much better visibility, control, reporting and more.
  • And obviously learn Ansible advanced details (keep exploring different Ansible modules on official Ansible web site).


Application Whitelisting on Windows and App Execution Analytics (using AppLocker, AppIDSvc and Splunk)

If you are familiar with security compliance requirements such as PCI DSS or HIPAA – one of the requirements is “application whitelisting”. Application whitelisting is a solution that allows execution of pre-approved apps and scripts only and blocks everything else.

Application whitelisting can be done using many tools – in this example I will discuss how to get application whitelisting done using built-in Windows tools; I will use the Windows AppLocker utility to implement application whitelisting. I will also discuss setting up Splunk for AppLocker, so that we get real-time visibility/analytics of application whitelisting, and alerting.

This HOWTO has two parts –

Part 1 – this discusses the technical steps to set up application whitelisting on the Windows platform and push the settings to a bunch of Windows computers.

Part 2 – this discusses the technical steps to get visibility, analytics and alerts about the application whitelisting using Splunk (e.g. application whitelisting logs showing which apps are allowed, which are denied, who executed the app, when, from where etc).

Part 1 – Setting up the Application Whitelisting on Windows

Following are the steps for Part 1.

Step1: Start the “Application Identity” (AppIDSvc) service & set it to start automatically

The AppIDSvc service is a Microsoft service used by AppLocker to determine and verify the identity of an application. Without AppIDSvc, AppLocker is unable to identify and verify applications, scripts, installers and executables.


Step2: Set up Application Whitelisting using “Local Group Policy Editor” or “Group Policy Management Console”

AppLocker settings are available within “Computer Configuration -> Windows Settings -> Security Settings -> Application Control Policies -> AppLocker”. In an ideal environment all the AppLocker settings should be combined into a single Group Policy Object (GPO) and pushed onto computers via Active Directory.

Set the policy “Enforcement rules” first

Right click on AppLocker -> go to Properties -> select “Enforcement rules” for both Executables and Scripts. The enforcement rule enforces the “allow” and “deny” operations.

“Executable rules” are applied to application programs installed on the Windows OS.
“Script rules” are applied to all scripts available on the Windows OS.

“Audit only” – this setting does not prevent execution; it only generates audit logs about which items are executed on the Windows OS and who executed them.


Set the Executable Rules

Set the allow or deny action for executable applications here; a few options are available –

Executable Rules based on “Publisher” – allow all software signed by an authorised publisher.
Executable Rules based on “Path” – allow a specific file or folder. I prefer this.
Executable Rules based on “File hash” – this is for applications which are not signed.

Example screenshot of “Executable Rules” – in this example users (everyone) are allowed ONLY to execute “7-Zip” and “Notepad++” which are installed within “C:\Program Files\” or “C:\Program Files (x86)\”; whereas “Administrators” can execute everything; there is a “Deny” by default for the rest.

Interestingly, the same variable “%PROGRAMFILES%” covers both “C:\Program Files\” and “C:\Program Files (x86)\”.


The following screenshot example shows the default “Executable rules”, which permit everything, along with a rule to deny “Google Chrome” for everyone including Administrators; a deny overrides the other options.
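
The two behaviours described above (one path variable covering both Program Files folders, and a deny overriding any allow) can be sketched in a few lines of Python. This is only an illustration of the decision logic, not how AppLocker is implemented; the rule list, the Chrome install path and the function names are assumptions made for the example.

```python
from fnmatch import fnmatchcase

# Assumed expansion table for the example: per the post, %PROGRAMFILES%
# covers both Program Files directories.
PATH_VARIABLES = {
    "%PROGRAMFILES%": ["C:\\Program Files", "C:\\Program Files (x86)"],
    "%OSDRIVE%": ["C:"],
}

# Hypothetical rule set combining the two screenshots: (action, group, path).
EXAMPLE_RULES = [
    ("allow", "Everyone", "%PROGRAMFILES%\\7-Zip\\*"),
    ("allow", "Everyone", "%PROGRAMFILES%\\Notepad++\\*"),
    ("allow", "Administrators", "*"),
    ("deny", "Everyone", "%PROGRAMFILES%\\Google\\Chrome\\*"),
]


def expand(rule_path):
    """Return every concrete path pattern a rule path can stand for."""
    for variable, roots in PATH_VARIABLES.items():
        if rule_path.upper().startswith(variable):
            return [root + rule_path[len(variable):] for root in roots]
    return [rule_path]


def matches(rule_path, exe_path):
    """Case-insensitive wildcard match, since Windows paths ignore case."""
    return any(fnmatchcase(exe_path.lower(), pattern.lower())
               for pattern in expand(rule_path))


def decide(rules, user_groups, exe_path):
    """Deny wins over allow; anything matched by no rule is blocked."""
    hits = [action for action, group, path in rules
            if group in user_groups and matches(path, exe_path)]
    if "deny" in hits:
        return "deny"
    return "allow" if "allow" in hits else "deny"


if __name__ == "__main__":
    admin = {"Everyone", "Administrators"}
    print(decide(EXAMPLE_RULES, {"Everyone"},
                 "C:\\Program Files (x86)\\Notepad++\\notepad++.exe"))  # allow
    print(decide(EXAMPLE_RULES, admin,
                 "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe"))  # deny
```

Even the administrator is blocked from Chrome here, because the deny hit is checked before any allow hit.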


Set the Script Rules

The script rule options are the same as for executable rules – Publisher, Path and File Hash along with Allow or Deny. Also, you can create default rules which allow everything.

The following “Script Rules” screenshot shows that the same BAT file, “TestBATScript.bat”, is allowed for users under %OSDRIVE% (here “C:\Scripts”) and denied for everyone under “E:\Scripts\”.


If the above settings are pushed via GPO – they take some time to be applied to the destination computers. The update can be forced (for example with “gpupdate /force”), or the destination computer can be rebooted, to get the settings applied immediately.

Also, if we remove AppLocker settings on a computer – this takes a few minutes (2-5min) to take effect as well; don’t expect the result immediately.

Step3: Verification

As we have configured “Deny” on the “Google Chrome” for all users – it will pop-up with the following error message when someone tries to open it up –


Also, we have configured “TestBATScript.bat” to be allowed to execute from “C:\Scripts” and denied from “E:\Scripts\”; the following screenshot says it all –


Part 2 – Visibility and Analytics of Application Whitelisting using Splunk

Complete real time visibility and analytics of application executables and scripts across all the servers (100+ servers) is important to support the platform. Following are the interesting items for application whitelisting analytics –

  • Who is executing what application
  • On what servers/systems
  • What applications are allowed
  • What applications are denied
  • When/what time an application was executed
  • Knowing the system applications
  • Knowing user defined applications
  • Sending an alert email when an application/script execution is blocked

The above items are available within the AppLocker Windows Event Log files; these logs are located at Event Viewer -> Application and Services Logs -> Microsoft -> Windows -> AppLocker (EXE and DLL; MSI and Script). Example screenshots are following –
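
As a rough companion to those two logs, the sketch below maps the AppLocker event IDs to an outcome and counts outcomes per user. The event IDs (8002-8007) are the ones Microsoft documents for the “EXE and DLL” and “MSI and Script” logs; the dictionary keys `event_id` and `user` are hypothetical field names chosen for this example, not the fields Windows or Splunk will hand you.

```python
from collections import Counter

# AppLocker event IDs for the two logs used in this post.
APPLOCKER_EVENTS = {
    8002: ("EXE and DLL", "allowed"),
    8003: ("EXE and DLL", "audited"),     # audit only: would have been blocked
    8004: ("EXE and DLL", "blocked"),
    8005: ("MSI and Script", "allowed"),
    8006: ("MSI and Script", "audited"),  # audit only: would have been blocked
    8007: ("MSI and Script", "blocked"),
}


def summarize(events):
    """Count (user, outcome) pairs for a list of event dictionaries."""
    counts = Counter()
    for event in events:
        _log_name, outcome = APPLOCKER_EVENTS.get(
            event["event_id"], ("unknown", "other"))
        counts[(event["user"], outcome)] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample events, only to show the shape of the result.
    sample = [
        {"event_id": 8002, "user": "alice"},
        {"event_id": 8004, "user": "alice"},
        {"event_id": 8007, "user": "bob"},
    ]
    for (user, outcome), total in sorted(summarize(sample).items()):
        print(user, outcome, total)
```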



To get real time analytics in Splunk of what’s happening within AppLocker – we need to redirect the AppLocker logs to Splunk using the “Splunk Universal Forwarder”.

Setup Splunk Universal Forwarder (SUF)

SUF is a free download from www.splunk.com; download and install it on the target Windows computer. Ideally, it should be part of the base Windows OS build template – so that we don’t need to install it manually every time.

During the SUF installation – we select “Security Log” only. The security log is not required for AppLocker; however, collecting it fulfils many compliance requirements. Select other log types based on business requirements; the event log selection can also be changed later on, after the installation.

Make sure your Splunk server is up and running.

[select Windows Event Logs]


Enter the Splunk server IP address and receiving port number to redirect logs to.

[Enter the destination Splunk receiving server and port number]


After the installation – add the following lines to the SUF local config file “C:\Program Files\SplunkUniversalForwarder\etc\system\local\inputs.conf” –

[WinEventLog://Microsoft-Windows-AppLocker/EXE and DLL]
disabled = 0
evt_resolve_ad_obj = 1

[WinEventLog://Microsoft-Windows-AppLocker/MSI and Script]
disabled = 0
evt_resolve_ad_obj = 1

The above lines redirect the AppLocker “EXE and DLL” and “MSI and Script” logs to Splunk; “evt_resolve_ad_obj = 1” lets Splunk resolve and show Active Directory user names.

Restart the SUF service.

At this stage AppLocker logs will start flowing into Splunk; based on the index settings, Splunk will automatically add these log entries to the respective index or to the default index.

Following are a few examples of AppLocker analytics dashboards within Splunk –

Screenshot of who/what action/what application/when/from is following; Splunk search string for this:

source="WinEventLog:Microsoft-Windows-AppLocker/*"| table host, User, Type, Message, _time


Screenshot of the total number of applications is following; Splunk search string for this –

source="WinEventLog:Microsoft-Windows-AppLocker/*"| chart count by Message


Screenshot of total number of denied applications is following; Splunk search string for this –

source="WinEventLog:Microsoft-Windows-AppLocker/*" Type=Error| chart count by Message


Screenshot of the email alert when there is a deny is following; Splunk search string for this –

source="WinEventLog:Microsoft-Windows-AppLocker/*" Type=Error | table host, User, Message, _time


[Screenshot of Splunk alert email triggered on deny condition]


One of the key reasons for alert emails – in case any “required” app was missed out from whitelisting, you will get the details of the app even before the end user/team asks you to whitelist it.

That’s ALL!


AWS VPC Networking – discussing all type of VPC network “GATEWAYS” (part 1)

I was discussing AWS VPC networking with my team, and how network traffic comes in/out of a VPC from different destinations. Then later I thought – let’s put it on my blog – this will help others as well. I am discussing VPC gateways from a typical network engineer’s point of view.

There are many different types of gateways (network routers) in AWS VPC networking. Each of them has a different role – you put different gateways together to make a complete solution. Gateways are key components of a routing table – here I will show all the gateway items available on a “VPC routing table”.

Following diagram shows all the different types of gateways/routers on AWS VPC platform (follow the traffic path arrow head):


Let’s discuss the key attributes (what are they? what can they do?) of the VPC gateways:

i. Virtual Private Gateway (VGW-nn)
This is a multi-purpose network gateway appliance that provides in/out routing to a VPC. Key attributes of VGW:

  • the destination networks can be reached via AWS DirectConnect to a self-managed data centre or over IPSec VPN (via AWS VPN connections)
  • for IPSec VPN – an AWS “VPN connection” object needs to be attached to the VGW
  • for IPSec VPN – the supported routing protocols are BGP and Static
  • for AWS DirectConnect connection – VLAN tagged virtual interfaces (VIFs) need to be created for IP routing and attached to the VGW
  • for AWS DirectConnect connection – BGP is the only supported routing protocol
  • when more than one interface is available, ECMP is configured by default for both IPSec VPN and DirectConnect when sending traffic from AWS to a remote destination
  • BGP path selection can be manipulated by “AS path prepending” on the routes sent from the source to AWS
  • “VGW” instances are available within the VPC routing table to be set as a target
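
To make the ECMP and “AS path prepending” points concrete, here is a deliberately simplified Python sketch: paths with the same AS path length are used together, and prepending on one link pushes traffic onto the other. Real BGP best-path selection has more steps than AS path length, and the link names and the private AS number 65000 are assumptions for the example.

```python
def prepend(as_path, times):
    """Return the AS path with its first AS repeated `times` extra times."""
    return [as_path[0]] * times + list(as_path)


def best_paths(advertisements):
    """Pick the link(s) with the shortest AS path; ties mean ECMP.

    advertisements: dict of link name -> AS path (list of AS numbers).
    """
    shortest = min(len(path) for path in advertisements.values())
    return sorted(link for link, path in advertisements.items()
                  if len(path) == shortest)


if __name__ == "__main__":
    # Same prefix advertised to AWS over two links with equal AS paths.
    equal = {"link-1": [65000], "link-2": [65000]}
    print(best_paths(equal))      # ['link-1', 'link-2']

    # Prepend twice on link-2 so AWS prefers link-1 for return traffic.
    steered = {"link-1": [65000], "link-2": prepend([65000], 2)}
    print(best_paths(steered))    # ['link-1']
```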

ii. Customer Gateway (CGW-nn)
CGWs are part of IPSec VPN connectivity to a VPC. Key attributes are following:

  • a CGW represents the remote end VPN gateway
  • an AWS “VPN Connection” requires a CGW to be attached to it
  • without a CGW, an AWS “VPN Connection” doesn’t know where to send traffic to

iii. Internet Gateway (IGW-nn)
Key attributes of IGW are following:

  • provides internet in/out (both ways) to a VPC and its contents
  • provides inbound Internet to Elastic Load Balancer
  • provides internet access to L4-L7 network appliances (F5 BIG-IP, Cisco ASAv, Juniper SRX etc)
  • provides internet access to VPC NAT GW
  • outbound traffic from a VPC can be sent out either via IGW or via VPC NATGW (I will discuss this in the next part, part 2 – VPC routing tables and subnets)
  • reachability of AWS Elastic IP addresses attached to a VPC object is provided via IGW
  • “IGW” instances are available within the VPC routing table to be set as a target

iv. VPC NAT Gateway (NAT-nn)
Key attributes of VPC NATGW are following:

  • provides NAT outbound only (one direction) to VPC and its contents
  • NAT Internet access is done via an IGW
  • NAT can not access Internet directly (without having an IGW)
  • “NAT” instances are available within VPC routing table to be set as target

There are a lot of security requirement scenarios where you allow internet access for systems/servers only via NATGW; no inbound connections are permitted and the local systems are kept fully local.
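
A minimal Python sketch of how such a “NAT only” private subnet routing table picks its target: the most specific (longest prefix) matching route wins, so VPC-internal traffic stays local and everything else goes to the NAT gateway. The CIDR ranges are made up, and the targets reuse the placeholder names from this post (“nat-nn”, “vgw-nn”), not real resource IDs.

```python
import ipaddress

# Hypothetical routing table of a private subnet: destination CIDR -> target.
PRIVATE_ROUTE_TABLE = {
    "10.0.0.0/16": "local",       # the VPC's own address range
    "192.168.0.0/16": "vgw-nn",   # on-premises networks behind the VGW
    "0.0.0.0/0": "nat-nn",        # everything else leaves via the NAT gateway
}


def lookup(route_table, destination):
    """Return the target of the longest-prefix route matching destination."""
    address = ipaddress.ip_address(destination)
    matching = []
    for cidr, target in route_table.items():
        network = ipaddress.ip_network(cidr)
        if address in network:
            matching.append((network.prefixlen, target))
    if not matching:
        return None
    return max(matching)[1]


if __name__ == "__main__":
    for destination in ("10.0.1.5", "192.168.10.1", "8.8.8.8"):
        print(destination, "->", lookup(PRIVATE_ROUTE_TABLE, destination))
```

A public subnet would look the same except that its default route points at the IGW instead of the NAT gateway.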

v. Layer4-Layer7 network appliances as Gateway
These are basically EC2 instances with 2 or more NICs providing network connectivity.
Key attributes are following:

  • cloud network admins have the flexibility to deploy their own network appliances (F5, Cisco, Juniper, Sophos, Barracuda etc)
  • even an EC2 instance of any OS (Linux/Windows) with 2 x NICs can be converted to a routing device/NAT appliance (you need to disable Source/Destination Check under EC2 Networking)
  • this type of device relies on the IGW to route traffic to the internet (just like the NAT gateways)
  • this type of network appliance can handle both in/out traffic (via NAT translation or Proxy) for a VPC and its contents
  • this type of network appliance (EC2 instance) is available within the VPC routing table to be set as a target

vi. VPC Peering (PCX-nn)
A special type of gateway for inter-VPC communication. VPC peering is used when creating an interconnect between VPCs. Following are the attributes of a VPC peering network:

  • provides peer-to-peer connectivity between two VPCs only
  • in a scenario where “VPC A” peers to > “VPC B” and “VPC B” peers to > “VPC C” – “VPC A” cannot talk to “VPC C”
  • does not provide a transit path
  • in the above scenario “VPC B” cannot be used as a transit route for VPC A to > VPC C
  • “pcx” objects are available within the VPC routing table to be set as a target
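
The non-transitive behaviour in the A/B/C scenario above can be modelled in a few lines of Python. This is only a sketch of the rule (two VPCs can talk only if they are directly peered); the function names are made up for the example.

```python
from itertools import combinations

# The peerings from the scenario above: A-B and B-C, but no A-C.
PEERINGS = [("VPC A", "VPC B"), ("VPC B", "VPC C")]


def can_talk(peerings, source, destination):
    """True only if the two VPCs share a direct peering (no transit)."""
    direct = {frozenset(pair) for pair in peerings}
    return frozenset((source, destination)) in direct


def missing_peerings(vpcs, peerings):
    """List the extra peerings needed so every VPC can reach every other."""
    direct = {frozenset(pair) for pair in peerings}
    return [pair for pair in combinations(sorted(vpcs), 2)
            if frozenset(pair) not in direct]


if __name__ == "__main__":
    print(can_talk(PEERINGS, "VPC A", "VPC B"))   # True
    print(can_talk(PEERINGS, "VPC A", "VPC C"))   # False
    print(missing_peerings(["VPC A", "VPC B", "VPC C"], PEERINGS))
```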

In the next part I will be discussing VPC “subnets” and “routing tables”, which can cater for complex segregated routing requirements on the AWS platform.

Cisco Nexus vPC and non-vPC VLANs together on the same platform (Nexus hybrid setup) – NXOS v7.0(3)I4(1)

Cisco discontinued “spanning-tree pseudo-information” starting from NXOS version 7.0(3)I4(1) on the 9000 platforms. So what is the solution for Nexus vPC and non-vPC VLANs on the same platform (hybrid)? Is it no longer going to be supported on NXOS/9000 platforms?

Although hybrid is not the recommended vPC design for the “aggregation layer”, you will find a lot of scenarios where you need both vPC and non-vPC within the same platform (mostly in mid-size data centres, where for many reasons you can’t deploy traditional ethernet switches and have therefore chosen the Cisco Nexus platforms).

If you carry everything (both vPC VLANs and non-vPC VLANs) over the vPC peer-link (yes, the vPC peer-link carries orphaned VLANs as well!), any issue on the peer-link that stops vPC from working also hits the downstream switches connected to the Nexus pair via non-vPC links/non-vPC VLANs: they stop forwarding frames because STP (any STP – STP/RSTP/PVSTP/MSTP) blocks, as they have no information about what happened between the vPC peer Nexus switches in a peer-link failure scenario.

Experts suggest that you should have an additional Layer 2 trunk port-channel alongside your vPC peer-link; this Layer 2 port-channel will carry the non-vPC VLANs in case vPC stops working (whatever the reason; it could be a Nexus reboot during maintenance!).

Well, now you have set up a separate Layer 2 trunk port-channel for non-vPC VLANs and shut down the vPC peer-link to test it – but you find it is still not working as expected! STP is blocking the Layer 2 trunk link; what’s the problem? Cisco used to have a solution for this called “spanning-tree pseudo-information”; as I mentioned in the beginning, Cisco discontinued this starting from version 7.0(3)I4(1) on the Nexus 9000 platform. So what is your option? Should you stop using Nexus if you don’t follow a spine and leaf design?

Yes – there is an answer to this; there is a small trick to make this work! By discontinuing pseudo-information, Cisco basically made it even easier to configure; you need to do the following –

i. First, you need to set the STP priority for the VLANs to a lower number on one of the Nexus switches (the default priority is 32768 and lower is preferred; this should be done on the vPC primary role switch) and leave STP untouched on the other Nexus switch (the vPC secondary role switch).

ii. Second, on the non-vPC trunk port-channel set “spanning-tree port type normal” on “both” the switches; (“spanning-tree port type network” is recommended for vPC peer-link).

Here is an example config –

On the vPC primary role switch, apply the following -

(config)#spanning-tree vlan 10,20,30,40-50 priority 8192

On both the switches, apply the following -

(config)#interface port-channel10
(config-if)#description "non-vPC trunk port" 
(config-if)#switchport mode trunk 
(config-if)#switchport trunk allowed vlan 10,20,30,40-50 
(config-if)#spanning-tree port type normal

Without having the above configuration applied you will find STP is blocking the non-vPC Layer 2 trunk link even if the vPC peer-link is shutdown. Also in this example – the vPC primary role switch will be the STP “root bridge” because of lower priority configured (8192).
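
Why the primary switch wins the root election can be shown with a small Python sketch: the bridge with the lowest (priority, MAC address) pair becomes root. The switch names and MAC addresses are invented for the example, and the sketch ignores that per-VLAN STP adds the VLAN ID to the configured priority, since that offset is the same on both switches.

```python
def bridge_id(priority, mac):
    """Build a comparable bridge ID: priority first, then MAC as a number."""
    return (priority, int(mac.replace(":", ""), 16))


def root_bridge(bridges):
    """Return the name of the bridge with the lowest bridge ID.

    bridges: dict of switch name -> (priority, MAC address string).
    """
    return min(bridges, key=lambda name: bridge_id(*bridges[name]))


if __name__ == "__main__":
    # Hypothetical pair: priority 8192 set on the vPC primary only.
    configured = {
        "nexus-primary": (8192, "00:de:fb:00:00:02"),
        "nexus-secondary": (32768, "00:de:fb:00:00:01"),
    }
    print(root_bridge(configured))   # nexus-primary

    # With both left at the default, the lower MAC decides instead.
    untouched = {
        "nexus-primary": (32768, "00:de:fb:00:00:02"),
        "nexus-secondary": (32768, "00:de:fb:00:00:01"),
    }
    print(root_bridge(untouched))    # nexus-secondary
```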

To test your configuration – shut down the vPC peer-link and run “show spanning-tree vlan xxx” – you should see STP put the L2 trunk interfaces into forwarding state immediately.

Here is a diagram of vPC and non-vPC VLANs on the same platform –



Cisco UCS Platform Emulator – UCSPE 3.1(2ePE1)

Cisco UCS Platforms are expensive kit to play with. A UCS Platform is not just a standalone router, switch or firewall that could easily be emulated on a PC – it is a bunch of kit interconnected together to deliver the platform. UCS is not a server – it’s a system or a platform compared to traditional computing; UCS brings unified fabric (LAN/SAN/FC/FCoE/HBA/others) and stateless computing together.

Cisco has come up with a solution to help engineers to get their hands dirty on UCS Platforms – the solution is called Cisco UCS Platform Emulator (UCSPE). The latest UCSPE is version 3.1(2ePE1). Cisco made UCSPE available to everyone – you only need to have a Cisco login.

The whole UCSPE comes with the following-

i. 2 x Fabric Interconnect (Model: UCS-FI-6332-16UP)
ii. 2 x FEX (Model: Nexus 2348UPQ N2K-C2348UPQ)
iii. 3 x UCS 5108 Chassis with 12 x different model blades
iv. 1 x UCSC C3X60 server (Model: UCSC-C3X60)
v. 7 x UCS C series servers (220/240/460)

The above are enough to emulate a complete decent size UCS Platform!

Yes! You can create a bunch of full-fledged “Service Profiles” with different configuration settings and apply them to the servers; you can also configure the fabric interconnect ports with different options (LAN uplink/Server uplink/FC/FCoE/NAS/Port Channels etc).

UCSPE comes in “OVA” and “VMware VMX/VMDK” format (in a ZIP file); you can run it on VMware Workstation or Fusion (I use Fusion).

The pre-defined OVA/VMX requires only one (01) vCPU, 1024MB memory and 3 x vNICs.

You need 3 x IP addresses (from DHCP or static) to make it accessible – one IP address is for Fabric Interconnect “A”, the second one is for Fabric Interconnect “B” and the third IP address is the VIP of the FI-A and FI-B cluster.

That’s all! It’s a great tool for candidates working towards the CCNA/CCNP/CCIE Data Centre certifications as well.

UCSPE download link is the following:

Following are few screenshots:


[UCSPE VM console]


[UCSPE devices list. They covered a lot devices in here! The red arrow is showing the icon of UCSM – you need to click on this to launch UCSM]


[UCSM – Login Screen]


[UCSM- The Topology]


[UCSM- FI “A”]


[UCSM – UCS 5108 Chassis]


Cisco UCS 5108 Chassis Power Policy Options and Redundancy Demystified

The Cisco UCS 5108 server chassis comes with three (03) different Power Policy options; UCS power management is efficient and energy saving by default, but NONE of the policy option explanations says which PSUs will be fully ON and which PSUs will be put into Power Saving Mode. This is what the official Cisco document says about these options:

  • Non Redundant – All installed power supplies are turned on and the load is evenly balanced. Only smaller configurations (requiring less than 2500W) can be powered by a single power supply.
  • N+1 – The total number of power supplies to satisfy non-redundancy, plus one additional power supply for redundancy, are turned on and equally share the power load for the chassis. If any additional power supplies are installed, Cisco UCS Manager sets them to a “turned-off” state.
  • Grid – Two power sources are turned on, or the chassis requires greater than N+1 redundancy. If one source fails (which causes a loss of power to one or two power supplies), the surviving power supplies on the other power circuit continue to provide power to the chassis.

Probably the above definitions apply to older UCS versions and not the latest. Also, I couldn’t find details on what exactly happens to power redundancy when you have 2 x PSUs or 4 x PSUs installed and your power load is not very high because not all the blades are installed and functional.


Following is what I captured regarding which PSUs stay ON and which PSUs are put into Power-Saving-Mode when you set the different “Power Policy” options – this was done on a four (04) PSU server chassis. A change in Power Policy takes effect immediately (there might be a 10-20 second delay for the Web GUI to refresh) and it doesn’t require any system reboot.

Assumption is all the four (04) PSUs are installed and connected to power socket; also the chassis is running 2 blades.

When Power Policy is set to N+1, this is what happens:

PSU3 – OFF (Power Saving Mode)
PSU4 – OFF (Power Saving Mode)

When Power Policy is set to GRID, this is what happens:

PSU2 – OFF (Power Saving Mode)
PSU4 – OFF (Power Saving Mode)

When Power Policy is set to Non-Redundant, this is what happens (only one is ON! not ALL):

PSU2 – OFF (Power Saving Mode)
PSU3 – OFF (Power Saving Mode)
PSU4 – OFF (Power Saving Mode)

Data centre racks are equipped with two power rails – A (left hand side) & B (right hand side) – for redundancy. Now the interesting thing is – your physical power connections must be in line with the UCS Power Policy option, otherwise your blades will be rebooted in case of any power issue on “power rail A”.

You can have only two different combinations of power connections to connect all the four (04) PSUs to the power rails A & B. The combinations are following –

Option 1:
PSU1/PSU2 (first two) connected to > power rail A
PSU3/PSU4 (last two) connected to > power rail B

Option 2:
PSU1/PSU3 (odd numbers) connected to > power rail A
PSU2/PSU4 (even numbers) connected to > power rail B

I had N+1 configured with “Option 1” and power issue due to maintenance on rail A rebooted my blades!

This is what happens during a power failure on rail A – either “you are saved!” or “you are NOT saved!” –

GRID Mode with (Option 1) PSU1/PSU2 to A & PSU3/PSU4 to B; you are saved! (PSU1 on A and PSU3 on B stay ON)
PSU2 – A OFF (Power Saving Mode)
PSU4 – B OFF (Power Saving Mode)

GRID Mode with (Option 2) PSU1/PSU3 to A & PSU2/PSU4 to B; you are NOT saved! (PSU1 and PSU3 stay ON, both on A)
PSU2 – B OFF (Power Saving Mode)
PSU4 – B OFF (Power Saving Mode)

Screenshot of GRID mode following –


N+1 Mode (Option 1) PSU1/PSU2 to A & PSU3/PSU4 to B; you are NOT saved! (PSU1 and PSU2 stay ON, both on A)
PSU3 – B OFF (Power Saving Mode)
PSU4 – B OFF (Power Saving Mode)

N+1 Mode (Option 2) PSU1/PSU3 to A & PSU2/PSU4 to B; you are saved! (PSU1 on A and PSU2 on B stay ON)
PSU3 – A OFF (Power Saving Mode)
PSU4 – B OFF (Power Saving Mode)

Screenshot of N+1 mode following –


Non-Redundant Mode (Option 1) PSU1/PSU2 to A & PSU3/PSU4 to B; you are NOT saved! (only PSU1 stays ON, on A)
PSU2 – A OFF (Power Saving Mode)
PSU3 – B OFF (Power Saving Mode)
PSU4 – B OFF (Power Saving Mode)

Non-Redundant Mode (Option 2) PSU1/PSU3 to A & PSU2/PSU4 to B; you are NOT saved! (only PSU1 stays ON, on A)
PSU2 – B OFF (Power Saving Mode)
PSU3 – A OFF (Power Saving Mode)
PSU4 – B OFF (Power Saving Mode)

Screenshot of Non Redundant mode following –
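
The whole matrix above can be reproduced with a short Python sketch. It only encodes what I observed on this chassis (which PSUs stay ON per policy with four PSUs and 2 blades) and assumes, as in my rail A outage, that PSUs in Power Saving Mode do not take over in time; the dictionary and function names are made up for the example.

```python
# PSUs that stay ON per Power Policy, as observed on the four-PSU chassis.
ACTIVE_PSUS = {
    "n+1": {1, 2},
    "grid": {1, 3},
    "non-redundant": {1},
}

# The two possible ways of cabling the four PSUs to power rails A and B.
CABLING = {
    "option 1": {1: "A", 2: "A", 3: "B", 4: "B"},
    "option 2": {1: "A", 3: "A", 2: "B", 4: "B"},
}


def survives(policy, cabling, failed_rail="A"):
    """True if at least one active PSU is fed from the surviving rail.

    Assumption: PSUs in Power Saving Mode do not come up fast enough.
    """
    rails = CABLING[cabling]
    return any(rails[psu] != failed_rail for psu in ACTIVE_PSUS[policy])


if __name__ == "__main__":
    for policy in ACTIVE_PSUS:
        for cabling in CABLING:
            verdict = "saved" if survives(policy, cabling) else "NOT saved"
            print(f"{policy:14} {cabling}: {verdict}")
```

Only two combinations survive a rail A failure: GRID with Option 1 and N+1 with Option 2.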