Cisco UCS 5108 Chassis Power Policy Options and Redundancy Demystified

The Cisco UCS 5108 server chassis comes with three (03) Power Policy options. UCS power management is efficient and energy-saving by default, but NONE of the policy descriptions says which PSUs will be fully ON and which PSUs will be put into Power Saving Mode. This is what the official Cisco documentation says about these options:

  • Non Redundant – All installed power supplies are turned on and the load is evenly balanced. Only smaller configurations (requiring less than 2500W) can be powered by a single power supply.
  • N+1 – The total number of power supplies to satisfy non-redundancy, plus one additional power supply for redundancy, are turned on and equally share the power load for the chassis. If any additional power supplies are installed, Cisco UCS Manager sets them to a “turned-off” state.
  • Grid – Two power sources are turned on, or the chassis requires greater than N+1 redundancy. If one source fails (which causes a loss of power to one or two power supplies), the surviving power supplies on the other power circuit continue to provide power to the chassis.

The definitions above probably applied to older UCS versions and not the latest one. I also couldn’t find details on what exactly happens to power redundancy when you have 2 x PSUs or 4 x PSUs installed and your power load is not very high because not all the blades are installed and running.

Cisco-UCS-PowerPolicy-1

Following is what I captured regarding which PSUs stay ON and which PSUs are put into Power Saving Mode when you set the different “Power Policy” options. This was done on a server chassis with four (04) PSUs. A change of Power Policy takes effect immediately (there might be a 10-20 second delay before the Web GUI refreshes) and it doesn’t require any system reboot.

The assumption is that all four (04) PSUs are installed and connected to power sockets, and that the chassis is running 2 blades.

When the Power Policy is set to N+1, this is what happens:

PSU1 – ON
PSU2 – ON
PSU3 – OFF (Power Saving Mode)
PSU4 – OFF (Power Saving Mode)

When the Power Policy is set to GRID, this is what happens:

PSU1 – ON
PSU2 – OFF (Power Saving Mode)
PSU3 – ON
PSU4 – OFF (Power Saving Mode)

When the Power Policy is set to Non-Redundant, this is what happens (only one is ON! not ALL):

PSU1 – ON
PSU2 – OFF (Power Saving Mode)
PSU3 – OFF (Power Saving Mode)
PSU4 – OFF (Power Saving Mode)

Data centre racks are equipped with two power rails, A (left hand side) and B (right hand side), for redundancy. Now the interesting thing is that your physical power connections must be in line with the UCS Power Policy option; otherwise your blades will reboot if there is any power issue on “power rail A”.

There are only two practical combinations for connecting all four (04) PSUs to power rails A & B. The combinations are as follows:

Option 1:
PSU1/PSU2 (first two) connected to > power rail A
PSU3/PSU4 (last two) connected to > power rail B

Option 2:
PSU1/PSU3 (odd numbers) connected to > power rail A
PSU2/PSU4 (even numbers) connected to > power rail B

I had N+1 configured with “Option 1”, and a power outage due to maintenance on rail A rebooted my blades!

This is what happens during a power failure on rail A: either “you are saved” or “you are not saved”!

GRID Mode with (Option 1) PSU1/PSU2 to A & PSU3/PSU4 to B; You are saved!
PSU1 – A ON
PSU2 – A OFF (Power Saving Mode)
PSU3 – B ON
PSU4 – B OFF (Power Saving Mode)

GRID Mode with (Option 2) PSU1/PSU3 to A & PSU2/PSU4 to B; You are NOT saved!
PSU1 – A ON
PSU2 – B OFF (Power Saving Mode)
PSU3 – A ON
PSU4 – B OFF (Power Saving Mode)

Screenshot of GRID mode following –

Cisco-UCS-PSU-GRID-Pic1

N+1 Mode with (Option 1) PSU1/PSU2 to A & PSU3/PSU4 to B; You are NOT saved!
PSU1 – A ON
PSU2 – A ON
PSU3 – B OFF (Power Saving Mode)
PSU4 – B OFF (Power Saving Mode)

N+1 Mode with (Option 2) PSU1/PSU3 to A & PSU2/PSU4 to B; You are saved!
PSU1 – A ON
PSU2 – B ON
PSU3 – A OFF (Power Saving Mode)
PSU4 – B OFF (Power Saving Mode)

Screenshot of N+1 mode following –

Cisco-UCS-PSU-N1-Pic1

Non-Redundant Mode with (Option 1) PSU1/PSU2 to A & PSU3/PSU4 to B; You are NOT saved!
PSU1 – A ON
PSU2 – A OFF (Power Saving Mode)
PSU3 – B OFF (Power Saving Mode)
PSU4 – B OFF (Power Saving Mode)

Non-Redundant Mode with (Option 2) PSU1/PSU3 to A & PSU2/PSU4 to B; You are NOT saved!
PSU1 – A ON
PSU2 – B OFF (Power Saving Mode)
PSU3 – A OFF (Power Saving Mode)
PSU4 – B OFF (Power Saving Mode)

Screenshot of Non Redundant mode following –

Cisco-UCS-PSU-NonRedundant-1.jpg
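
The observations above can be condensed into a small model. The following Python sketch is only an illustration built from what this post recorded on one lightly loaded 4-PSU chassis; the names ACTIVE_PSUS, CABLING and survives_rail_failure are made up for this example and are not part of any Cisco tool. It also assumes that a single active PSU can carry the load and that PSUs in Power Saving Mode do not take over in time.

```python
# PSUs that stayed ON under each policy, as observed in this post
# (4 PSUs installed, 2 blades running). Not taken from Cisco documentation.
ACTIVE_PSUS = {
    "n+1": {1, 2},
    "grid": {1, 3},
    "non-redundant": {1},
}

# The two cabling options described above: PSU number -> power rail.
CABLING = {
    "option1": {1: "A", 2: "A", 3: "B", 4: "B"},
    "option2": {1: "A", 2: "B", 3: "A", 4: "B"},
}


def survives_rail_failure(policy, cabling, failed_rail="A"):
    """Return True if at least one active PSU is fed from a rail that is still up."""
    rails = CABLING[cabling]
    return any(rails[psu] != failed_rail for psu in ACTIVE_PSUS[policy])


if __name__ == "__main__":
    for policy in ACTIVE_PSUS:
        for cabling in CABLING:
            verdict = "saved" if survives_rail_failure(policy, cabling) else "NOT saved"
            print(f"{policy:14} {cabling}: {verdict}")
```

Only GRID with Option 1 and N+1 with Option 2 come out as “saved” for a rail A failure, which matches the tables above.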

Ubiquiti UniFi @ My Home

I have been using Ubiquiti UniFi solutions for the last 10 months on my home NBN and wireless network, and so far I am a happy customer; I should have got this solution even earlier! I tried a few high-end dual-band wireless modems with triple MIMO antennas and AC connection speeds; the Ubiquiti UniFi solutions are heaps better than those.

Although Ubiquiti UniFi solutions are not labelled as suitable for “home users”, they are not too hard to set up. I have brought at least a dozen home customers to Ubiquiti; they should send me free UniFi gear, hehe!

Some people might argue that Ubiquiti UniFi is “overkill” as a home wireless solution; however, considering the price and the tons of features offered, Ubiquiti UniFi is definitely a solution worth looking at. If you have a multi-storey house with poor wireless coverage, or no roaming from one access point to another, and you want to see who is doing what, then Ubiquiti UniFi is the right solution for you.

I am not underselling Ubiquiti UniFi as an enterprise solution! It comes with loads of excellent enterprise features, no question about that. I would rather say I am lucky enough to have access to enterprise solutions for my home network! I hope Ubiquiti doesn’t get acquired by a competitor.

A complete Ubiquiti UniFi solution for your home should include the following:

i. 1 x UniFi Security Gateway (USG); I deployed the USG, which can handle 3Gbps of bandwidth with advanced routing, stateful firewall, monitoring (deep packet inspection) and VPN. My NBN connection terminates here. It’s a fan-less device that I have never had to reboot for performance problems in the last 10 months. The price tag in Australia is $165 for the USG.

USG-PIC-JPEG-1

Details here – https://dl.ubnt.com/datasheets/unifi/UniFi_Security_Gateway_DS.pdf

ii. 1 x UniFi Cloud Key; it’s a PoE-powered micro-computer running Linux and the apps that manage everything! It also acts as the “Wireless LAN Controller (WLC!)”. The price tag in Australia is $115.

UniFi-CloudKey-02

Details here – https://dl.ubnt.com/datasheets/unifi/UniFi_Cloud_Key_DS.pdf

iii. The last one is the UniFi Access Points; these could be the AC series, the HD series, or a mix and match. There is NO limit on “how many” APs can be managed by a single Cloud Key. I hope you don’t need support for 1000+ clients!! That’s the limit.

You can have 1 x AP on each floor, or 1 x on the east side and 1 x on the west side; even a single AP is good enough to cover a moderately big house with two floors. The price tag in Australia for the UAP-AC-PRO model (AC1750 per AP with 3×3 MIMO and dual band, PoE powered) is around $200 per AP. You can put them on top of any table, shelf or showcase; it’s an excellent-looking kit with its blue LED ring turned on.

UniFi-UAP-AC-PRO-01

Details here – https://www.ubnt.com/unifi/unifi-ap-ac-pro/

iv. 1 x UniFi PoE switch; UniFi switches are amazing, fully managed layer 2 gigabit Ethernet switches! You can see what is happening per port, and can do VLANs, trunks, jumbo frames and STP! I use the 8-port PoE model. The price tag in Australia for the US-8-60W-AU is $170. You can have the 24-port model if you need higher port density.

UniFi-8Port-PoE-01

Details here – https://www.ubnt.com/unifi-switching/unifi-switch-8/

The total cost with a single AP is AUD $650 (165+115+200+170); with 2 x APs it is AUD $850.

Apart from the above devices you can add CCTV cameras and VoIP telephones to the same solution with centralised control.

Here are a few key features that impressed me about the UniFi solution –

i. Fully compatible with Australian NBN; I tested connectivity with TPG and Optus; my friends are using it with Aussie Broadband and IINET.
ii. Centralised, fully cloud-based management portal, excellent user interface, excellent Android and iPhone apps!
iii. No subscription fees for the Cloud App or Mobile App.
iv. Loaded with lots of enterprise security features such as management frame protection, advanced encryption etc.; Ubiquiti also constantly releases firmware updates and patches to keep up with the latest security threats
v. Deep packet inspection, traffic monitoring, client monitoring, ports statistics – you can see what is happening and who is doing what!
vi. Wireless site survey
vii. Built-in DDNS & DNS-O-Matic support
viii. Plug-n-Play AP addition to the network anytime
ix. Single SSID across the whole wireless network and seamless wireless client handoff from one AP to another AP
x. L2TP/IPSec VPN support
xi. Guest access points & wireless access portal for guests
xii. Block/Unblock a client device using a single tap; good for parenting!
xiii. Easy single tap firmware upgrade through Mobile App or Cloud App/Portal
xiv. Single tap wireless speed testing built-in
xv. Ready for IoT integration

Here are a few screenshots –

UniFi-Ctlr-01-JPEG

UniFi-Ctlr-02-JPEG

UniFi-Ctlr-3-JPEG

[following is from UniFi mobile app]

Unifi-MobileApp-JPEG

F5 iRule with Data Group

I often implement large IP and URL whitelists and HTTP header based controls on F5 using iRules and Data Groups. What I found is that “Data Groups” are one of the easiest ways to handle a large number of matching keys and values! As per the official F5 documents, a data group is the simplest way to maintain a list of permanently matched keys and values. A large list of IP addresses, URLs, or any string/integer matches can easily fit within an iRule using data groups. Adding and removing entries within data groups is super easy.

Some old documents mention that data groups have an impact on performance; the truth is that since TMOS v10.x data groups have minimal performance impact. F5 did some performance testing using data groups and here are some of the results (copied from the F5 site):

The testing was done using 10,000 CPS, 1 HTTP request per TCP connection.

Baseline: TCP + HTTP profile + Blank iRule (when RULE_INIT { } )

Baseline results: Total CPU % used = 23%; TMM CPU % used = 16%; TMM Mem (MB) = 254

Check using iRule that I used in the video against datagroup with 1,000 entries (and no match found, so the entire group is forced to be searched).

Results: Total CPU % used = 25%; TMM CPU % used = 18%; TMM Mem (MB) = 259

Check against datagroup with 2,000 entries (no match found).

Results: Total CPU % used = 25%; TMM CPU % used = 18%; TMM Mem (MB) = 261

Check against datagroup with 10,000 entries (no match found).

Results: Total CPU % used = 26%; TMM CPU % used = 18%; TMM Mem (MB) = 277

So, even with a datagroup with 10,000 entries, the performance hit is minimal.
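
To make the “minimal” claim concrete, here is a small Python sketch that subtracts the baseline from each quoted result. The numbers are the ones copied above; the dictionary and function names are just for this illustration.

```python
# Figures quoted above from the F5 test (10,000 CPS, no match found).
BASELINE = {"total_cpu_pct": 23, "tmm_cpu_pct": 16, "tmm_mem_mb": 254}

RESULTS = {
    1000: {"total_cpu_pct": 25, "tmm_cpu_pct": 18, "tmm_mem_mb": 259},
    2000: {"total_cpu_pct": 25, "tmm_cpu_pct": 18, "tmm_mem_mb": 261},
    10000: {"total_cpu_pct": 26, "tmm_cpu_pct": 18, "tmm_mem_mb": 277},
}


def overhead(entries):
    """Return the increase over the blank-iRule baseline for a data group size."""
    return {key: RESULTS[entries][key] - BASELINE[key] for key in BASELINE}


if __name__ == "__main__":
    for entries in RESULTS:
        print(entries, overhead(entries))
```

Going from 1,000 to 10,000 entries adds only one more point of total CPU and 18 MB of TMM memory in these figures.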

Here is an example of how to use data groups within iRules. Let’s say we want to whitelist a list of IP addresses and block all other requests at the TCP layer (the TCP three-way handshake, which happens before HTTP): (i) we will create two test data groups, (ii) then bundle them together in an iRule and (iii) finally apply it to a virtual server.

i. Create data group A: iRules > Data Group List, create “test_data_group_A”; set the type to Address; enter all the IP addresses, or import a list file, or add them to “/config/bigip.conf” (reload the configuration if you edit /config/bigip.conf manually). This is what it looks like –

ltm data-group internal /Common/test_data_group_A {
 records {
 1.1.1.1/32 { }
 1.1.1.2/32 { }
 1.1.1.3/32 { }
 2.2.2.1/32 { }
 2.2.2.22/32 { }
 }
 type ip
}

ii. Follow the same steps as above and create “test_data_group_B”.
iii. Create an iRule and put them together to allow requests only from the listed IPs (use the “class match” command to refer to a data group) –

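when CLIENT_ACCEPTED {
    # Data group names must match the ones created in steps i and ii.
    if { [class match [IP::client_addr] equals test_data_group_A] } {
        # listed in data group A: allow the connection
    } elseif { [class match [IP::client_addr] equals test_data_group_B] } {
        # listed in data group B: allow the connection
    } else {
        # not whitelisted: drop at the TCP layer
        drop
    }
}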

iv. Apply the above iRule to a virtual server

That’s all! The above should only allow the IPs listed in the Data Groups.

If you need to add or remove IPs later, just edit the data groups; there is no need to touch the iRule anymore.
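
For readers who want to test the whitelist logic offline, here is a rough Python equivalent of the decision the iRule makes. It is only a sketch: class_match and client_accepted are names invented for this example, the contents of group B are hypothetical (this post does not list them), and the linear scan says nothing about how BIG-IP performs the lookup internally.

```python
import ipaddress

# Entries of test_data_group_A as shown in the post.
DATA_GROUP_A = ["1.1.1.1/32", "1.1.1.2/32", "1.1.1.3/32", "2.2.2.1/32", "2.2.2.22/32"]
# Hypothetical entries for test_data_group_B; the post does not list them.
DATA_GROUP_B = ["10.10.10.0/24"]


def class_match(client_ip, data_group):
    """Return True if client_ip falls inside any network of the data group."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(entry) for entry in data_group)


def client_accepted(client_ip):
    """Mirror the iRule: allow whitelisted clients, drop everyone else."""
    if class_match(client_ip, DATA_GROUP_A) or class_match(client_ip, DATA_GROUP_B):
        return "allow"
    return "drop"


if __name__ == "__main__":
    for ip in ("1.1.1.2", "10.10.10.77", "8.8.8.8"):
        print(ip, client_accepted(ip))
```

As with the real data groups, changing the whitelist means editing only the lists, not the matching logic.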

 

JUNOS installation and upgrade on SRX and EX platforms

This post covers JUNOS installation and upgrade on SRX and EX platforms (standalone, SRX chassis cluster and EX virtual chassis). The reason I am putting this here is that I often re-use these procedures and don’t want to search the Juniper knowledge base every time! Other people might also find this useful.

Step1: Transfer the JUNOS installation file to SRX/EX devices

Transferring JUNOS installation file can be done in many ways.

If the JUNOS system is already up and running on the network, I prefer to transfer the installation file via SSH. Just use any standard SSH-based file transfer client (SCP, FileZilla, WinSCP). The destination should be a temporary directory; I prefer “/var/tmp/”.

If the JUNOS system is not connected to the network, or is a brand new system, then I prefer to transfer the installation file via a USB stick. Just attach the USB stick to any of the USB ports on the JUNOS system. You need to find out the UNIX device name for the USB stick on JUNOS; do the following –

>start shell
%dmesg

The above will show the USB device name/number at the end of the “dmesg” output; in most cases this is “/dev/da1”, so the first UNIX disk partition (it’s called a “slice”, s) within it is /dev/da1s1. Now mount the USB stick on a temporary directory; I prefer “/var/tmp/usb”, so create the usb directory and mount it.

%mkdir /var/tmp/usb
%mount -t msdosfs /dev/da1s1 /var/tmp/usb

Now copy the JUNOS installer from the USB stick to the local JUNOS partition /var/tmp (this is optional; the installation can be done from the mounted USB directory)

%cp /var/tmp/usb/junos-srxme-15.xxxx.tgz /var/tmp/

If you are installing JUNOS from the local partition, you can disconnect the USB stick at this stage.

Step2: Install JUNOS

If it is a standalone system (SRX, EX or other) – you can go straight to the installation.

If it is an SRX chassis cluster without ISSU, you should install JUNOS on both devices but make sure to REBOOT them together at the same time. If you can’t afford downtime during the upgrade, there are a few other methods (such as disconnecting the fabric and control links during installation), and you might also consider upgrading to a higher-end Juniper system that does support ISSU.

The commands are as follows:

>request system software add /var/tmp/junos-xxxx.tgz no-copy validate ; make sure the installation was successful
>request system reboot ; (reboot both non-ISSU/NSSU SRX together at same time)

If your system supports nonstop software upgrade (NSSU, on EX33xx to EX82xx virtual chassis), then the procedure is as follows (assuming all the VC members are the same model)-

a. Copy the JUNOS installation file to /var/tmp on the master switch

b. Make sure nonstop active routing (NSR) and graceful routing engine switchover (GRES) are enabled on the virtual chassis; example commands follow (for a two-node VC)-

#set chassis redundancy graceful-switchover
#set routing-options nonstop-routing
 
#set virtual-chassis member 0 role routing-engine
#set virtual-chassis member 0 serial-number PEXXXXXX (serial number of the switch 1)
#set virtual-chassis member 1 role routing-engine
#set virtual-chassis member 1 serial-number PEXXXXXX (serial number of the switch 2)

c. The installation command is as follows –

>request system software nonstop-upgrade /var/tmp/junos-version.tgz

d. Reboot the members

>request system reboot ; this will reboot one member at a time

 

Step3: Back up the JUNOS software to the alternate partition (JUNOS snapshot)

In case the primary partition fails, the JUNOS system will boot from the backup partition (UNIX slice), which after this step has the same JUNOS version installed.

>request system snapshot slice alternate ; on standalone SRX or EX
>request system snapshot slice alternate all-members ; on EX virtual chassis

 

Step4: Perform disk cleanup after installation (optional)

I love to perform disk cleanup after JUNOS upgrade.

>request system storage cleanup dry-run ; this will show the files to be deleted
>request system storage cleanup ; this will delete the files

 

Juniper SRX IDP (IDS/IPS) and SCREEN (DoS) Logs to Splunk

Juniper SRX IDP (IDS/IPS) and SCREEN (DoS) logs can be sent to a remote host via Syslog.

You might have come across IT security compliance requirements asking for visibility across your IDP and DoS attack event logs. One solution is to send all your security logs to a centralised logging system such as Splunk and then perform all the required actions, such as creating reports and dashboards and sending alerts, from there.

In this example I have documented the configuration required to send Juniper SRX IDP and SCREEN logs to Splunk via Syslog.

Step 1: Setup Splunk to listen on UDP 514 (Syslog)

Make sure you have a running Splunk instance and that you have configured it to listen on UDP port 514 for syslog. This can be done by adding the following to the file “/opt/splunk/etc/system/local/inputs.conf”:

[udp://514]
sourcetype = syslog

You can install the following Juniper Apps available in the Splunk app store:

-Splunk Add-on for Juniper
-Juniper Networks App for Splunk

If you do not have the above apps installed – you still can create your Splunk dashboards, reports & alerts manually based on the fields within the captured IDP and SCREEN logs.

Make sure SRX firewalls are able to talk to the Splunk server over the network.

Step 2: Setup SCREEN options

Make sure you have implemented SCREEN options. A bunch of options are available for SCREEN; here are some examples:

#set security screen ids-option internet-screen-options icmp ip-sweep
#set security screen ids-option internet-screen-options icmp ping-death
#set security screen ids-option internet-screen-options ip bad-option
#set security screen ids-option internet-screen-options ip spoofing
#set security screen ids-option internet-screen-options ip tear-drop
#set security screen ids-option internet-screen-options tcp syn-fin
#set security screen ids-option internet-screen-options tcp tcp-no-flag
#set security screen ids-option internet-screen-options tcp syn-frag
#set security screen ids-option internet-screen-options tcp port-scan
#set security screen ids-option internet-screen-options tcp syn-ack-ack-proxy
#set security screen ids-option internet-screen-options tcp syn-flood white-list PenTest-TempWhitelist source-address 123.xxx.xxx.xxx/32
#set security screen ids-option internet-screen-options tcp syn-flood white-list PenTest-TempWhitelist source-address 123.xxx.xxx.xxx/32
#set security screen ids-option internet-screen-options tcp land
#set security screen ids-option internet-screen-options tcp winnuke
#set security screen ids-option internet-screen-options tcp tcp-sweep
#set security screen ids-option internet-screen-options udp flood
#set security screen ids-option internet-screen-options udp udp-sweep
#set security screen ids-option internet-screen-options udp port-scan
#set security screen ids-option internet-screen-options limit-session source-ip-based 1000
#set security screen ids-option internet-screen-options limit-session destination-ip-based 1000

Step 3: Enable logging within IDP Rulebase

Make sure you have an active IDP policy and you have also enabled IDP within security policies.

#show security idp active-policy
active-policy Recommended;

The above command shows that the current active policy is “Recommended”; the default “Recommended” policy comes with “then notification log-attacks” along with “action recommended”, as follows:

then {
 action {
 recommended;
 }
 notification {
 log-attacks;
 }
 }

If you create a custom policy, make sure your policy is configured with “notification log-attacks”.

Also make sure you have enabled IDP within “security policy”. Following is an example of enabling IDP within a security policy:

#set security policies from-zone sec-zone-source to-zone sec-zone-destination policy name-of-sec-policy then permit application-services idp

Step 4: Setup SRX firewalls to send logs to Syslog

SRX IDP logs are marked with RT_IDP.
SRX SCREEN logs are marked with RT_IDS.

You need to filter logs to capture the above while sending them to a remote syslog server.

#set system syslog host 172.16.xx.10 any any
#set system syslog host 172.16.xx.10 match "RT_IDP|RT_IDS"
#set system syslog host 172.16.xx.10 source-address 172.16.xx.5
#set system syslog host 172.16.xx.10 structured-data brief
#set system syslog file messages any any
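
The “match” statement above is a regular expression, so only lines containing RT_IDP or RT_IDS are forwarded to the remote host. As a quick way to sanity-check a pattern before committing it, here is a small Python sketch; for a plain alternation like this one Python’s re module behaves the same way, but that equivalence is an assumption for anything more complex. The function name and the third sample line are made up for this illustration.

```python
import re

# Same pattern as in the "set system syslog host ... match" statement.
MATCH = re.compile(r"RT_IDP|RT_IDS")

# Two lines shortened from the log examples in this post, plus one
# made-up unrelated line that should not be forwarded.
SAMPLE_LINES = [
    "Oct 13 14:53:22 firewall-host-name RT_IDS: RT_SCREEN_TCP: TCP port scan!",
    "Oct 13 08:55:55 firewall-host-name RT_IDP - IDP_ATTACK_LOG_EVENT",
    "Oct 13 09:00:00 firewall-host-name sshd: made-up unrelated line",
]


def forwarded(lines):
    """Return only the lines the syslog match filter would let through."""
    return [line for line in lines if MATCH.search(line)]


if __name__ == "__main__":
    for line in forwarded(SAMPLE_LINES):
        print(line)
```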

Now generate some port scans towards the firewall interfaces where the SCREEN and IDP policies are applied. You can use “https://pentest-tools.com/network-vulnerability-scanning/tcp-port-scanner-online-nmap” to send a quick scan.

You should be able to see SCREEN logs like the following >

root@firewall-host-name> show log messages | match RT_IDS
Oct 13 14:53:22 firewall-host-name RT_IDS: RT_SCREEN_TCP: TCP port scan! source: 178.79.138.22:39267, destination: 118.xxx.xxx.xxx:990, zone name: sec-zone-internet, interface name: reth0.XXX, action: drop
Oct 13 14:53:43 firewall-host-name RT_IDS: RT_SCREEN_TCP: No TCP flag! source: 178.79.138.22:50779, destination: 118.xxx.xx.xxx:443, zone name: sec-zone-internet, interface name: reth0.XXX, action: drop
Oct 13 14:53:43 firewall-host-name RT_IDS: RT_SCREEN_TCP: SYN and FIN bits! source: 178.79.138.22:50780, destination: 118.xxx.xxx.xxx:443, zone name: sec-zone-internet, interface name: reth0.XXX, action: drop

Following is an example of an IDP attack event log >

Oct 13 08:55:55 firewall-host-name 1 2017-10-13T08:55:55.792+11:00 firewall-host-name RT_IDP - IDP_ATTACK_LOG_EVENT [junos@2636.1.1.1.2.135 epoch-time="1507845354" message-type="SIG" source-address="183.78.180.27" source-port="45610" destination-address="118.127.xx.xx" destination-port="80" protocol-name="TCP" service-name="SERVICE_IDP" application-name="HTTP" rule-name="9" rulebase-name="IPS" policy-name="Recommended" export-id="15229" repeat-count="0" action="DROP" threat-severity="HIGH" attack-name="TROJAN:ZMEU-BOT-SCAN" nat-source-address="0.0.0.0" nat-source-port="0" nat-destination-address="172.xx.xx.xx" nat-destination-port="0" elapsed-time="0" inbound-bytes="0" outbound-bytes="0" inbound-packets="0" outbound-packets="0" source-zone-name="sec-zone-name-internet" source-interface-name="reth0.XXX" destination-zone-name="dst-sec-zone1-outside" destination-interface-name="reth1.xxx" packet-log-id="0" alert="no" username="N/A" roles="N/A" message="-"]
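
With “structured-data brief” enabled, every IDP event carries its details as key="value" pairs, which is what makes field extraction easy. The Python sketch below shows the idea on a shortened copy of the event above; it is an illustration only, not how Splunk or the Juniper add-on parses the data. The function name is made up, and hyphens in keys are converted to underscores to line up with the field names used in the Splunk search later in this post.

```python
import re

# key="value" pairs inside the structured-data block.
PAIR = re.compile(r'([\w-]+)="([^"]*)"')

# Shortened copy of the IDP event shown above.
SAMPLE = (
    'RT_IDP - IDP_ATTACK_LOG_EVENT [junos@2636.1.1.1.2.135 '
    'epoch-time="1507845354" source-address="183.78.180.27" '
    'destination-port="80" action="DROP" threat-severity="HIGH" '
    'attack-name="TROJAN:ZMEU-BOT-SCAN" policy-name="Recommended"]'
)


def parse_idp_event(line):
    """Return the key="value" pairs of an IDP log line as a dict."""
    return {key.replace("-", "_"): value for key, value in PAIR.findall(line)}


if __name__ == "__main__":
    event = parse_idp_event(SAMPLE)
    print(event["attack_name"], event["action"], event["source_address"])
```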

Now search in Splunk for RT_SCREEN to find SCREEN logs and for IDP_ATTACK_LOG to find IDP logs.

Here are a few example screenshots from Splunk.

[Screenshot – Official Juniper App from Splunk App Store]

IDP-Splunk-OffcialJuniperApp-2

[Screenshot – IDP_ATTACK_LOG within Splunk]

IDP-Splunk-2

[Screenshot – SCREEN action logs]

IDP-Splunk-3

[Screenshot – Splunk Dashboard IDP Attack Events]

IDP-Splunk-4

The above dashboard has been created with the following search parameter:

IDP_ATTACK_LOG_EVENT 
| rename host as Firewall-Name
| rename attack_name as Attack-Name
| rename threat_severity as Threat-Severity
| rename action as Action
| rename policy_name as IDP-Policy-Name
| rename source_address as Attacker-IP
| rename source_interface_name as Src-Interface
| rename source_zone_name as Src-Security-Zone
| rename destination_address as Dst-Address
| rename destination_interface_name as Dst-Interface
| rename destination_zone_name as Dst-Security-Zone
| rename destination_port as Dst-Port
| rename nat_destination_address as Internal-Dst-NAT-Address
| table Firewall-Name, Attack-Name, Threat-Severity, Action, IDP-Policy-Name, Attacker-IP, Src-Interface, Src-Security-Zone, Dst-Address, Dst-Interface, Dst-Port, Internal-Dst-NAT-Address, Dst-Security-Zone, _time

[Screenshot – Splunk Dashboard SCREEN Attack Events]

IDP-Splunk-5

You can create Splunk “alerts” based on the same above!

AWS “DirectConnect” & “VPC Networking”– from a typical Network Engineer’s perspective

When I started working with AWS DirectConnect a few years ago, I was a bit confused about where to start.

At that time (a few years ago, and even now), articles about Cloud solutions were mostly focused on continuous integration, continuous deployment, configuration automation, highly available RDBMS and so on; people hardly talk about “networking” on the Cloud. These topics mostly concern application development and maintenance, which have nothing to do with a Network Engineer (well, network engineers do write code as well, but that is to manage infrastructure devices and is not related to business applications).

If you are new to the Cloud world, you will probably believe that Cloud infrastructures are built without the help of Network Engineers, because nobody (at least in most cloud marketing articles) wants to talk about them; Cloud is all about applications and no networking is required!

Being a network engineer, I was thinking: is this the end of the world for network engineers? Actually, this is the beginning. I have seen so many poorly designed VPC networks lacking security, segregation and control. Well, why is this happening? Because the guys who built these infrastructures are not experienced Network Engineers; they obviously have computing skills, as they are DevOps engineers and Application Developers, but it’s like asking a skin specialist for advice on heart issues because both hold the same bachelor’s degree in medicine.

So when I started implementing AWS DirectConnect, my mindset was that I was going to learn a hell of a lot of coding and some “brand new” ways to implement data/IP networks. I started reading the AWS DirectConnect documents supplied by AWS at https://aws.amazon.com/documentation/direct-connect (also the VPC networking documents at http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/getting-started-ipv4.html). In the end, what I found is that it’s the same old wine in a new bottle, with a brand new label and a few extra utensils to handle it.

Here is a summary of AWS DirectConnect network components and concepts from a regular network engineer’s perspective; these are all very much “common knowledge”, and there is nothing very new or unknown to worry about.

Part 1: Why use AWS DirectConnect?

Q1. Why do we need AWS DirectConnect?
Ans: To have an interconnect between (a) self-managed infrastructure and (b) resources on AWS such as a Virtual Private Cloud or AWS S3. Hybrid cloud is getting popular; companies these days want to integrate their self-managed cloud platform (infrastructure) with AWS or other providers (Azure has similar network connection offerings).

Other examples are:
-some companies have self-managed back-end systems such as DB servers and want to put the front-end application servers on the AWS Cloud.
-some companies use DirectConnect to send backups of self-managed data to an offsite location within AWS S3.
-some companies use DirectConnect for high-speed migration of on-premise resources to the AWS Cloud during an infrastructure migration to AWS.

Q2. Well, my company or client needs AWS DirectConnect. How can I get a direct connect? Where can I get this?
Ans: AWS DirectConnect is now available in many data centres across the globe. Check the AWS web site for availability within your preferred data centre, https://aws.amazon.com/directconnect/details/; they might already have a presence in your data centre.

Q3. OK, I got it. Now I want to connect to AWS DirectConnect. Where do I start? What are the available options?
Ans: There you go! The first thing you need to do is submit a DirectConnect request through the AWS Console; based on your request, AWS will send you a Letter of Authorization – Connecting Facility Assignment (LOA-CFA) for the cross-connect to your rack in the data centre; AWS will also allocate a network port/interface for you on their end for this cross-connect. The data centre guys will do the physical cross-connect cable run for you. Please go through questions Q4 to Q8 to get an understanding of what type of physical connection you will be ordering.

Starting from here it’s all about the “same old IP networking”.

Part 2: Let’s talk about “Physical Connectivity”

Q4. What are the available physical connectivity options?
Ans: AWS DirectConnect comes in two physical interface capacities: 1Gbps Ethernet and 10Gbps Ethernet.

Q5. What type of network cable do I need to use?
Ans: For both 1Gbps and 10Gbps it is “single mode fibre optic” cable.

Q6. There are so many optical fibre wavelengths and interface types; which one is compatible with AWS DirectConnect?
Ans: For 1Gbps it is 1000BASE-LX (1310nm wavelength) and for 10Gbps it is 10GBASE-LR (1310nm wavelength). AWS does not use 1550nm.

Q7. What else do I need to know about the network interfaces?
Ans: The interface must support IEEE 802.1Q VLAN tagging. You will actually be creating “tagged” logical interfaces on top of the physical interface. This is just like the connectivity between two Layer 3 switches/routers (one is your L3 switch/router, the other is managed by AWS), so that you can have many logical VLAN interfaces. If you use a layer 2 switch for this connectivity, you need a router connected to your switch, and you must deliver the same VLAN tag (the VLAN tag ID you share with AWS) to the router interface.

Q8. I want to have more than one DirectConnect physical interface. Can I use LAG? What are the physical connectivity options with more than one DirectConnect?
Ans: AWS recently started supporting LAG using LACP. You can also use L3 ECMP (equal-cost multi-path routing) for load balancing and link failover, with BFD (Bidirectional Forwarding Detection) for quick link fault detection. Which one to choose, “L2 LACP” or “L3 ECMP”, depends on your preference and business requirements. The AWS LACP LAG is an active/active solution.

Key items here:
-Physical interface – 1G and 10G
-Optical cable and connectors – single mode optical fibre; 1000BASE-LX and 10GBASE-LR; optical wavelength 1310nm.
-VLAN tagging 802.1Q
-LACP or ECMP

Part 3: Let’s talk about IP Routing – DirectConnect Routing

Q9. OK, now I know about the physical connectivity; what about traffic forwarding and routing between AWS and the self-managed network?
Ans: Regarding routing, AWS supports “ONLY” BGP. Since you are connecting to AWS, which is a different ASN, the BGP type here is EBGP.

Q10. Can I connect to AWS hosted public resources such as S3 and other resources via the DirectConnect?
Ans: Yes, of course; while creating a DirectConnect virtual interface you have two options to select from, either (i) private or (ii) public. A private interface will only allow you access to your private AWS resources sitting within your VPC subnets, whereas a public interface will allow you access to AWS-hosted public resources such as S3 and RDS. Devices that are not directly connected to the DirectConnect router need routes to the AWS public networks via DirectConnect; let’s say your edge router is connected to DirectConnect, then your core or distribution routers should have routes to those AWS public networks via the edge router.

Q11. What do I do to see my BGP advertised networks in the VPC?
Ans: You need to create a virtual private gateway (VGW) in the AWS web admin console and attach it to your VPC. You can have one (01) VGW per VPC; your VGW is just a virtual routing appliance that manages the routes for external IN/OUT traffic of your VPC. There is an option called “route propagation” within the VPC routing table; turn this on and it will show all the routes propagated via BGP.

Q12. Can I have 2 x DirectConnect working together to have load balancing and redundancy?
Ans: Yes you can. You have two options: (i) LAG using LACP active/active, which is a layer 2 solution, or (ii) Layer 3 ECMP. With ECMP, AWS by default uses ECMP over the BGP advertised routes to send traffic across all the available active virtual interfaces (from AWS >> to you). To send traffic from your end >> to AWS, create your own ECMP policies over the BGP advertised routes; this can be done using routing policies that tell your router/firewall to use all the active virtual interfaces for the same destination IP subnets (the destination is AWS). AWS supports BFD (Bidirectional Forwarding Detection) to provide fast network fault detection and convergence.

When using ECMP, one important thing to consider is the type of device that terminates DirectConnect. A packet based router without SPI (stateful packet inspection) will send and receive packets on multiple interfaces without any issue. A session based firewall (SPI/stateful firewall), however, will drop return packets that do not match an existing session table entry, because the return path might be another firewall or another network interface that does not belong to the same security zone. If both AWS DirectConnect interfaces terminate on the same SPI firewall, put both interfaces (one via DirectConnect X, the other via DirectConnect Y) into the same security zone; the return traffic will then match the session table and no packets will be dropped. If the return path is a different SPI firewall, you need to turn off SPI on both firewalls for AWS DirectConnect traffic.
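The ECMP behaviour described above can be sketched in a few lines of Python. This is only an illustrative model, not how AWS or any particular router hashes traffic: the link names, zone names and the use of a SHA-256 hash over the 5-tuple are all assumptions made for the example. It shows that each side picks a link independently, and that a simplified stateful firewall accepts return traffic only when it arrives on the same link or on a link in the same security zone.

```python
import hashlib

def pick_link(src_ip, dst_ip, src_port, dst_port, proto, links):
    """Pick one link for a flow by hashing its 5-tuple (illustrative hash only)."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    return links[int.from_bytes(digest[:4], "big") % len(links)]

def return_traffic_accepted(forward_link, reverse_link, zones):
    """Simplified stateful check: the return link must be in the session's zone."""
    return zones[forward_link] == zones[reverse_link]

# Hypothetical virtual interface names for DirectConnect X and Y.
links = ["dx-vif-X", "dx-vif-Y"]

# Your side hashes the forward flow; the AWS side makes its own choice,
# modelled here by hashing the reversed tuple.
forward = pick_link("10.1.1.10", "172.31.5.20", 40000, 443, "tcp", links)
reverse = pick_link("172.31.5.20", "10.1.1.10", 443, 40000, "tcp", links)

same_zone = {"dx-vif-X": "aws", "dx-vif-Y": "aws"}
split_zone = {"dx-vif-X": "aws-x", "dx-vif-Y": "aws-y"}

print("forward:", forward, "reverse:", reverse)
print("same zone accepted:", return_traffic_accepted(forward, reverse, same_zone))
print("split zones accepted:", return_traffic_accepted(forward, reverse, split_zone))
```

With both interfaces in one zone the check always passes; with separate zones it passes only when both directions happen to land on the same link.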

To have redundancy only (no traffic load balancing), you can use the BGP “AS path prepend” feature to tell the AWS BGP peer to send traffic (from AWS >> to your network) via your preferred path only.
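As a rough illustration of why prepending works, here is a minimal Python sketch. It assumes the only thing that differs between the two paths is the AS path length (real BGP best-path selection looks at several attributes before and after this one), and the ASN 65000 and the path names are made up for the example.

```python
def prepend(as_path, own_asn, times):
    """Return a new AS path with own_asn prepended `times` extra times."""
    return [own_asn] * times + list(as_path)

def preferred_path(paths):
    """Pick the path name with the shortest AS path (simplified tie-breaker)."""
    return min(paths, key=lambda name: len(paths[name]))

OWN_ASN = 65000  # hypothetical private ASN

# The same prefix advertised over two DirectConnects; the backup is prepended.
paths = {
    "dx-primary": [OWN_ASN],
    "dx-backup": prepend([OWN_ASN], OWN_ASN, 3),
}

print(preferred_path(paths))
```

The remote peer sees a longer AS path on the backup link, so it keeps sending traffic over the primary until that advertisement disappears.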

I have designed & implemented 4 x DirectConnects connected to the same VPC resources using ECMP.

Q13. I want to advertise only selected internal IP subnets to AWS VPC – can I do this?
Ans: Yes, of course. Set up your BGP to advertise selected local IP subnets only. This can be done using route filtering/routing policies. Always check and make sure you are advertising the correct IP subnets and receiving the correct advertised IP subnets.
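The idea behind such a route filter can be sketched in Python with the standard `ipaddress` module. This is not router configuration, only a model of the logic; the allowed ranges and the candidate prefixes below are made-up examples.

```python
import ipaddress

# Hypothetical ranges we are willing to advertise to AWS.
ALLOWED = [
    ipaddress.ip_network("10.10.0.0/16"),
    ipaddress.ip_network("192.168.50.0/24"),
]

def prefixes_to_advertise(local_prefixes, allowed=ALLOWED):
    """Keep only the local prefixes that fall inside an allowed range."""
    result = []
    for prefix in local_prefixes:
        net = ipaddress.ip_network(prefix)
        # subnet_of() requires both networks to be the same IP version.
        if any(net.version == a.version and net.subnet_of(a) for a in allowed):
            result.append(str(net))
    return result

local = ["10.10.1.0/24", "10.20.0.0/16", "192.168.50.0/24"]
print(prefixes_to_advertise(local))
```

Running the same check against what you actually receive from the peer is an easy way to confirm the export and import policies do what you intended.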

Q14. Can I send my “Internet” traffic via DirectConnects to Internet?
Ans: AWS does not allow you to route to non-AWS Internet resources via DirectConnect; in other words, you cannot use AWS as an “intermediate AS” to route traffic to the Internet.

Key items here:
-BGP and ASN
-AWS DirectConnect virtual interfaces – private interfaces & public interfaces – just like any other layer3 virtual interface where you can assign an IP address and use to route traffic to destinations.
-BGP route export & import, routing policies
-ECMP – equal cost multi path; load balancing and link failover across multiple L3 links
-BFD – provide fast failure detection
-BGP AS path prepend
-AWS VPC – a virtual private network boundary which uses a large CIDR block that you divide into many smaller IP networks.
-AWS Virtual Private Gateway (VGW) – a network routing appliance which manages IN/OUT traffic, static routes and BGP dynamic routes.
-AWS VPC Routing Table & route propagation – just another routing table; consists of ip subnets as destinations networks and use IGW/NATGW as gateway exit path.

Part 4: Let’s talk about routing within AWS VPC

Q15. I don’t want all my VPC subnets (within AWS) to have routes to the self-managed network subnets. Can I do this?
Ans: Sure you can. First, create the “subnets” that should not have a route to the self-managed network; then create a “routing table”, make sure “route propagation” is turned off on it, and “associate” those subnets with it. You can do all of this from the AWS web admin console.

Q16. I want my VPC resources to sit in more than one AWS availability zone for maximum high availability. How can I do that?
Ans: You can get this done by distributing IP subnets across multiple availability zones. When creating a subnet, you specify (i) which VPC the subnet belongs to and (ii) the AWS “availability zone” where the subnet must reside. When creating AWS resources such as EC2 and ELB, attach subnets from different availability zones; this provides greater high availability.
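A quick way to plan such a layout before touching the console is to carve the VPC CIDR into equal subnets and alternate them across zones. The Python sketch below uses only the standard library; the VPC CIDR, the /24 subnet size and the zone names are example values, not a recommendation.

```python
import ipaddress
from itertools import cycle, islice

def plan_subnets(vpc_cidr, new_prefix, zones, count):
    """Split vpc_cidr into /new_prefix subnets and spread them round-robin over zones."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = islice(vpc.subnets(new_prefix=new_prefix), count)
    return [(str(subnet), zone) for subnet, zone in zip(subnets, cycle(zones))]

# Example values only.
plan = plan_subnets("10.0.0.0/16", 24, ["ap-southeast-2a", "ap-southeast-2b"], 4)
for cidr, zone in plan:
    print(cidr, zone)
```

Each subnet in the plan maps to one “create subnet” action with the chosen availability zone.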

Q17. I want to send traffic to Internet from my VPC – how can I do that?
Ans: There are two different ways to get this done: (i) attach an IGW to your VPC and add a route to it in the subnet routing table; this enables both inbound and outbound traffic for all the subnets associated with that routing table; (ii) if you only want outbound (NATed) traffic, use a NAT GW (subnet users send traffic to the Internet via the NAT GW); the NAT GW itself still needs to send traffic to the Internet via an IGW. You should have one NAT GW per availability zone.

Key items here:
-AWS VPC subnets
-AWS VPC routing table
-AWS Availability zones
-AWS Internet Gateway (IGW)
-AWS NAT Gateway (NATGW)

Part 5: How to administer/control AWS networking

Q18. What are the available tools to manage/administer AWS networking?
Ans: This is very simple; initially it is better to use the AWS Web Console. Once you have a good picture of the AWS components/products, start using the AWS command line tool (AWS CLI) and the AWS APIs. As a network engineer you probably already know a lot of command syntaxes, so you will find the AWS CLI easy to pick up.

Key items here:
-AWS Web Console
-AWS CLI
-AWS APIs

Junos “flow traceoptions” and managing flow trace “log files”

Junos “flow traceoptions” is the utility to trace how a Junos security device processes traffic flows, for example: how traffic traverses from source to destination; how it traverses from one interface to another; whether it finds the correct destination path; which security zones are involved in the traffic path; which security policies are applied; whether the traffic is permitted or dropped by a firewall rule; which firewall rules or policies are involved; and so on.

Four things need to be addressed while working with flow traceoptions –

  • Enable “flow traceoptions” and send the logs to a flow trace log file.
  • Analyse the flow trace log file to find out what is happening.
  • Make sure to disable flow traceoptions afterwards.
  • Once finished with analysis & inspection, clean up the flow trace log files to free up disk space on the Juniper box.

To enable flow traceoptions, the following commands are commonly used-

++++
#set security flow traceoptions file Flow-Trace-LogFile
#set security flow traceoptions flag basic-datapath

#set security flow traceoptions packet-filter PF1 source-prefix 1.1.1.1/32
#set security flow traceoptions packet-filter PF1 destination-prefix 2.2.2.2/32

#set security flow traceoptions packet-filter PF2 source-prefix 2.2.2.2/32
#set security flow traceoptions packet-filter PF2 destination-prefix 1.1.1.1/32
++++
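Note that two packet filters are defined, one per direction, so that both the request and the reply are captured. If you trace different host pairs often, the command set can be generated with a small helper. The Python sketch below is only a convenience wrapper around the exact commands shown above; the function name is made up and nothing here talks to a device.

```python
def flow_trace_commands(log_file, host_a, host_b):
    """Build the 'set' commands for a bidirectional flow trace between two hosts."""
    commands = [
        f"set security flow traceoptions file {log_file}",
        "set security flow traceoptions flag basic-datapath",
    ]
    # PF1 matches host_a -> host_b, PF2 matches the return direction.
    for name, src, dst in (("PF1", host_a, host_b), ("PF2", host_b, host_a)):
        commands.append(
            f"set security flow traceoptions packet-filter {name} source-prefix {src}/32"
        )
        commands.append(
            f"set security flow traceoptions packet-filter {name} destination-prefix {dst}/32"
        )
    return commands

for line in flow_trace_commands("Flow-Trace-LogFile", "1.1.1.1", "2.2.2.2"):
    print(line)
```

The output can be pasted into configuration mode as is.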

Optionally, we can enter the following to limit the log files and avoid being flooded by huge logs.

+++
#set security flow traceoptions file files 2; maximum 3 log files 0,1,2
#set security flow traceoptions file size 2m; size of each log file is 2MB
+++

The above will create log file “Flow-Trace-LogFile”; to see the log file, enter the following command –

+++
>show log Flow-Trace-LogFile
+++

Once we have finished analysis & inspection of the log files, we should disable traceoptions as follows-

+++
#delete security flow traceoptions
+++

Lastly to clean-up a log file and also to delete log files – use the following commands.

To clear a log file – enter the following command-

+++
>clear log LogFileName
+++

To delete a log file – enter the following command-

+++
>file delete <path>
>file delete /var/log/flow-trace-logs.0.gz
+++

 

10 handy F5 LTM iRules I often use

These are ten (10) handy F5 LTM iRules I use very often. I am keeping a copy here for my own reference, and it might help others as well.

 

1. Log all http access headers (client access request & response) – this will send logs to /var/log/ltm.

++++
when HTTP_REQUEST {
   set LogString "Client [IP::client_addr]:[TCP::client_port] -> [HTTP::host][HTTP::uri]"
   log local0. "============================================="
   log local0. "$LogString (request)"
   foreach aHeader [HTTP::header names] {
      log local0. "$aHeader: [HTTP::header value $aHeader]"
   }
   log local0. "============================================="
}
when HTTP_RESPONSE {
   log local0. "============================================="
   log local0. "$LogString (response) - status: [HTTP::status]"
   foreach aHeader [HTTP::header names] {
      log local0. "$aHeader: [HTTP::header value $aHeader]"
   }
   log local0. "============================================="
}

+++++

 

2. Log client_ip only (the above example logs the client IP as well) – this will send the client_ip address to /var/log/ltm.

+++
when CLIENT_ACCEPTED {
  log "CONNECT: [IP::client_addr]"
}
+++++

 

3. Redirect HTTP to > HTTPS

++++
when HTTP_REQUEST {
   # redirect our own host names to HTTPS; reject everything else
   if { [string tolower [HTTP::host]] ends_with ".myfqdn.com.au" } {
      HTTP::redirect "https://www.myfqdn.com.au[HTTP::uri]"
   } else {
      reject
   }
}
+++++

 

4. Allow our DNS host names only – we don’t allow domain names which don’t belong to us. We only accept “mydomain.com.au” and subdomains within it for our virtual servers.

++++
when HTTP_REQUEST {
   if { [string tolower [HTTP::host]] equals "mydomain.com.au" || [string tolower [HTTP::host]] ends_with ".mydomain.com.au" } {
      # host name belongs to us; let the request through
   } else {
      reject
   }
}
+++++
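The reason for matching both the bare domain and the suffix with a leading dot is easy to miss, so here is the same check written as a small Python sketch (an illustration of the matching logic only, not F5 code). A plain “ends with mydomain.com.au” test would also accept look-alike names such as evilmydomain.com.au.

```python
def host_allowed(host_header, domain="mydomain.com.au"):
    """Accept the domain itself or any subdomain of it, case-insensitively."""
    host = host_header.lower()
    return host == domain or host.endswith("." + domain)

for candidate in ("mydomain.com.au", "WWW.MyDomain.com.au",
                  "evilmydomain.com.au", "mydomain.com.au.evil.com"):
    print(candidate, host_allowed(candidate))
```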

 

5. If all pool members are down – redirect HTTP Requests to our maintenance web site –

+++++
when HTTP_REQUEST {
if { [active_members [LB::server pool]] == 0 } {
HTTP::redirect "https://maintenance.mydomain.com.au/"
}
}
++++++

 

6. If all pool members are down, display a “site is under maintenance” page served directly from the F5.

++++++
when HTTP_REQUEST {
if { [active_members [LB::server pool]] == 0 } {
HTTP::respond 200 content "<p><h3>This site is currently under maintenance - please try again later.</h3></p>"
}
}
+++++

 

7. If all pool members are down – return 200 OK with content from the F5 –

++++
when HTTP_REQUEST { 
    if { [active_members [LB::server pool]] == 0 } {
        HTTP::respond 200 content "<p><h3>This site is currently under maintenance - please try again later.</h3></p>"
    }
 }
+++++

 

8. URI rewrite – if client try to access “/application” rewrite/send them to “/application/ver1.1”

++++
when HTTP_REQUEST {
    switch [HTTP::uri] {
        "/application" {
            HTTP::uri "/application/ver1.1"
        }
    }
}
+++++++++

 

9. Rewrite URI based on HTTP header – a URI rewrite is transparent to the client, whereas HTTP::redirect is not, because it returns an HTTP 3xx code to the client.

+++++
when HTTP_REQUEST {
    switch [HTTP::header X-APP-Version] {
        "app1.0" {
            HTTP::uri "/app/default1.0"
        }
        "app2.0" {
            HTTP::uri "/app/default2.0"
        }
        }
    }
}
++++++

 

10. HTTP redirect based on HTTP header – an HTTP 307 redirect preserves the method and body of the initial POST request, whereas with other 30x codes such as 301/302 clients commonly switch to GET and the data in the initial POST is lost.

+++++++
when HTTP_REQUEST {
    if { [HTTP::header X-APP-NAME] contains "myapp1" } {
        HTTP::respond 307 Location "https://myapp.abc.com/api/myapp1.0"
    } else {
        HTTP::respond 307 Location "https://myapp.abc.com/api/myapp2.0"
    }
}
+++++++++

 

MSSQL 2014 AlwaysOn Availability Group Cluster & Gratuitous ARP (GARP) Issue

MSSQL 2014 AlwaysOn cluster running on Windows 2012 R2 doesn’t send Gratuitous ARP (GARP) packets by default!

I have recently come across gratuitous arp (GARP) issues while working on Microsoft SQL 2014 AlwaysOn Availability Group cluster setup. I experienced the following –

  1. The MSSQL 2014 AlwaysOn cluster with AlwaysOn Availability Group (AG) was set up as per best practices and expert recommendations; all cluster related services were running OK without any issue.
  2. Clients sitting on the same IP network/same VLAN were able to connect to the AlwaysOn AG listener Virtual IP (VIP) address immediately after a cluster failover from Node-A to Node-B and vice versa.
  3. However, clients sitting on different IP subnets were NOT able to connect to the VIP immediately after a cluster failover.
  4. Clients sitting on different IP subnets had to wait 20 minutes before they could connect to the VIP.
  5. These 20 minutes are the lifetime of a learnt MAC address entry on the ethernet switch (I use Juniper EX-series switches) where the servers are connected (connected to the physical hypervisor).
  6. On the network layer, the switch “ARP table” was still showing the previously learnt MAC address for the AG listener VIP; the switch did not update the MAC address after a cluster failover was triggered. The switch flushed out the old MAC and re-learnt the new, correct MAC address only after the MAC age time (20 min) expired on the switch.

I looked for a solution and found that “GARP reply” needs to be enabled manually on the Juniper EX switch. I did that, but there was still NO improvement!

I also looked at Microsoft KB documents and forums; people say GARP needs to be turned on at the network switch, which I had already DONE without any success.

After digging further, I found that the Windows 2012 R2 servers were not sending any GARP packets, so the switch was not updating its ARP table even though it was configured to work with GARP.
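For readers who have not looked at one on the wire, a gratuitous ARP is an ordinary ARP packet in which the sender IP and the target IP are the same address, sent to the Ethernet broadcast address so that every neighbour can refresh its ARP entry. The Python sketch below only builds the bytes of such a frame in its request form, to show the layout; it does not send anything, and the MAC and IP values are made up. Whether a given Windows build uses the request or the reply form is not covered here.

```python
import socket
import struct

def build_garp(mac, ip):
    """Build a broadcast Ethernet frame carrying a gratuitous ARP request."""
    sender_mac = bytes.fromhex(mac.replace(":", ""))
    sender_ip = socket.inet_aton(ip)
    # Ethernet header: broadcast destination, our MAC, EtherType 0x0806 (ARP).
    ethernet = b"\xff" * 6 + sender_mac + struct.pack("!H", 0x0806)
    # ARP header: Ethernet (1), IPv4 (0x0800), MAC length 6, IP length 4, request (1).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    # Sender IP and target IP are identical; that is what makes it gratuitous.
    arp += sender_mac + sender_ip + b"\x00" * 6 + sender_ip
    return ethernet + arp

frame = build_garp("00:15:5d:01:02:03", "10.10.10.50")  # example listener VIP
print(len(frame), frame.hex())
```

Capturing on the server side and looking for a frame of this shape right after a failover is a quick way to confirm whether the node announces the VIP at all.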

To get this working, the Windows server registry value “ArpRetryCount” needs to be added; Microsoft says the following about it –

“Determines how many times TCP sends an Address Request Packet for its own address when the service is installed. This is known as a gratuitous Address Request Packet. TCP sends a gratuitous Address Request Packet to determine whether the IP address to which it is assigned is already in use on the network.”

Add the registry entry as following –

-HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
-REG_DWORD > ArpRetryCount
-Value is between 0-3 (use value 3)

0 – don’t send garp
1 – send garp once only
2 – send garp twice
3 – send garp three times (Default Value – actually not present on Windows 2012 R2)

To enable “GARP reply” on Juniper EX & SRX platforms – use the following command –

#set interfaces interface_name gratuitous-arp-reply

The interface can be a physical interface, logical interface, interface group, SVI or IRB.

To enable GARP on Cisco IOS – use the command “ip gratuitous-arps”.

References:
https://technet.microsoft.com/en-us/library/cc957526.aspx
http://www.juniper.net/techpubs/en_US/junos13.2/topics/usage-guidelines/interfaces-configuring-gratuitous-arp.html
http://www.cisco.com/web/techdoc/dc/reference/cli/nxos/commands/l3/ip_arp_gratuitous.html

Juniper SRX – replacement of a node in chassis cluster with IDP installed

One of the chassis cluster nodes in an SRX cluster failed, and I got an RMA replacement SRX box from Juniper. When I tried to add the new device (a brand new SRX) to the existing cluster by transferring the existing configuration to it, as suggested by the Juniper KB, it failed!

The failure was caused by the IDP attack signature database (Juniper calls it the IDP security package) being installed on the existing running node of the cluster, whereas the new node had no IDP installed on it.

I was hoping for some sort of automatic IDP signature sync to the new device as part of transferring the configuration before adding it to the existing cluster, but couldn’t find any such solution. So I had to manually copy the IDP security package from the existing running cluster node and install it on the new SRX, along with the existing configuration.

Here is the total procedure (I am keeping this for my own reference to be used in future):

1. First things first – wipe out all existing configuration on the new RMA SRX & set root authentication. Also make sure the new node is not connected to the cluster.

#delete

#set system root-authentication plain-text-password

#commit

2. Configure chassis cluster on the new node. The cluster ID and node ID must be the same as those of the failed cluster node.

>set chassis cluster cluster-id 1 node 0; here cluster-id is 1 & node number is 0

>request system reboot

3. Download IDP security package from the existing cluster node. Download can be done using SSH/SFTP (you can use FileZilla or WinScp or Mac/Linux scp command) to connect & download the IDP security package.

The attack signature database is located at “/var/db/idpd/sec-download/*”. You can download the whole “sec-download” directory. Once the download is done, copy it to a USB stick (should be formatted with FAT32).

4. Transfer & install IDP security package to the new SRX device.

Plug the USB stick into the SRX; mount it and copy the content to the same destination folder “/var/db/idpd/sec-download/”.

>start shell

%mkdir /var/tmp/usb

%mount -t msdosfs /dev/da1 /var/tmp/usb

%cd /var/tmp/usb/sec-download

%cp -R * /var/db/idpd/sec-download/

5. Install the IDP security package on the new SRX device.

>request security idp security-package install node 0

>request security idp security-package install status

>request security idp security-package install policy-templates node 0

>request security idp security-package install status

Confirm installation is done successfully (you should see something like following)-

>show security idp security-package-version 

node0:

----------------------------------------------------------------

     Attack database version:2660(Tue Mar  1 01:09:02 2016 UTC)

     Detector version :12.6.160151117

     Policy template version :2660
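Before moving on, it is worth comparing this output with the same command on the existing node, since the versions need to match. If you want to do that comparison with a script rather than by eye, the Python sketch below parses the three version fields out of the text. It assumes the output looks like the sample above; the regular expressions and key names are my own and may need adjusting for other Junos releases.

```python
import re

# Patterns based on the sample output above (assumed format).
PATTERNS = {
    "attack_db": r"Attack database version\s*:\s*(\d+)",
    "detector": r"Detector version\s*:\s*([\d.]+)",
    "policy_template": r"Policy template version\s*:\s*(\S+)",
}

def parse_idp_version(output):
    """Extract the IDP version fields from 'show security idp security-package-version'."""
    versions = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, output)
        if match:
            versions[key] = match.group(1)
    return versions

sample = """node0:
     Attack database version:2660(Tue Mar  1 01:09:02 2016 UTC)
     Detector version :12.6.160151117
     Policy template version :2660
"""

new_node = parse_idp_version(sample)
existing_node = parse_idp_version(sample)  # paste the other node's output here
print(new_node, new_node == existing_node)
```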

6. Now download the current running configuration from the existing cluster node.

Following command will create a copy of all configuration-

#save /var/tmp/config-backup-ddmmyy

Connect to the running device over SSH/SFTP using FileZilla or similar and download the “/var/tmp/config-backup-ddmmyy” file. Transfer the file to a USB stick (should be formatted with FAT32).

You should not make any configuration change to the running device at this point.

7. Load the downloaded configuration to the new SRX device via USB.

Plugin the USB to new SRX box.

>start shell

%mount -t msdosfs /dev/da1 /var/tmp/usb

%exit

>config

#load override /var/tmp/usb/config-backup-ddmmyy

#commit

Now power off the new SRX and get ready to add it to the existing cluster.

>request system power-off

8. Connect all the network cables “same as before”. Power on the new device.

9. Check cluster status – both the nodes should be back online.

>show chassis cluster status

That’s all!