VMware ESXi Host Memory Management, Monitoring, Alert Notification – Part 2

Part 1 described which memory counters and thresholds to use for ESXi host memory monitoring and alert notification. This part covers the configuration.

There are many ways to monitor VMware ESXi host memory usage and get alert notifications; most well-known monitoring solutions come with VMware monitoring plugins pre-installed. vCenter Server can also send alerts based on given conditions.

Here I will discuss how to configure Nagios Core to monitor memory usage and send alert notifications (Nagios XI, the commercial edition, has a nice built-in web UI to do the same). Before moving forward, make sure the Nagios server is up and running. The following software/tools need to be installed on the Nagios server:

i. VMware vSphere Perl-SDK; the version should match the vCenter/ESXi host version. Version 5.5 can be downloaded at https://developercenter.vmware.com/web/sdk/55/vsphere-perl
ii. Download and install check_vmware_esx.pl (this is a fork of check_vmware_api.pl) from https://www.monitoringexchange.org/inventory/Check-Plugins/Virtualization/VMWare-%2528ESX%2529/check_vmware_esx.pl—a-fork-of-check_vmware_api.pl-%2528check_esx3-pl%2529 or from https://github.com/BaldMansMojo/check_vmware_esx/blob/master/check_vmware_esx.pl
iii. Install the required Perl modules.

(Step 1 – install VMware vSphere Perl-SDK)

#tar zxvf  VMware-vSphere-Perl-SDK-5.5.0-1384587.x86_64.tar.gz
#cd vmware-vsphere-cli-distrib
#./vmware-install.pl

Accept the license agreement and install with default settings.

If the installer detects missing or outdated Perl modules, install them; the easiest way is to install them via CPAN.

(Step 2 – install & configure check_vmware_esx.pl Nagios check script)

Download the script from one of the web sites mentioned above. Copy the “check_vmware_esx.pl” script to the Nagios libexec directory “/usr/local/nagios/libexec/”; make sure it is owned by the “nagios” user/group and has executable permission.

If you downloaded the “check_vmware_esx_0.9.19.tgz” file, the installation process is as follows:

#tar zxvf check_vmware_esx_0.9.19.tgz
#cd check_vmware_esx_0.9.19
#cp check_vmware_esx.pl /usr/local/nagios/libexec
#chown nagios.nagios /usr/local/nagios/libexec/check_vmware_esx.pl
#chmod 751 /usr/local/nagios/libexec/check_vmware_esx.pl

Copy the perl modules within “check_vmware_esx_0.9.19/modules” to a directory – this can be inside “/usr/local/nagios/libexec” directory –

#mkdir /usr/local/nagios/libexec/vmware_modules
#cp -R /tmp/check_vmware_esx_0.9.19/modules /usr/local/nagios/libexec/vmware_modules/
#chown -R nagios.nagios /usr/local/nagios/libexec/vmware_modules

Also change following parameter in the check_vmware_esx.pl file –

use lib "modules";
to
use lib "/usr/local/nagios/libexec/vmware_modules/modules";
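The same change can be made without opening an editor. The following is only a sketch: the function name patch_use_lib is made up for illustration, and it assumes the script contains the line use lib "modules"; exactly as shown above. A backup copy with a .bak suffix is kept next to the script.

```shell
#!/bin/sh
# Sketch: point the plugin's "use lib" line at an absolute modules directory.
# patch_use_lib is a hypothetical helper name, not part of the plugin.
patch_use_lib() {
  # $1 = path to check_vmware_esx.pl
  # $2 = absolute path of the copied modules directory
  # -i.bak edits in place and keeps the original as <file>.bak
  sed -i.bak "s|^use lib \"modules\";|use lib \"$2\";|" "$1"
}

# Example call (paths taken from the steps above):
# patch_use_lib /usr/local/nagios/libexec/check_vmware_esx.pl \
#   /usr/local/nagios/libexec/vmware_modules/modules
```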

Again, if the script complains about a missing Perl module when executed, install it via CPAN.

You should use a “session lock file” to minimize auth log entries on the vCenter or ESXi host. Without it, every service check that Nagios executes with this script creates new auth log entries on the vCenter/ESXi host, and the volume is huge! By default the script expects the session lock file in the “/var/nagios_plugin_cache/” directory; create this directory and make sure it is owned by the nagios user.

#mkdir /var/nagios_plugin_cache
#chown -R nagios.nagios /var/nagios_plugin_cache

You need to create a user account for this Nagios script on your vCenter or on the ESXi hosts you want to monitor. You should use an “authfile”; this file contains the username and password of the Nagios monitoring account created on vCenter or the ESXi host.

#vi /usr/local/nagios/libexec/vmware_plugin/authfile

Enter the following –

username=nagios_userName_on_esxi
password=password_nagios

#chown nagios.nagios /usr/local/nagios/libexec/vmware_plugin/authfile

At this stage the script should be ready to run! If it is not, the most likely cause is a missing Perl module :(.

(Step 3 – configure Nagios commands and service check)

This script is capable of monitoring many other vCenter objects such as CPU, network, datastores, virtual machines etc. Follow the standard Nagios guidelines to create your check commands and service checks.
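As a rough illustration, a check command and service check for host memory usage could look like the following Nagios object definitions. This is a sketch, not a tested configuration: the command name check_esx_mem_usage and the host name esxi01 are made up, the “generic-service” template is assumed to exist (it ships with the Nagios sample configuration), and the authfile path is the one used above. The plugin options are the same ones shown in the usage examples in this article.

```cfg
# Hypothetical command: ARG1 = authfile, ARG2 = warning, ARG3 = critical
define command {
    command_name    check_esx_mem_usage
    command_line    $USER1$/check_vmware_esx.pl -H $HOSTADDRESS$ -f $ARG1$ -S mem -s usage -w $ARG2$ -c $ARG3$
}

# Hypothetical service for a host object named esxi01
define service {
    use                     generic-service
    host_name               esxi01
    service_description     ESXi Memory Usage
    check_command           check_esx_mem_usage!/usr/local/nagios/libexec/vmware_plugin/authfile!40%!60%
}
```

$USER1$ normally points to the Nagios libexec directory and $HOSTADDRESS$ is taken from the host definition, so the same command can be reused for every ESXi host.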

Usage:

To see all memory parameters of an ESXi host –
./check_vmware_esx.pl -H 192.168.1.1 -f /location/of/authfile -S mem

mem usage=42.73% - consumed memory=24501.48 MB - swap used=35.87 MB - overhead=650.41 MB - memctl=0.00 MB: |'mem_usage'=42.73%;;;; 'consumed_memory'=24501.48MB;;;; 'mem_swap'=35.87MB;;;; 'mem_overhead'=650.41MB;;;; 'mem_memctl'=0.00MB;;;;

To set alert notification based on the percentage of memory used on an ESXi host –
./check_vmware_esx.pl -H 192.168.1.1 -f /location/of/authfile -S mem -s usage

mem usage=42.73%|'mem_usage'=42.73%;;;;

./check_vmware_esx.pl -H 192.168.1.1 -f /location/of/authfile -S mem -s usage -w 40% -c 60%

Warning! mem usage=42.69%|'mem_usage'=42.69%;40;60;;

To set alert notification based on total consumed memory (in MB) on an ESXi host –
./check_vmware_esx.pl -H 192.168.1.1 -f /location/of/authfile -S mem -s consumed

consumed memory=24501.29 MB|'consumed_memory'=24501.29MB;;;;

./check_vmware_esx.pl -H 192.168.1.1 -f /location/of/authfile -S mem -s consumed -w 24000 -c 28000

Warning! consumed memory=24475.05 MB|'consumed_memory'=24475.05MB;24000;28000;;

To see only the swap usage of an ESXi host –
./check_vmware_esx.pl -H 192.168.1.1 -f /location/of/authfile -S mem -s swapused

swap used=35.87 MB|'mem_swap'=35.87MB;;;;

Screenshot of mem usage on Nagios web UI –

nagios-esxi-memcheck

This script also generates Nagios perfdata, which is useful for graphing; if you have pnp4nagios installed you should be able to get a graph like the following –

nagios-mem-graph
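The perfdata is the part of the plugin output after the “|” character, in the usual Nagios 'label'=value[unit];warn;crit;min;max form. If you want to feed a single value into your own script, it can be pulled out as sketched below. The function name perf_value is made up for illustration, and the sample line is a shortened copy of the output shown above.

```shell
#!/bin/sh
# Sketch: extract one numeric perfdata value from a plugin output line.
# perf_value is a hypothetical helper name.
perf_value() {
  # $1 = full plugin output line, $2 = perfdata label (e.g. mem_usage)
  printf '%s\n' "$1" | sed -n "s/.*'$2'=\([0-9.]*\).*/\1/p"
}

# Shortened sample taken from the plugin output shown earlier
sample="mem usage=42.73% - consumed memory=24501.48 MB|'mem_usage'=42.73%;;;; 'consumed_memory'=24501.48MB;;;;"

perf_value "$sample" mem_usage         # prints 42.73
perf_value "$sample" consumed_memory   # prints 24501.48
```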

VMware ESXi Host Memory Management, Monitoring, Alert Notification – Part 1

When it comes to VMware memory monitoring, there are two items to monitor: (i) ESXi host memory and (ii) VM memory. There are a bunch of memory-related terms and calculations in this space. Here I am discussing host memory monitoring –

-understanding physical memory usage monitoring
-which memory counter is the right one to monitor and alert on for an ESXi host
-which gauge (threshold) is the right one for memory monitoring and alert notification on an ESXi host

I will also set up a Nagios check plugin to monitor the above, with performance data for graphing (Part 2).

Before moving forward, let’s have a look at the Mem.MinFreePct function. This function controls how much host memory should be kept free and when the hypervisor should kick off advanced memory reclamation techniques such as ballooning, compression and swapping.

(Configuration> Advanced Settings>Mem)
memminfreepct

Based on free host memory and the reclamation techniques in use, there are four (04) different states of host memory utilization:

State Name | Mem Reclamation Technique | Good or Bad | Note
High | “Transparent Page Sharing” (TPS) is always running in this state. This is the default behaviour. | Good: this is normal | This is defined by the Mem.MinFreePct function. Don’t disable TPS; it is not recommended.
Soft | The host activates memory ballooning. | Not good enough | This is 64% of Mem.MinFreePct. It means physical memory is close to maxing out. If the host is unable to return to the previous state by itself, take the necessary action to free up more memory.
Hard | The host starts memory compression and hypervisor-level swapping. | Bad: memory is under stress | This is 32% of Mem.MinFreePct. Free up memory by migrating VMs to other hosts, or upgrade the memory.
Low | The host no longer serves any pages to VMs. | Very bad: fix it ASAP | This is 16% of Mem.MinFreePct. This protects the host VMkernel layer from a Purple Screen of Death.

Prior to ESXi 5.x this (High state) threshold was set to 6% by default, which means the host always keeps 6% of total physical memory free before activating the advanced memory reclamation techniques. For example, an ESXi 4.x host with 64GB of memory needs at least 3.84GB free to be in the High (normal) state.

Starting from ESXi 5.x the default is no longer a flat 6%, because high-memory servers (512GB/768GB) are becoming common these days; 6% of 512GB is 30.72GB, which is a huge amount of memory to keep free.

The new calculation is as follows:

Free Memory Threshold | Range | Calculation | Note
6% | First 4GB (0GB to 4GB) | 6% of 4GB |
4% | From 4GB to 12GB | 4% of 8GB | 12-4=8
2% | From 12GB to 28GB | 2% of 16GB | 28-12=16
1% | Remaining memory above 28GB | 1% of the remainder | i.e. 36GB if total size is 64GB (64-28=36); 68GB if total size is 96GB (96-28=68)

Based on the above, on a system with 128GB of memory the minimum free memory required to be in the “High” state is calculated as follows:

i. 6% of the first 4GB (0-4GB): 245.76MB
ii. 4% of the next 8GB (4-12GB): 327.68MB
iii. 2% of the next 16GB (12-28GB): 327.68MB
iv. 1% of the remaining 100GB (28-128GB): 1024MB
v. Total: 1925.12MB (245.76+327.68+327.68+1024)

esxmemfree
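The tiered calculation above can be sketched as a small shell function, which is handy when you need the figure for hosts of different sizes. This is only an illustration of the arithmetic in this article, not something ESXi provides; the function name min_free_mb is made up.

```shell
#!/bin/sh
# Sketch: minimum free memory (MB) needed for the "High" state under the
# ESXi 5.x sliding scale described above. min_free_mb is a hypothetical name.
min_free_mb() {
  # $1 = total host memory in GB
  awk -v total="$1" 'BEGIN {
    # Width of each tier in GB and the share kept free within that tier
    split("4 8 16", width, " ")
    split("0.06 0.04 0.02", pct, " ")
    left = total
    free_gb = 0
    for (i = 1; i <= 3; i++) {
      part = (left < width[i]) ? left : width[i]
      free_gb += part * pct[i]
      left -= part
    }
    free_gb += left * 0.01   # everything above 28GB counts at 1%
    printf "%.2f\n", free_gb * 1024
  }'
}

min_free_mb 128   # prints 1925.12
min_free_mb 64    # prints 1269.76
```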

Based on the above we can set up monitoring and alert notification for a 128GB host as follows:

Mem State | Min Free Mem | Monitoring Action | Calculation
High | 1925.12MB | No action required | Based on above
Soft | 1232.0768MB | Warning alert | 64% of Mem.MinFreePct
Hard | 616.0384MB | Critical alert | 32% of Mem.MinFreePct
Low | 308.0192MB | Critical alert | 16% of Mem.MinFreePct
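The Soft, Hard and Low figures are fixed percentages of the High-state value, so they can be derived for any host once that value is known. A minimal sketch follows; the function name state_thresholds is made up, and the results are rounded to two decimals.

```shell
#!/bin/sh
# Sketch: derive the Soft/Hard/Low free-memory thresholds (MB) from the
# High-state minimum free memory. state_thresholds is a hypothetical name.
state_thresholds() {
  # $1 = High-state minimum free memory in MB
  awk -v minfree="$1" 'BEGIN {
    printf "soft=%.2f hard=%.2f low=%.2f\n",
           minfree * 0.64, minfree * 0.32, minfree * 0.16
  }'
}

state_thresholds 1925.12   # prints soft=1232.08 hard=616.04 low=308.02
```

The Soft value is a natural candidate for the warning threshold and the Hard value for the critical threshold.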

Also, in the “Hard” state the memory performance counter “Swap used” will be greater than 0. This condition should trigger an alarm as well.

vmware-perf-mem

esxtop-mem
(esxtop – memory high state)

References:
http://blogs.vmware.com/vsphere/2012/05/memminfreepct-sliding-scale-function.html

 

Adding a new disk ONLINE to a Linux VM running on VMware (no reboot required)

Adding a new disk ONLINE to a virtual Linux server is as easy as adding a disk online to a Windows 2008/2012 Server! No reboot required.

Make sure the following software is already installed on your Linux VM.

-VMware Tools (for other hypervisors, install the corresponding guest plugin on the VM)

-sg3_utils

-lsscsi

In this example I used RedHat/CentOS running on VMware.

The technical procedure is as follows –

Before we begin, let’s see how many disks are currently provisioned on the Linux VM. To do this, execute the “#lsscsi” command. In this example the server currently has three (03) disks installed; screenshot –

Linux-VM-Disk-1

Now add a new disk to the VM through the vSphere client. Execute the same “#lsscsi” command again: the newly added disk will not appear!

To get the newly added disk recognized by the Linux system we need to rescan the SCSI bus. Usually a SCSI bus rescan happens every time the machine is rebooted; however, this time we don’t want to reboot the system!

Execute the following command to rescan the SCSI bus –

# /usr/bin/rescan-scsi-bus.sh -l

(this script is part of sg3_utils)

You should be able to see the newly added disk in the command output. Screenshot –

Linux-VM-Disk-2

Now if you run “#lsscsi” it will display four (04) disks; previously there were three (03) in this example.

The new disk information will appear in “dmesg” as well; run “dmesg | grep disk” to find the details.
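On a VM with many disks it can be hard to spot the new entry by eye. One option is to save the “lsscsi” output before and after the rescan and print only the lines that are new. This is a sketch: the function name new_scsi_devices and the snapshot file names are made up for illustration.

```shell
#!/bin/sh
# Sketch: list lsscsi entries that appear only after the rescan.
# new_scsi_devices is a hypothetical helper name.
new_scsi_devices() {
  # $1 = file holding lsscsi output taken before the rescan
  # $2 = file holding lsscsi output taken after the rescan
  # -F fixed strings, -x whole-line match, -v keep lines not in the "before" file
  grep -Fxv -f "$1" "$2"
}

# Example usage (run as root on the VM):
#   lsscsi > /tmp/lsscsi.before
#   /usr/bin/rescan-scsi-bus.sh -l
#   lsscsi > /tmp/lsscsi.after
#   new_scsi_devices /tmp/lsscsi.before /tmp/lsscsi.after
```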

The next step is to partition the new disk, create a file system and provide a mount point; if you want it mounted automatically, add the partition details to “/etc/fstab”.
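Because these steps are destructive if pointed at the wrong device, the sketch below only prints the commands for review and executes nothing. Everything specific in it is an assumption for illustration: the device /dev/sdd, the mount point /data, the ext4 file system, a single GPT partition covering the whole disk, and the function name plan_new_disk. Confirm the device name with “lsscsi” first.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for a new disk, runs none of them.
# plan_new_disk, /dev/sdd, /data and ext4 are hypothetical example values.
plan_new_disk() {
  disk=$1
  mnt=$2
  fs=$3
  part="${disk}1"   # first partition on an sdX-style device
  echo "parted -s $disk mklabel gpt mkpart primary 0% 100%"
  echo "mkfs -t $fs $part"
  echo "mkdir -p $mnt"
  echo "mount $part $mnt"
  echo "echo '$part $mnt $fs defaults 0 2' >> /etc/fstab"
}

plan_new_disk /dev/sdd /data ext4
```

Once the printed commands look right for your system, run them one at a time as root.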