When it comes to VMware memory monitoring, there are two items to monitor: (i) ESXi host memory and (ii) VM memory. There are a bunch of memory-related terms and calculations in this space. I am discussing host memory monitoring here:
-understanding physical memory usage monitoring
-what is the right memory counter to monitor & alert on for an ESXi host
-what is the right gauge for memory monitoring & alert notification for an ESXi host
I will also set up a Nagios check plugin to monitor the above, with performance data for graphing (Part 2).
Before moving forward, let’s have a look at the Mem.MinFreePct setting. This setting controls how much host memory should be kept free and when the hypervisor should kick off advanced memory reclamation techniques such as ballooning, compression, and swapping.
Based on free host memory and reclamation techniques, there are four (04) different states of host memory utilization:
|State Name||Mem Reclamation Technique||Good or Bad||Note|
|High||In this state “Transparent Page Sharing” (TPS) is always running. This is the default behaviour.||Good: this is normal||This is defined by the Mem.MinFreePct setting. Disabling TPS is not recommended.|
|Soft||In this state the host activates memory ballooning.||Not good enough||This is 64% of Mem.MinFreePct. It means physical memory is close to maxing out. If the host is unable to return to the previous state by itself, take the necessary action to free up more memory.|
|Hard||In this state the host starts memory compression and hypervisor-level swapping.||Bad: memory under stress||This is 32% of Mem.MinFreePct. Free up memory by migrating VMs to other hosts, or upgrade the memory.|
|Low||In this state the host no longer serves any pages to VMs.||Very bad: fix it ASAP||This is 16% of Mem.MinFreePct. This protects the host VMkernel layer from a Purple Screen of Death.|
Prior to ESXi-5.x this (High state) was set to 6% by default. This means the host always keeps 6% of total physical memory free before activating advanced memory reclamation techniques; for example, an ESXi-4.x host with 64GB memory needs at least 3.84GB free to be in the High state (normal).
Starting from ESXi-5.x the default is no longer a flat 6%, because high-memory servers (512GB/768GB) are becoming common these days; 6% of 512GB is 30.72GB, which is a huge amount of memory to keep free.
The new calculation is as follows:
|Free Memory Threshold||Range||Calculation Note|
|6%||First 0GB to 4GB||6% of 4GB|
|4%||Starting from 4GB to 12GB||(12-4=8) 4% of 8GB|
|2%||Starting from 12GB to 28GB||(28-12=16) 2% of 16GB|
|1%||Remaining memory||e.g. 36GB if total size is 64GB (64-28=36); 68GB if total size is 96GB (96-28=68)|
Based on the above, on a system with 128GB memory the minimum free memory required to stay in the “High” state is calculated as follows:
i. 6% of first 4GB – this is 245.76MB (first 0-4GB)
ii. 4% of 8GB – this is 327.68MB (0-4GB|4-12GB)
iii. 2% of 16GB – this is 327.68MB (0-4GB|4-12GB|12-28GB)
iv. 1% of 100GB – this is 1024MB (0-4|4-12|12-28|28-128GB)
v. Total is 1925.12MB (245.76+327.68+327.68+1024).
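The sliding-scale arithmetic above can be sketched in a few lines of Python. This is only an illustration of the calculation as described in this post, not how ESXi computes it internally; the names `BANDS` and `min_free_mb` are made up for the example.

```python
# Sliding-scale bands as described above: (band size in GB, percent kept free).
# A size of None means "all remaining memory".
BANDS = [(4, 6), (8, 4), (16, 2), (None, 1)]


def min_free_mb(total_gb):
    """Return the free memory (in MB) needed to stay in the High state."""
    remaining = total_gb
    free_mb = 0.0
    for size_gb, pct in BANDS:
        # Take either the whole band or whatever memory is left.
        chunk = remaining if size_gb is None else min(remaining, size_gb)
        free_mb += chunk * 1024 * pct / 100
        remaining -= chunk
        if remaining <= 0:
            break
    return free_mb


print(round(min_free_mb(128), 2))  # 1925.12
```

Calling `min_free_mb(64)` or `min_free_mb(96)` gives the figure for the other host sizes mentioned in the table.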
Based on the above, we can set up monitoring and alert notification for a 128GB host as follows:
|Mem State||Min Free Mem||Monitoring Action||Calculation|
|High||1925.12MB||No action required||Based on above|
|Soft||1232.0768MB||Warning alert||64% of Mem.MinFreePct|
|Hard||616.0384MB||Critical alert||32% of Mem.MinFreePct|
|Low||308.0192MB||Critical alert||16% of Mem.MinFreePct|
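The per-state thresholds follow directly from the High-state figure, so they can be derived for any host size rather than worked out by hand. A minimal sketch, assuming the 64/32/16 percentages from the table; `STATE_PCT` and `state_thresholds` are illustrative names, not part of any VMware tool.

```python
# Each state's threshold as a percentage of the High-state minimum free memory.
STATE_PCT = {"High": 100, "Soft": 64, "Hard": 32, "Low": 16}


def state_thresholds(high_mb):
    """Return a dict mapping state name to its free-memory threshold in MB."""
    return {state: high_mb * pct / 100 for state, pct in STATE_PCT.items()}


# 1925.12MB is the High-state figure for a 128GB host.
for state, mb in state_thresholds(1925.12).items():
    print(f"{state}: {mb:.4f}MB")
```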
Also, in the “Hard” state the memory performance counter “Swap used” will be greater than 0. This condition should also trigger an alarm.
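The alerting rules for the 128GB host could be expressed roughly like this. It is a sketch only, not the Nagios plugin planned for Part 2: it assumes the free-memory and swap-used readings are already collected in MB, and it treats anything above the Soft threshold as OK, which is my assumption. The function name `check_host_memory` is hypothetical; the exit codes 0/1/2 are the standard Nagios OK/WARNING/CRITICAL values.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2

# Thresholds in MB for a 128GB host: 64% and 32% of 1925.12MB.
SOFT_MB = 1925.12 * 0.64
HARD_MB = 1925.12 * 0.32


def check_host_memory(free_mb, swap_used_mb, soft_mb=SOFT_MB, hard_mb=HARD_MB):
    """Return (exit_code, message) for one reading of host free memory and swap used."""
    # Any hypervisor-level swapping is treated as critical on its own.
    if swap_used_mb > 0:
        return CRITICAL, f"CRITICAL: swap used is {swap_used_mb:.1f}MB"
    # Below the Hard threshold (this also covers the Low state).
    if free_mb < hard_mb:
        return CRITICAL, f"CRITICAL: free memory {free_mb:.1f}MB is below {hard_mb:.1f}MB"
    # Below the Soft threshold: ballooning is expected to start.
    if free_mb < soft_mb:
        return WARNING, f"WARNING: free memory {free_mb:.1f}MB is below {soft_mb:.1f}MB"
    return OK, f"OK: free memory {free_mb:.1f}MB"


code, message = check_host_memory(free_mb=1000, swap_used_mb=0)
print(code, message)
```

In a real plugin the script would print the message and exit with the returned code so that Nagios can pick up the state.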