ESX VM Health
MW7E2C

Description: Monitors the health of each VM as reported by VMware Tools.

How it works: Automatically applies to ESX hosts and adds each guest with VMware Tools installed as an instance. Polls for the status of the VMware Tools service, the VM's heartbeat, configuration issues, and up/down state. Doesn't alert on anything unless the VM is up. Appears to be quite reliable for detecting VMs that are unresponsive even though all other metrics appear normal. Alerts aren't triggered until at least two datapoints have been collected, to prevent false positives. No graphs for this, as it's pretty straightforward. Since VMware reports health rollup stats as colors (green, yellow, red, gray), this script converts those colors to numbers: gray = 0, green = 1, yellow = 2, red = 3.

Alerting: Alert subjects are customized to be very clear. The body of the alert email explains what each threshold means in case there's ever any confusion.

Notes: This will likely only work on ESXi 5.1 and up. Make sure the host has valid credentials set for the esx.user and esx.pass properties.

How support troubleshoots ESX connections
These are some simple troubleshooting steps I use when dealing with ESX servers. LogicMonitor has debug tools that can be run in the debug window of the collector the ESX host is currently assigned to.

The first useful tool is !http. This simply sends an HTTP request to a host and prints the response. The ESX API has a few pages that do not require authentication, which makes them helpful for testing a connection independently of credential issues. For example, the debug command below returns “The Web Services Description Language (WSDL) file containing definition of the VMware Infrastructure Management API.”

    !http https://10.73.42.10/sdk/vim.wsdl

The data returned isn't important. What this command tells us is whether the collector can connect to the ESX device or whether network infrastructure is somehow stopping communication.

The next command is !esx, and it's a bit more powerful.

    help !esx
    !esx: query a list of esx performance counter against the given host and print the result
    usage: !esx [username=foo password=bar] <host> <entityName> <entityType[host|vm|datastore|cluster|resourcepool|hoststatus|cpu|memory|disk|network]> [counter1 [counter2...]]
    If you don't give the username/password, the agent will use esx.user/esx.pass properties of the host.

!esx is a debug tool that allows us to query the VMware API directly, in the same way the datasources poll data. To decode the help example, let’s run it against the ESX server 10.73.42.10 and the virtual machine “marvin”.

The example !esx command is:

    !esx vc-server esx-name host cpu.usage.average mem.consumed.average

Broken down for the test environment:

    !esx 10.73.42.10 marvin vm cpu.usage.average mem.consumed.average

Because no username/password is given here, the agent uses the esx.user/esx.pass properties of the host. This is a fantastic way to test the credentials entered into LogicMonitor. You can also pass credentials explicitly with the username= and password= options of the !esx command to verify that they work before setting them in LogicMonitor.
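The unauthenticated WSDL check described above can also be approximated outside the collector debug window. The following is a minimal Python sketch, not LogicMonitor's implementation of !http: the function names (wsdl_url, fetch_status, can_reach) are made up for illustration, and certificate verification is switched off on the assumption that the ESX host presents a self-signed certificate. Run it from the collector machine itself, since the point is to test that machine's network path to the ESX host.

```python
import ssl
import urllib.error
import urllib.request


def wsdl_url(host: str) -> str:
    """Build the unauthenticated WSDL URL used in the !http example."""
    return f"https://{host}/sdk/vim.wsdl"


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status for url, skipping certificate verification.

    Assumption: verification is disabled only because ESX hosts commonly
    present self-signed certificates. This is a reachability probe,
    not a secure client.
    """
    context = ssl.create_default_context()
    context.check_hostname = False  # must be cleared before CERT_NONE
    context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
        return resp.status


def can_reach(host: str, fetch=fetch_status) -> bool:
    """True if the host answered the WSDL request at all."""
    try:
        fetch(wsdl_url(host))
        return True
    except urllib.error.HTTPError:
        # The server replied with an error status, so the network path works.
        return True
    except OSError:
        # Covers URLError, timeouts, refused connections and DNS failures.
        return False


if __name__ == "__main__":
    print(can_reach("10.73.42.10"))
```

As with !http, the body of the response does not matter here; only whether the host answered.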
So far we have only tested connectivity, which is the most common form of ESX troubleshooting. We can also use !esx to query individual datapoints in the datasources to ensure the data presented by LogicMonitor is accurate. The command can be built by viewing the datapoint in question. For this example we can use the CPU usage counter from the previous examples. Let's take another look at the !esx usage:

    usage: !esx [username=foo password=bar] <host> <entityName> <entityType[host|vm|datastore|cluster|resourcepool|hoststatus|cpu|memory|disk|network]> [counter1 [counter2...]]

We know the host is the ESX server 10.73.42.10, the entity name is the virtual machine marvin, the entity type can be found in the datapoint (here "VM", entered as vm), and the ESX counter is cpu.usage.average.

    !esx 10.73.42.10 marvin vm cpu.usage.average

which will return the value:

    cpu.usage.average=211.0
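When checking many datapoints, it can help to assemble the !esx command from its parts rather than typing it by hand. The sketch below only builds the command string according to the usage line quoted above; it does not talk to a collector. The helper name build_esx_command is hypothetical, and the list of entity types is copied from the help output.

```python
# Entity types copied from the "!esx" usage line.
ENTITY_TYPES = (
    "host", "vm", "datastore", "cluster", "resourcepool",
    "hoststatus", "cpu", "memory", "disk", "network",
)


def build_esx_command(host, entity_name, entity_type, counters=(),
                      username=None, password=None):
    """Assemble an !esx debug command string (hypothetical helper).

    If username/password are omitted, the command relies on the
    esx.user/esx.pass properties of the host, as the help text states.
    """
    if entity_type not in ENTITY_TYPES:
        raise ValueError(f"unknown entity type: {entity_type!r}")
    if (username is None) != (password is None):
        raise ValueError("give both username and password, or neither")

    parts = ["!esx"]
    if username is not None:
        parts += [f"username={username}", f"password={password}"]
    parts += [host, entity_name, entity_type, *counters]
    return " ".join(parts)


if __name__ == "__main__":
    # Reproduces the command used in the walkthrough above.
    print(build_esx_command("10.73.42.10", "marvin", "vm",
                            ["cpu.usage.average"]))
```

The resulting string can then be pasted into the collector debug window.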