How support troubleshoots ESX connections.
These are some simple troubleshooting steps I use when dealing with ESX servers. LogicMonitor has debug tools that can be run in the debug window of the collector currently assigned to the ESX host.
The first useful tool is !http. It simply sends an HTTP request to a host and prints the response. The ESX API has a few pages that DO NOT require authentication, which is helpful for testing a connection independently of credential issues. For example, the debug command below returns “The Web Services Description Language (WSDL) file containing definition of the VMware Infrastructure Management API.”
!http https://10.73.42.10/sdk/vim.wsdl
What data is returned isn't important. What this command tells us is whether the collector can connect to the ESX device, or whether network infrastructure is somehow stopping communication.
The next command is !esx, and it's a bit more powerful.
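The same reachability check can be scripted outside the collector debug window. The Python below is only a minimal sketch, not how !http is implemented: the function names wsdl_url and can_reach are hypothetical, and certificate verification is turned off on the assumption that the ESX host presents a self-signed certificate.

```python
import ssl
import urllib.error
import urllib.request


def wsdl_url(host: str) -> str:
    """Build the unauthenticated WSDL URL used in the !http example."""
    return f"https://{host}/sdk/vim.wsdl"


def can_reach(host: str, timeout: float = 5.0) -> bool:
    """Return True if the host answers the HTTPS request at all.

    Assumption: any HTTP response, even an error status, proves the
    network path works, which is all this check is meant to show.
    """
    # Assumption: ESX hosts often use self-signed certificates, so
    # verification is disabled for this connectivity-only test.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(wsdl_url(host), timeout=timeout, context=context):
            return True
    except urllib.error.HTTPError:
        # The server responded, so the connection itself is fine.
        return True
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, TLS failure, and so on.
        return False
```

Calling can_reach("10.73.42.10") from the collector machine would answer the same question as the !http test: can this machine talk to the ESX host at all.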
help !esx
!esx: query a list of esx performance counter against the given host and print the result
usage: !esx [username=foo password=bar] <host> <entityName> <entityType[host|vm|datastore|cluster|resourcepool|hoststatus|cpu|memory|disk|network]> [counter1 [counter2...]]
If you don't give the username/password, the agent will use esx.user/esx.pass
properties of the host.
!esx is a debug tool that allows us to query the VMware API directly in the same way the datasources poll data.
To decode the help example, let’s run it against the ESX server 10.73.42.10 and the virtual machine “marvin”.
The example !esx command is "!esx vc-server esx-name host cpu.usage.average mem.consumed.average"
Broken down for the test environment, this becomes "!esx 10.73.42.10 marvin vm cpu.usage.average mem.consumed.average"
If you don't give the username/password, the agent will use the esx.user/esx.pass properties of the host. This is a fantastic way to test the credentials entered into LogicMonitor. You can also supply credentials explicitly with the username= and password= options of the !esx command to verify they work with LogicMonitor.
So far we have only tested connectivity, which is the most common form of ESX troubleshooting. We can also use !esx to query individual datapoints in the datasources to ensure the data presented by LogicMonitor is accurate. The command can be built by viewing the datapoint in question. For this example we can use the CPU usage counter from the previous examples.
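To make the argument order concrete, here is a small Python sketch that assembles an !esx command line from its parts, following the usage text above. The helper name build_esx_command is hypothetical, and requiring username and password together is an assumption based on the usage showing them as one optional group.

```python
from typing import Optional, Sequence

# Entity types copied from the !esx usage text.
ENTITY_TYPES = {
    "host", "vm", "datastore", "cluster", "resourcepool",
    "hoststatus", "cpu", "memory", "disk", "network",
}


def build_esx_command(
    host: str,
    entity_name: str,
    entity_type: str,
    counters: Sequence[str] = (),
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> str:
    """Return the text to paste into the collector debug window."""
    if entity_type not in ENTITY_TYPES:
        raise ValueError(f"unknown entity type: {entity_type}")
    # Assumption: credentials are supplied as a pair or not at all.
    if (username is None) != (password is None):
        raise ValueError("username and password must be given together")
    parts = ["!esx"]
    if username is not None:
        parts += [f"username={username}", f"password={password}"]
    # Order from the usage text: host, entity name, entity type, counters.
    parts += [host, entity_name, entity_type, *counters]
    return " ".join(parts)


print(build_esx_command(
    "10.73.42.10", "marvin", "vm",
    ["cpu.usage.average", "mem.consumed.average"],
))
```

Leaving out username and password produces the form that relies on the esx.user/esx.pass properties of the host; passing both produces the form that tests explicit credentials.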
Let's take another look at the !esx usage:
usage: !esx [username=foo password=bar] <host> <entityName> <entityType[host|vm|datastore|cluster|resourcepool|hoststatus|cpu|memory|disk|network]> [counter1 [counter2...]]
We know the host is the ESX server 10.73.42.10, the entity name is the virtual machine marvin, the entity type can be found in the datapoint, which is "VM", and the ESX counter is cpu.usage.average.
!esx 10.73.42.10 marvin vm cpu.usage.average
This returns the value cpu.usage.average=211.0
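When comparing several counters against what LogicMonitor displays, it can help to turn the returned text into numbers. The Python below is a sketch that assumes each result appears on its own line as name=value, as in the single example above; the real output may contain other lines, which this hypothetical parse_esx_output helper simply skips.

```python
from typing import Dict


def parse_esx_output(text: str) -> Dict[str, float]:
    """Collect counter=value pairs from pasted !esx output."""
    values: Dict[str, float] = {}
    for line in text.splitlines():
        name, sep, raw = line.strip().partition("=")
        if not sep:
            # Not a name=value line, so ignore it.
            continue
        try:
            values[name.strip()] = float(raw)
        except ValueError:
            # The value is not numeric, so it is not a counter reading.
            continue
    return values


result = parse_esx_output("cpu.usage.average=211.0")
print(result["cpu.usage.average"])
```

The parsed value can then be checked against the datapoint shown in LogicMonitor for the same instance and time.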