6 years ago
collector fail count
One of our collectors is experiencing what seem to be connectivity issues. Common symptoms are that it loses communication with the LM cloud, and remote sessions to it or to monitored devices fail to complete...
I don't know about the heartbeat fail datapoint other than what the description says, "Number of failed attempts to execute the heartbeat task", but what I've set up is for all our collectors to ping LM (x.logicmonitor.com) and 8.8.8.8, and for each collector to ping all the other collectors. It helps us determine whether, for example, the internet is down vs. LM SaaS itself is down vs. the VPN is down vs. there are internal networking issues.
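For anyone who wants to reason about those three checks outside of LM, here is a rough Python sketch of how the results could be combined into a guess at the cause. This is not how LM does it and not my actual setup: the function names (`ping`, `diagnose`, `run_checks`), the peer hostnames, and the decision rules are all assumptions for illustration, and `x.logicmonitor.com` is the same placeholder as above.

```python
import platform
import subprocess


def ping(host: str, timeout_s: int = 2) -> bool:
    """Send one ICMP echo via the system ping command; True if it exits 0.

    Flags assume Windows or Linux ping. Exit codes are only a rough signal
    (Windows ping can exit 0 on some "unreachable" replies).
    """
    if platform.system() == "Windows":
        cmd = ["ping", "-n", "1", "-w", str(timeout_s * 1000), host]
    else:
        cmd = ["ping", "-c", "1", "-W", str(timeout_s), host]
    try:
        done = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 3,
        )
        return done.returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        return False


def diagnose(lm_ok: bool, internet_ok: bool, peers_ok: dict) -> str:
    """Turn the three kinds of ping results into a best-guess cause.

    peers_ok maps each other collector's hostname to its ping result.
    The rules below are illustrative assumptions, not an LM feature.
    """
    all_peers = all(peers_ok.values())
    any_peer = any(peers_ok.values())

    if not lm_ok and not internet_ok:
        if any_peer:
            return "internet likely down (internal network still reachable)"
        return "collector likely isolated (local network or host issue)"
    if not lm_ok:
        return "LM SaaS or the path to it likely down"
    if not internet_ok:
        return "8.8.8.8 unreachable but LM reachable (ICMP possibly filtered)"
    if not all_peers:
        return "VPN or internal networking issue likely"
    return "all checks passing"


def run_checks(lm_host: str, peers: list) -> str:
    """Run the pings from this collector and return the diagnosis."""
    return diagnose(
        lm_ok=ping(lm_host),
        internet_ok=ping("8.8.8.8"),
        peers_ok={peer: ping(peer) for peer in peers},
    )
```

Running something like `run_checks("x.logicmonitor.com", ["collector-b.example.internal"])` from each collector (hypothetical hostname) gives one line per collector that you can compare side by side. Keeping `diagnose` separate from `ping` means the rules can be adjusted without touching the network code.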
It might even make sense to temporarily add the collector server as a resource a second time, but have another collector monitor it. But if you have the option to rebuild the server and collector, that might be the simplest fix.