Azure Autobalanced Collectors - 5 minute windows of ping failures

Question

Here's a weird one. We have a customer with three auto-balanced collectors in Azure. They see the following pattern for ping loss (0 is good):

12:34: 0
12:35: 0
12:36: 100
12:37: 100
12:38: 100
12:39: 100
12:40: 100
12:41: 0
12:42: 0
12:43: 0

So regularly (6 or 7 times a day for ALL Resources), there are 5-minute windows where the ping loss is 100%. PRTG (admittedly on a different Azure subnet) is showing no issues whatsoever.
So... what's to blame?

joe_williams · Answer

Is it just Ping, or other DataSources as well? Have you identified whether it's a random collector or always the same collector? Is there a pattern with the times?

david_bond · Answer

Just Ping. There is seemingly no pattern. The weird thing is that it's always in 5-minute blocks, but the Ping DataSource sends 10 ICMP requests and waits for the responses every minute, so each minute is an independent measurement. I cannot believe that it's LogicMonitor's fault UNLESS it's related to the auto-balancing, but that doesn't seem right either. I wondered whether anyone else had seen this as a problem in Azure networking environments, perhaps an oddity of routing? I'm at a complete loss.
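To check whether the failures really do always come in 5-minute blocks, the per-minute loss values can be grouped into consecutive runs. This is only a minimal sketch: `loss_windows` and `SAMPLES` are hypothetical names, the samples are the ones quoted in the question, and it assumes exactly one sample per minute with no gaps.

```python
from itertools import groupby

# Per-minute ping loss samples copied from the question (0 is good).
SAMPLES = [
    ("12:34", 0), ("12:35", 0), ("12:36", 100), ("12:37", 100),
    ("12:38", 100), ("12:39", 100), ("12:40", 100), ("12:41", 0),
    ("12:42", 0), ("12:43", 0),
]

def loss_windows(samples, threshold=100):
    """Return (start, end, minutes) for each consecutive run of loss >= threshold.

    Assumes samples are in time order, one per minute, with no gaps.
    """
    windows = []
    for failed, run in groupby(samples, key=lambda s: s[1] >= threshold):
        if failed:
            run = list(run)
            windows.append((run[0][0], run[-1][0], len(run)))
    return windows

print(loss_windows(SAMPLES))  # [('12:36', '12:40', 5)]
```

Running the same grouping over a full day of exported data would show whether every window is exactly 5 minutes long and whether the start times line up with anything on a schedule.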

dave_lee · Answer

Not specific to Azure, but we've occasionally seen an issue where a collector fails to interpret the results of the Ping check. Jumping onto the collector itself and running a ping works fine (as in, using the OS ping utility), but the collector software doesn't seem to be able to do it. That situation is different though: the collector services need to be restarted to fix it.

david_bond · Answer

We've moved PRTG to the same subnet and again, PRTG does not suffer from this issue. It seems to be isolated to LogicMonitor.

lmpatricka · Answer

When you look at the Collector device that is monitoring the devices where the Ping loss is happening, I would recommend checking the Collector Data Collecting Tasks datasource to see if there is some sort of cyclical overload of those tasks happening. Specifically, I would look at the graphs for the Unavailable Thread Scheduling and Queue datapoints before, during, and after the Ping loss time periods. It may indicate that the collector ABCG is not keeping up with Ping specifically, which is something you may be able to tune around.
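One rough way to do that before/during/after comparison outside the graphs is to average a collector datapoint inside and outside the loss minutes. The sketch below is an assumption-heavy illustration: `compare_during_loss` is a hypothetical helper, and the `queue_depth` values are made-up placeholders, not real collector data or an actual LogicMonitor export format.

```python
from statistics import mean

def compare_during_loss(metric_by_minute, loss_minutes):
    """Return (mean during loss, mean otherwise) for a per-minute metric.

    Either element is None if there are no samples on that side.
    """
    during = [v for m, v in metric_by_minute.items() if m in loss_minutes]
    outside = [v for m, v in metric_by_minute.items() if m not in loss_minutes]
    return (
        mean(during) if during else None,
        mean(outside) if outside else None,
    )

# Placeholder values for illustration only (NOT measured data).
queue_depth = {"12:35": 2, "12:36": 8, "12:37": 10, "12:41": 4}
loss_minutes = {"12:36", "12:37"}

during, outside = compare_during_loss(queue_depth, loss_minutes)
print(during, outside)
```

If the two averages are close for every task datapoint, collector overload becomes a less likely explanation; a clear difference would point toward tuning.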
