Ping Failing from collector to device and back?

Neophyte

3 years ago

This behavior seems like it may be related to, or overlap with, an issue I first observed in January 2020 (I think) and was never able to resolve.

Randomly, subsets of our Juniper switches (and only switches, no other devices) would trip alerts indicating 100% ping loss. The alerts would usually auto-resolve after 60-90 minutes, and they never left behind any evidence (that I could find) of why the condition started or cleared.

While the alerts were in effect, other non-collector sources continued to ping the same switches without disruption, and I could ping the collectors involved from the switch command line. SSH, SNMP, and other communication between the collectors and the switches showed no problems.
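
For anyone who wants to run the same kind of side-by-side comparison, here is a minimal sketch of a loss check that could be run from a collector and from a non-collector host at the same time. It is not what was actually used in this case. It assumes a Linux or macOS style `ping` that accepts `-c` and prints an "N% packet loss" summary line; Windows `ping` output is formatted differently and would need its own pattern. The function names are made up for illustration.

```python
import re
import subprocess
from typing import Optional

# Matches the summary line printed by Linux/macOS ping, e.g. "0% packet loss"
# or "0.0% packet loss". Windows ping output is NOT covered by this pattern.
LOSS_RE = re.compile(r"(\d+(?:\.\d+)?)% packet loss")


def parse_packet_loss(ping_output: str) -> Optional[float]:
    """Return the packet-loss percentage found in ping output, or None."""
    match = LOSS_RE.search(ping_output)
    return float(match.group(1)) if match else None


def ping_loss(host: str, count: int = 5, timeout_s: int = 30) -> Optional[float]:
    """Run the system ping against host and return the percent loss.

    Returns None if ping is missing, times out, or prints no summary line.
    """
    try:
        result = subprocess.run(
            ["ping", "-c", str(count), host],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return None
    return parse_packet_loss(result.stdout)
```

Calling `ping_loss()` against the same switch address from both hosts during an alert window, and logging the results with timestamps, would show whether the loss is visible only from the collector.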

Of note, none of the traffic between collectors and switches traversed a firewall.

The real kicker to me was that I had never seen this behavior until I upgraded the collectors to 29.003. If I rolled the collectors back to 28.x, the issue did not occur; as soon as I moved forward again to 29.x, it came back. I opened a case with support and spent a lot of tedious time trying to figure out where traffic was being dropped, to no avail. After several months I still could not convince them to move from their “it's something in your environment” stance. As much as I wanted an answer, I simply could not afford to devote the time needed to sustain an investigation.

Ultimately, I applied system.category “NoPing” to the switches and moved on.
