Forum Discussion
Hey Stuart
Thanks. Yes, I know. When the issue occurs on the Ping datasource and I open a terminal on the affected collector, a manually run ping and traceroute both fail. That is a pretty clear indicator of where the problem lies.
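For anyone who wants to repeat that manual check on a schedule and keep timestamped output to compare across collector versions, here is a minimal sketch. It only wraps the OS's own `ping` and `traceroute` (`tracert` on Windows); it does not touch the collector or the datasource. The function names and the target address are placeholders I made up for illustration.

```python
import subprocess
import sys
from datetime import datetime, timezone


def build_commands(target, windows=sys.platform.startswith("win")):
    """Return the ping and traceroute command lines for this OS."""
    if windows:
        return [["ping", "-n", "4", target], ["tracert", "-d", target]]
    return [["ping", "-c", "4", target], ["traceroute", "-n", target]]


def run_check(cmd, timeout=60, runner=subprocess.run):
    """Run one command; return a dict with timestamp, success flag, output.

    `runner` is injectable (hypothetical convenience) so the function can
    be exercised without touching the network.
    """
    started = datetime.now(timezone.utc).isoformat()
    try:
        proc = runner(cmd, capture_output=True, text=True, timeout=timeout)
        ok = proc.returncode == 0
        output = (proc.stdout + proc.stderr).strip()
    except (subprocess.TimeoutExpired, OSError) as exc:
        # Covers a hung command and a missing binary (FileNotFoundError).
        ok = False
        output = f"{type(exc).__name__}: {exc}"
    return {"cmd": " ".join(cmd), "started": started, "ok": ok, "output": output}


def main(argv):
    """Run both checks against argv[0] and print the results."""
    target = argv[0]  # placeholder: pass the device the datasource monitors
    for cmd in build_commands(target):
        result = run_check(cmd)
        status = "OK" if result["ok"] else "FAIL"
        print(f"{result['started']} {status} {result['cmd']}")
        print(result["output"])
```

Calling `main(["192.0.2.10"])` (a documentation-range address, substitute a real monitored device) on the collector host before and after the upgrade would give two logs to diff. This is just a convenience wrapper around the manual test, not a diagnosis.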
But it is also a way to eliminate problem domains by looking at which variable changed. In this situation the compute and storage HW didn't change, the hypervisor didn't change, the host OS didn't change, and the network connectivity (physical and logical) didn't change. The only thing in the stack that changed is the collector version, and when I undo that change by rolling the collector version back, the issue goes away.
So yes, the issue seems to be in our environment, but the trigger is a stack interaction that occurs differently, or not at all, before 29.001.
Hugely frustrating, and moving forward means stepping into a giant time-sucking rabbit hole of isolating variables to identify that interaction. On the other hand, I have considered a second sledgehammer: simply eliminating use of the Ping datasource. But everybody reading could probably come up with several reasons why that might not be a good approach. So hopefully someone will post an idea that narrows the search and is achievable at relatively low cost. I'm giving it a few days, but that sledgehammer is leaning against the wall right over there...