Javier
4 years ago

Availability alarm correlation

When a server goes down, I see alarms from the Ping and Host Status LogicModules, but I also see other alarms reporting missing data.

Is there a way to automatically acknowledge (or tag) the missing-data alarms that are raised when a server is simply down?

It looks like a "Host Status" alarm should silence all other alarms from that server. Does that make sense?

 

  • Anonymous

    Yes, the Host Status alert should cause the device to be marked as "Dead", which should suppress all other alerts on that device. However, there is often a race condition if other thresholds are set. The Host Status alert threshold is > 300 on the idleInterval datapoint. Its definition is:

    Quote

    The interval in seconds we do not get data from the host. NOTE: there is server side logic that declares a host DOWN after 6 minutes, suppressing other alerts. We do not recommend you change this alert.    

    So the Host Status alert opens at 5 minutes, and the server-side logic then marks the device as down at 6 minutes. Any alerts that open before the 6-minute mark are not closed afterward, and nothing prevented them from opening in the first place.

    I know there have been a few requests to let customers modify the built-in server-side logic that marks a device as dead. I recommend you reach out to your CSM to add your support to that request.
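
To make the timing in the reply above concrete, here is a minimal Python sketch of the race. It is only an illustrative model, not LogicMonitor code or its API: the function names, the sample alert names, and the assumption that suppression starts exactly at the 360-second mark are all hypothetical. Only the two numbers (threshold > 300 on idleInterval, host declared down after 6 minutes) come from the thread.

```python
# Illustrative model only; not LogicMonitor's implementation or API.
HOST_STATUS_THRESHOLD_S = 300  # from the thread: Host Status alerts when idleInterval > 300
HOST_DEAD_AFTER_S = 360        # from the thread: server-side logic declares the host down after 6 minutes


def race_window_seconds():
    """Seconds between the Host Status alert opening and the host being marked dead."""
    return HOST_DEAD_AFTER_S - HOST_STATUS_THRESHOLD_S


def classify_alerts(alert_open_times):
    """Classify alerts by when they would open, in seconds since data stopped arriving.

    Assumption: an alert that would open at or after the dead mark is suppressed;
    anything that opened earlier stays open, as described in the reply.
    """
    return {
        name: "suppressed" if opens_at >= HOST_DEAD_AFTER_S else "opened"
        for name, opens_at in alert_open_times.items()
    }


if __name__ == "__main__":
    # Hypothetical alerts and trigger times, chosen only to show both outcomes.
    sample = {
        "Ping": 120,
        "Missing data (CPU)": 240,
        "Host Status": 301,
        "Missing data (disk)": 420,
    }
    print("Race window (s):", race_window_seconds())
    for name, state in classify_alerts(sample).items():
        print(f"{name}: {state}")
```

In this model, only the alert that would open after the 6-minute mark is suppressed. The Ping and early missing-data alerts open and stay open, which matches the behavior described in the original question.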