3 years ago

Availability alarm correlation

When any server is down, I can see alarms reported by Ping and Host Status LogicModules, but I can also see other alarms reporting some missing data.

Is there any method to automatically acknowledge (or tag) those missing data alarms reported when a server is simply down?

It looks like a "Host Status" alarm should silence all other alarms from that server. Does that make sense?


    Yes, the Host Status alert should cause the device to be marked as "Dead", which should suppress all other alerts on that device. However, there's often a race condition if other thresholds are set. The Host Status alert threshold is >300 on the idleInterval datapoint. Its definition is:


    The interval in seconds we do not get data from the host. NOTE: there is server side logic that declares a host DOWN after 6 minutes, suppressing other alerts. We do not recommend you change this alert.    

    So the Host Status alert opens at 5 minutes, and the server-side logic marks the device as down at 6 minutes. Any other alerts that open before the 6-minute mark are not closed afterwards, and nothing stops them from opening in the first place.
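To make the timing concrete, here is a rough sketch of that race in Python. It is only a simplified model of the behaviour described above, not LogicMonitor's actual server-side logic. The two constants come from the datapoint description (>300 seconds for the alert, 6 minutes for the DOWN state). The function name, the alert names, and the assumption that suppression starts exactly at the 360-second mark are hypothetical.

```python
# Simplified model of the race described above (not LogicMonitor code).
HOST_STATUS_THRESHOLD_S = 300  # Host Status alert opens when idleInterval > 300
HOST_DEAD_AFTER_S = 360        # server-side logic declares the host DOWN after 6 minutes


def classify_alerts(alert_open_times):
    """Map each alert name to 'opens' or 'suppressed'.

    alert_open_times: dict of alert name -> seconds since the last data was
    received at which that alert's threshold would trigger (hypothetical input).
    Assumption: anything that would trigger at or after the DOWN mark is
    suppressed; anything earlier opens and is not closed afterwards.
    """
    outcome = {}
    for name, opens_at in alert_open_times.items():
        if opens_at < HOST_DEAD_AFTER_S:
            outcome[name] = "opens"
        else:
            outcome[name] = "suppressed"
    return outcome


if __name__ == "__main__":
    example = {
        "cpu_no_data": 120,                                # trips 2 minutes after data stops
        "host_status_idleInterval": HOST_STATUS_THRESHOLD_S + 1,
        "disk_no_data": 420,                               # would trip after the DOWN mark
    }
    for name, state in classify_alerts(example).items():
        print(f"{name}: {state}")
```

In this model, any missing-data threshold that trips in under six minutes still produces an alert, which matches what the original poster is seeing.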

    I know there have been a few requests to let customers modify the built-in server-side logic that marks a device as dead. I recommend you reach out to your CSM to add your support to that request.