Forum Discussion

mnagel
Professor
6 years ago

ACK/SDT improvements needed

Right now, ACK and SDT work, but they lack important functionality. Please consider addressing all of these:

* ACK should be able to expire (for critical issues that should not be lost forever, or to set a maximum expected recovery period; this is not possible with SDT).
* ACK should be able to clear if a worse condition occurs (in Nagios, this is a non-sticky ACK).
* ACK and SDT notices should be sent to custom email integrations (as far as I am concerned, this one is a bug).
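To make the first two requests concrete, here is a minimal Python sketch of ACK semantics with an optional expiry and a non-sticky mode. It is not LogicMonitor's data model: the `Ack` class, the `Severity` ordering, and the `expires_after`/`sticky` fields are hypothetical names chosen for illustration, and the non-sticky rule follows the request as worded here (clear when the condition gets worse).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import IntEnum
from typing import Optional


class Severity(IntEnum):
    # Hypothetical ordering: a higher value means a worse condition.
    WARNING = 1
    ERROR = 2
    CRITICAL = 3


@dataclass
class Ack:
    acked_at: datetime
    severity_at_ack: Severity
    expires_after: Optional[timedelta] = None  # None means the ACK never expires
    sticky: bool = True  # a non-sticky ACK clears when the condition worsens

    def is_active(self, now: datetime, current: Severity) -> bool:
        """Return True if this ACK should still suppress notifications."""
        if self.expires_after is not None and now >= self.acked_at + self.expires_after:
            return False  # expired: the issue resurfaces instead of being lost forever
        if not self.sticky and current > self.severity_at_ack:
            return False  # the condition is worse than what was acknowledged
        return True


t0 = datetime(2024, 1, 1, 12, 0)
ack = Ack(t0, Severity.WARNING, expires_after=timedelta(hours=4), sticky=False)
print(ack.is_active(t0 + timedelta(hours=1), Severity.WARNING))   # True
print(ack.is_active(t0 + timedelta(hours=5), Severity.WARNING))   # False (expired)
print(ack.is_active(t0 + timedelta(hours=1), Severity.CRITICAL))  # False (escalated)
```

A sticky ACK with an expiry covers the "maximum expected recovery time" case, while a non-sticky ACK without an expiry covers the "clear on a worse condition" case.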

    • ACK should be removable if it is determined that it was checked incorrectly by a user.
  • Thinking about this, I would find it useful if an acknowledged alert that is placed into SDT could also become automatically unacknowledged at the end of the SDT. (I'd also like the ability to limit SDTs to end no later than n days from the current time.)

  • 9 minutes ago, Cole McDonald said:
    • ACK should be removable if it is determined that it was checked incorrectly by a user.

    Oh, it is, but it's definitely a non-obvious side effect of disabling and re-enabling alerts. I frequently get the feeling that different parts of LM were written by summer interns :).

  • My issue includes how LM works with third-party integrations. When an object is put in SDT, its alerts should be closed, and a closing event should be sent to the integration.

    There are two different camps of thought on this:
    1. The alert should stay open, and if/when the condition is resolved, the alert is cleared. The problem with this: what if the system is put into SDT and then removed? No additional monitoring is done to close the alert, so the alert is orphaned.
    2. The alert is closed. When the alert is closed, a closing event is sent to the integration. When the element is brought out of SDT, monitoring resumes, and if there is still an issue, a new alert is generated.

    #2 is the more typical process. In that case, if the alert was ACKed there is no issue, because the original (ACKed) alert was closed and a new alert was generated.

    I also agree that ACKs should be cleared if the alert changes to a higher severity, and that ACKs should be removable.

  • Yes, and LM actually agreed with me and others (eventually) and fixed this in v133. Then they broke it sometime after that, and there is no ETR that I am aware of.
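For comparison, option #2 from the thread (close alerts when SDT starts, notify the integration, and re-alert when SDT ends if the problem persists), together with the requested cap on how far out an SDT may end, can be sketched in Python as below. This is an illustrative model only, assuming a plain callback stands in for a third-party integration; `Monitor`, `Alert`, `send_event`, and `MAX_SDT_DAYS` are hypothetical and do not correspond to any LogicMonitor API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Optional

# Hypothetical cap: an SDT may not end more than this many days from now.
MAX_SDT_DAYS = 7


@dataclass
class Alert:
    alert_id: str
    resource: str
    acked: bool = False
    open: bool = True


@dataclass
class Monitor:
    # Hypothetical stand-in for a third-party integration: receives ("open" | "close", alert).
    send_event: Callable[[str, Alert], None]
    alerts: Dict[str, Alert] = field(default_factory=dict)
    sdt_until: Dict[str, datetime] = field(default_factory=dict)
    next_id: int = 1

    def in_sdt(self, resource: str, now: datetime) -> bool:
        end = self.sdt_until.get(resource)
        return end is not None and now < end

    def raise_alert(self, resource: str, now: datetime) -> Optional[Alert]:
        if self.in_sdt(resource, now):
            return None  # alerting is suppressed while the resource is in SDT
        alert = Alert(f"A{self.next_id}", resource)
        self.next_id += 1
        self.alerts[alert.alert_id] = alert
        self.send_event("open", alert)
        return alert

    def start_sdt(self, resource: str, now: datetime, end: datetime) -> None:
        if end > now + timedelta(days=MAX_SDT_DAYS):
            raise ValueError(f"SDT may not end more than {MAX_SDT_DAYS} days from now")
        self.sdt_until[resource] = end
        # Option 2: close open alerts and notify the integration, so nothing is orphaned.
        for alert in self.alerts.values():
            if alert.resource == resource and alert.open:
                alert.open = False
                self.send_event("close", alert)

    def end_sdt(self, resource: str, now: datetime, still_failing: bool) -> Optional[Alert]:
        self.sdt_until.pop(resource, None)
        # Monitoring resumes: a problem that is still present yields a new, un-ACKed alert.
        return self.raise_alert(resource, now) if still_failing else None


events: List[tuple] = []
mon = Monitor(send_event=lambda kind, a: events.append((kind, a.alert_id)))
t0 = datetime(2024, 1, 1)
first = mon.raise_alert("web01", t0)
first.acked = True
mon.start_sdt("web01", t0, t0 + timedelta(hours=2))
second = mon.end_sdt("web01", t0 + timedelta(hours=2), still_failing=True)
print(events)        # [('open', 'A1'), ('close', 'A1'), ('open', 'A2')]
print(second.acked)  # False
```

Because the ACKed alert `A1` is closed rather than left open, the alert raised after SDT is a fresh, un-ACKed alert; under this approach the original ACK cannot linger past the SDT.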