Forum Discussion

Vitor_Santos
4 years ago

Ability to set # of consecutive polls for a certain DS (on a group/client basis)

Hello,

This request might not be easy to achieve, but it would be very helpful if we could set the number of consecutive polls required to alarm on a given datapoint (according to each client's needs) without having to create a new DS for each scenario.
I'm not sure whether someone has raised this before. It is a feature we're losing in the move from our old platform, and it would be extremely appreciated.

Essentially, we just want to keep WinCPU (for example) with its global definitions. But if Client A asks to alarm on CPU after 3 consecutive polls (different from the global setting) and Client B wants 10 consecutive polls, we need to be able to do that without creating a DS for each requirement.
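
To make the request concrete, here is a minimal Python sketch of the behavior being asked for. It is not how LogicMonitor works today. The names `GLOBAL_POLLS`, `GROUP_OVERRIDES`, `polls_required` and `should_alarm`, and the global value of 5, are hypothetical and only illustrate a per-client override layered on a single shared DS definition.

```python
from typing import Dict, List, Optional

# Hypothetical global setting on the shared DataSource (e.g. WinCPU).
GLOBAL_POLLS = 5

# Hypothetical per-client (group) overrides; unlisted clients use the global value.
GROUP_OVERRIDES: Dict[str, int] = {"ClientA": 3, "ClientB": 10}


def polls_required(group: Optional[str]) -> int:
    """Return the number of consecutive polls needed before alarming."""
    if group is None:
        return GLOBAL_POLLS
    return GROUP_OVERRIDES.get(group, GLOBAL_POLLS)


def should_alarm(values: List[float], threshold: float,
                 group: Optional[str] = None) -> bool:
    """True if the most recent N polls all exceed the threshold.

    N comes from the group override when one exists, else from the global value.
    """
    n = polls_required(group)
    if len(values) < n:
        return False
    return all(v > threshold for v in values[-n:])


cpu = [50, 95, 96, 97, 98]  # the last four polls are above 90
print(should_alarm(cpu, 90, "ClientA"))  # True: last 3 polls are all high
print(should_alarm(cpu, 90, "ClientB"))  # False: fewer than 10 polls seen
print(should_alarm(cpu, 90))             # False: the global 5 includes the 50
```

The point of the sketch is that only the poll count varies per client, while the DS definition itself stays shared.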

Thank you!

Regards,

4 Replies

  • I 100% agree this is needed -- we have to hack around this all the time with escalation chains that have one or more empty stages, and still that does not prevent alerts from registering in the system.  But this is just one case that would be trivial to solve with DS inheritance, something I have been pushing for well over four years now. The issue with creating new DSes is they are then freestanding clones, meaning each must now be maintained independently (and this is commonly pushed by support as a solution, sadly).  If we could just get inheritance done (not just for DSes, but that would be the highest impact) it would be easy to make a copy that does what you want with changes only to parameters you desire while still getting the benefit of updates on the parent module and minimal maintenance requirements.  It would be important that child module applies-to expressions are automatically excluded from the parent chain, too.

    A related change for alerts that would not be solved by inheritance, but that I also benefited from in our previous tool, is threshold calculation over time.  For example, I don't care if CPU is high on a Windows server for a few minutes, but I do care if it is high for an hour. I also need to know if the average is high over a period of time when the actual level may be oscillating during that period (and LM would not generate any alerts otherwise).  With Nagios we did this by calling back to the pnp4nagios RRD data to calculate averages, slopes, etc.  This could be done in LM if using the API from within modules was supported properly, but I refuse to go there until there is library support within the module system.

  • 9 minutes ago, mnagel said:

    The issue with creating new DSes is they are then freestanding clones, meaning each must now be maintained independently (and this is commonly pushed by support as a solution, sadly).

    Yes, creating a new DS isn't an optimal or acceptable solution, for exactly that reason. We need to maintain each DS, and if a new version is released we would need to edit all the clones to keep everything in sync.
    There should just be a way of tweaking this while still keeping the parent DS.

  • 2 hours ago, mnagel said:

    A related change for alerts that would not be solved by inheritance but I had also benefited from in our previous tool is threshold calculation over time.

    This is also how one of our old monitoring tools, CA Performance Management (and CA eHealth before that), used to work: it calculated over time instead of counting consecutive polls. That way occasional spikes are ignored, but issues are still picked up if spikes are frequent.

    E.g. if the utilization value is above the threshold for 30 minutes out of a 1-hour window (fixed or sliding, etc.), then generate an alarm.

    Dynamic thresholds in LM can somewhat cover this, I think, but they are still not as flexible as defining our own 'threshold violation duration' in a 'specified window'.

  • Anonymous

    With the Exchange, there's a possibility of maintaining multiple child DSes for a single parent DS in the repo. With that, you'd be able to clone an existing repo module and modify it, and both the original and the clone would subscribe to the in-repo version for updates. It's a possibility now that we have the Exchange rolled out; not sure if/when.
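
The two time-based ideas raised in the replies above (alerting on the average over a period, and alerting when the value is above the threshold for 30 minutes out of a 1-hour window) can be sketched roughly as follows. This is only an illustration in Python, not an LM feature or API: the function names are hypothetical, samples are assumed to be `(timestamp_seconds, value)` pairs, and each poll is assumed to stand for one full poll interval.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, value)


def in_window(samples: Sequence[Sample], now: float, window_s: float) -> List[float]:
    """Values of the samples taken within the last window_s seconds (sliding window)."""
    return [v for t, v in samples if now - window_s < t <= now]


def average_breach(samples: Sequence[Sample], now: float,
                   window_s: float, threshold: float) -> bool:
    """True if the mean value over the window exceeds the threshold."""
    vals = in_window(samples, now, window_s)
    return bool(vals) and sum(vals) / len(vals) > threshold


def duration_breach(samples: Sequence[Sample], now: float, window_s: float,
                    threshold: float, min_violation_s: float,
                    poll_interval_s: float) -> bool:
    """True if polls above the threshold add up to at least min_violation_s.

    Assumption: each violating poll counts as one whole poll interval.
    """
    vals = in_window(samples, now, window_s)
    violating = sum(1 for v in vals if v > threshold)
    return violating * poll_interval_s >= min_violation_s


# One hour of 5-minute polls oscillating between 95 and 70 (mean 82.5).
samples = [(300.0 * i, 95.0 if i % 2 else 70.0) for i in range(1, 13)]

print(average_breach(samples, 3600, 3600, 80))             # True
print(duration_breach(samples, 3600, 3600, 80, 1800, 300))  # True
```

With this oscillating series, a rule requiring consecutive polls above 80 would never fire, because no two adjacent polls are both high, while both windowed checks do.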