Forum Discussion

Garry_Gearhart
3 years ago

Consistent data polls and no data

We'd like better visibility into data quality and polling. For example, if 10 polls are expected per hour for a particular datapoint with an effective threshold and only 8 are received, that is of interest. This is more of a "did we get a poll" cross-check. Checking poll quantity and no data would also be valuable independent of the no data alerting functionality. Has anyone else looked into this and come up with a solution?

  • 38 minutes ago, Garry Gearhart said:

    We'd like better visibility into data quality and polling. For example, if 10 polls are expected per hour for a particular datapoint with an effective threshold and only 8 are received, that is of interest. This is more of a "did we get a poll" cross-check. Checking poll quantity and no data would also be valuable independent of the no data alerting functionality. Has anyone else looked into this and come up with a solution?

    I have pushed LM on this for years without much interest (and certainly no indication it will change). Two recommendations I've made that would solve this and other problems:

    * change no data to a first-class condition (like warning, error and critical) so that alerts can be applied to it more explicitly than now, where you have to dissect modules to find out whether a datapoint is set to alert on "no data". At the very least, the alert generation status of "no data" in a DS should be exposed and changeable in alert tuning.

    * add a facility to check for conditions over time (would address your case as well as "high CPU over the last hour" and similar, which is a huge missing feature in the current framework).

    I have a resource check script that I started to extend with recent data history so it can identify these issues via the API, but it is not quite there. That is really the only option I am aware of, but I have hesitated because it would put a high load on the API. I do have a weekly check that looks for a lack of Netflow data on Netflow-enabled devices. It covers only that one specific case, though; it is needed because the built-in Netflow heartbeat check cannot detect a lack of ingested flow data. It only looks for received datagrams, independent of content or device binding.
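
For what it's worth, the "did we get a poll" count itself is simple once the raw samples are in hand. Below is a minimal sketch in Python, not the script mentioned above. It assumes the samples for one datapoint have already been retrieved by some means (no API call is shown) as (epoch seconds, value) pairs, with None or NaN standing for no data. The function name, the tuple layout, and the 360-second interval are illustrative assumptions.

```python
import math
from typing import Optional, Sequence, Tuple

# Hypothetical sample layout: (epoch seconds, value), value None/NaN = no data.
Sample = Tuple[float, Optional[float]]


def poll_completeness(
    samples: Sequence[Sample],
    window_start: float,
    window_end: float,
    poll_interval_s: float,
) -> Tuple[int, int, int]:
    """Return (expected, received, missing) polls for [window_start, window_end)."""
    # Number of polls the interval implies for this window.
    expected = int((window_end - window_start) // poll_interval_s)

    # Count only samples inside the window that carry a real value.
    received = 0
    for ts, value in samples:
        if not (window_start <= ts < window_end):
            continue
        if value is None or math.isnan(value):
            continue
        received += 1

    missing = max(expected - received, 0)
    return expected, received, missing


# Example from the original post: 10 polls expected per hour, 8 received.
example = [(i * 360, None if i in (3, 7) else 1.0) for i in range(10)]
print(poll_completeness(example, 0, 3600, 360))  # (10, 8, 2)
```

Running this per datapoint on a schedule is where the API load concern comes in, so in practice the window and the set of datapoints checked would probably need to be kept small.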

  • Anonymous
    20 minutes ago, mnagel said:

    * add a facility to check for conditions over time (would address your case as well as "high CPU over the last hour" and similar, which is a huge missing feature in the current framework).

    Like moving the trigger/clear interval into the alert rules instead of on the datapoint? Or do you mean just adding trigger/clear intervals to the no data first-class condition?

  • Stuart - Either approach would be better than the discovery approach I have now. I discover the problem when I look at a Network Bandwidth graph and find it blank. When I hover the mouse over the graph, random dots show up. When I look at the Raw Data link, I find one interval with data, then two or more intervals with no data, then another interval with data, followed again by lots of intervals with no data.

    Researching why we have no data has been troublesome while we are still growing our coverage.
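
The pattern described above (one interval with data, then several without) can be flagged mechanically instead of by eyeballing graphs. This is a rough sketch in Python, assuming the Raw Data values for a datapoint have been exported into a list in time order, with None marking an interval that has no data. The function name and the `min_run` parameter are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def no_data_runs(
    values: Sequence[Optional[float]], min_run: int = 2
) -> List[Tuple[int, int]]:
    """Return (start_index, length) for each run of consecutive None
    values that is at least min_run intervals long."""
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None  # index where the current gap began

    for i, value in enumerate(values):
        if value is None:
            if start is None:
                start = i
        else:
            # A real value closes any open gap.
            if start is not None and i - start >= min_run:
                runs.append((start, i - start))
            start = None

    # Handle a gap that runs to the end of the series.
    if start is not None and len(values) - start >= min_run:
        runs.append((start, len(values) - start))
    return runs


raw = [1.0, None, None, 2.0, None, None, None, 3.0, None]
print(no_data_runs(raw))  # [(1, 2), (4, 3)]
```

This only locates the gaps. It says nothing about why the data is missing, which still has to be researched per resource.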