38 minutes ago, Garry Gearhart said:
We'd like to have better visibility to data quality and polling - for example if 10 polls are expected per hour for a particular datapoint with an effective threshold and only 8 are received then that is of interest. That being more of a "did we get a poll" cross check. Checking for quantity and no data would also be valuable independent of using the no data alerting functionality. Has anyone else looked into this and come up with a solution?
This is something I have pushed LM on for years without much interest, and there is certainly no indication that will change. Two recommendations I've made would solve this and other problems:
* Change "no data" to a first-class condition (like warning, error, and critical) so that alerts can be applied to it more explicitly. Right now you have to dissect modules to find out whether a datapoint is set to alert on "no data". At the very least, the alert generation status of "no data" in a DS should be exposed and changeable in alert tuning.
* Add a facility to check for conditions over time. This would address your case as well as "high CPU over the last hour" and similar checks, which are a huge missing feature in the current framework.
I have a resource check script that I started to extend with recent data history so it can identify these issues via the API, but it is not quite there. That is really the only option I am aware of, and I have hesitated because it would put a high load on the API. I do have a weekly check that looks for a lack of Netflow data on Netflow-enabled devices. It covers only that one specific case, but it is needed because the built-in Netflow heartbeat check cannot detect a lack of ingested flow data; it only looks for received datagrams, independent of content or device binding.
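
For anyone who wants to try the "did we get a poll" cross check on history they have already pulled, here is a rough sketch of the counting logic. It is not my actual script and it does not call the LM API at all: it assumes you have already fetched a list of (epoch seconds, value) samples for one datapoint, with `None` or NaN standing in for "no data". The names `poll_completeness` and `sustained_above` are made up for illustration.

```python
import math
from typing import Optional, Sequence, Tuple

# Assumed input shape: (epoch seconds, value); None or NaN means "no data".
Sample = Tuple[float, Optional[float]]


def is_no_data(value: Optional[float]) -> bool:
    """True when a sample carries no usable value."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def poll_completeness(samples: Sequence[Sample], window_start: float,
                      window_end: float, interval_s: float) -> dict:
    """Compare polls with real values against the number expected in the window."""
    expected = int((window_end - window_start) // interval_s)
    in_window = [v for t, v in samples if window_start <= t < window_end]
    received = sum(1 for v in in_window if not is_no_data(v))
    return {
        "expected": expected,
        "received": received,
        "missing": max(expected - received, 0),
    }


def sustained_above(samples: Sequence[Sample], window_start: float,
                    window_end: float, threshold: float,
                    min_fraction: float = 1.0) -> bool:
    """True when at least min_fraction of the valid samples exceed threshold."""
    values = [v for t, v in samples
              if window_start <= t < window_end and not is_no_data(v)]
    if not values:
        return False  # no valid data is a separate condition, not a breach
    above = sum(1 for v in values if v > threshold)
    return above / len(values) >= min_fraction


# Example: 10 polls expected in an hour (one every 360 s), 2 came back empty.
samples = [(i * 360.0, 42.0) for i in range(10)]
samples[3] = (samples[3][0], None)
samples[7] = (samples[7][0], float("nan"))
report = poll_completeness(samples, 0, 3600, 360)
# report == {"expected": 10, "received": 8, "missing": 2}
```

Keeping the counting separate from the fetching means the API load question comes down to how often, and for how many datapoints, you pull the history.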