convert alert status to unknown
I have a suggestion to fix an architectural problem within LM related to missing data. Right now, knowing that data is missing is very difficult, nearly intractable. In some cases, with enough digging, you can find that a DS author set up a canary datapoint, one that has no other meaning, to warn on no data so that it can be used safely in alert rules. In most cases this is not true (or if it is, it is again indiscernible without code review).
At the same time, LM persists the last status for as long as no data arrives. For example, if a disk hits 99% full and that disk is then removed, the datapoint will stay in some non-ok status indefinitely, until the instance is purged.
My suggestion is that after some time passes (configurable at one or more levels), a datapoint with no data changes its status to unknown (or whatever label you prefer that means that). Then we can write alert rules that treat unknown distinctly, knowing it can be detected on any desired datapoint. If a problem was detected before the transition to unknown, it is no longer considered a problem. If the instance starts producing data again and the problem actually persists, it will re-arm with the correct status.
The key problem I see here is that the interval I would want before giving up and converting to unknown is probably longer than I would want to let a no data condition persist undetected. Given that, it may be better to make unknown a first-order condition (parallel to status). I still think status should eventually reset when a long enough time passes without data; I'm just not sure it should be the same timeframe.
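To make the two-timer idea concrete, here is a rough sketch in Python. It is not LM code or any LM API; every name in it (`DatapointState`, `record`, `evaluate`, `unknown_after`, `reset_after`) is made up for illustration. It assumes unknown is tracked as a flag parallel to status, with a short timeout for flagging unknown and a separate, longer timeout for resetting a stale status.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DatapointState:
    """Hypothetical per-datapoint state; not an actual LM structure."""
    status: str = "ok"                    # last evaluated status, e.g. "ok", "warn", "critical"
    last_data_at: Optional[float] = None  # time (seconds) of the last sample, None if never seen

    def record(self, now: float, status: str) -> None:
        """A sample arrived: store its evaluated status and the arrival time."""
        self.status = status
        self.last_data_at = now

    def evaluate(self, now: float, unknown_after: float, reset_after: float) -> Tuple[str, bool]:
        """Return (status, unknown) without modifying stored state.

        unknown_after: seconds without data before the unknown flag is raised.
        reset_after:   seconds without data before a stale status falls back to "ok".
        """
        if self.last_data_at is None:
            # Never received data: nothing to report except that it is unknown.
            return (self.status, True)
        gap = now - self.last_data_at
        unknown = gap >= unknown_after
        status = "ok" if gap >= reset_after else self.status
        return (status, unknown)


# The removed-disk example: last sample was critical, then data stops.
disk = DatapointState()
disk.record(now=0, status="critical")
soon = disk.evaluate(now=60, unknown_after=300, reset_after=86400)
stale = disk.evaluate(now=600, unknown_after=300, reset_after=86400)
given_up = disk.evaluate(now=90000, unknown_after=300, reset_after=86400)

# Data resumes and the problem is still there, so the status re-arms.
disk.record(now=90060, status="critical")
rearmed = disk.evaluate(now=90060, unknown_after=300, reset_after=86400)
```

With these example values, the datapoint is flagged unknown after five minutes but keeps its critical status for a day before resetting, which is the separation of timeframes I'm describing. An alert rule could then match on the unknown flag independently of status.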