Forum Discussion

eortiz
Neophyte
22 days ago

Best practices for WMI failures

Hi all! We recently identified a monitoring gap: a server was responding to ping but not collecting WMI data, so no alert was raised. It turned out to be in a hung state. We are considering enabling critical alerts on WMI Uptime. This should help by detecting a WMI failure through a No Data alert, and by identifying devices that were rebooted during business hours without authorization. Based on your experience, is there a better approach? We think the same approach would also work well for SNMP.
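To make the idea concrete, here is a minimal Python sketch of the decision logic described above. It is not LogicMonitor configuration: the function name, the state labels, and the 09:00 to 17:00 weekday business window are all assumptions made for illustration. An uptime of `None` stands in for a poll that returned no data.

```python
from datetime import datetime, time
from typing import Optional

# Assumed business window (Mon-Fri, 09:00-17:00); adjust to your own policy.
BUSINESS_START = time(9, 0)
BUSINESS_END = time(17, 0)


def classify_poll(
    ping_ok: bool,
    uptime_s: Optional[float],
    prev_uptime_s: Optional[float],
    now: datetime,
) -> str:
    """Classify one poll cycle from ping and uptime readings (hypothetical helper)."""
    if not ping_ok:
        # Host is unreachable; this is the ordinary host-down case.
        return "host_down"
    if uptime_s is None:
        # Ping answers but uptime returned no data: the hung-server gap.
        return "collection_failed"
    if prev_uptime_s is not None and uptime_s < prev_uptime_s:
        # Uptime went backwards, so the device restarted between polls.
        in_hours = now.weekday() < 5 and BUSINESS_START <= now.time() < BUSINESS_END
        return "reboot_business_hours" if in_hours else "reboot_off_hours"
    return "ok"
```

The same function would apply to SNMP if `uptime_s` came from an SNMP uptime reading instead of WMI.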

7 Replies

  • LM doesn't recommend disabling the HostStatus check, as it is how LM notifies you that a device is down. If you disable it, you would need to rely on some other check to notify you of a device being down before the 6-minute (hard-coded) suppression kicks in. WMI and SNMP checks may end up being suppressed before they alert on a down device.

    https://www.logicmonitor.com/support/logicmodules/datasources/creating-managing-datasources/host-status-host-behavior

    • eortiz
      Neophyte

      I will only be disabling alerting, not the module or tune. I thought it would still work for suppression.

      • Mike_Moniz
        Professor

        So HostStatus is just there to alert/notify you that the device is down; it doesn't actually do the suppression itself (from my understanding). The 6-minute suppression happens whether you want it to or not. So if the HostStatus alert is disabled and the device goes down, you may not get notified that the device is down. In some device-down situations, you may not get any alerts at all without HostStatus.

        Originally I did the whole NoData thing to check for broken WMI/SNMP too, but I ended up with dedicated checks specific to WMI/SNMP, as they work better, can help with troubleshooting, and are also much less likely to be misinterpreted by staff.
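As a rough illustration of the dedicated-check idea (not Mike_Moniz's actual module, which the thread does not show), the sketch below turns a protocol probe into an explicit status code instead of relying on the absence of data. The `probe` callable, the status constants, and the mapping from Python exceptions to failure categories are all hypothetical.

```python
from typing import Callable

# Hypothetical status codes a dedicated check could report as a datapoint.
STATUS_OK = 0
STATUS_TIMEOUT = 1
STATUS_ACCESS_DENIED = 2
STATUS_OTHER = 3


def protocol_status(probe: Callable[[], object]) -> int:
    """Run a WMI or SNMP probe and map the outcome to an explicit status code.

    `probe` is any zero-argument callable that queries the device and raises
    on failure; how it talks to the device is outside this sketch.
    """
    try:
        probe()
    except TimeoutError:
        return STATUS_TIMEOUT
    except PermissionError:
        return STATUS_ACCESS_DENIED
    except Exception:
        # Any other failure still yields a value rather than a gap in the data.
        return STATUS_OTHER
    return STATUS_OK
```

Because the check always returns a value, an alert can say "access denied" or "timed out" directly, which may be easier for staff to read than a generic No Data alert.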

  • Have you looked at tweaking the alerting thresholds for "Win_WMI_Access_Denied_ErrorCodes"? (It shows up as "WMI is not accessible. See error message for likely issue.")

    For SNMP, you could use "Device_Component_Inventory", since its AppliesTo is only hasCategory("snmp"). You could then simply add an alert for when there is no data.

    • eortiz
      Neophyte

      Thanks for your suggestion, orchardl. I’m leaning more toward leveraging uptime, since it’s available for both WMI and SNMP. I’ll configure the alert on “No Data” to keep things consistent, and I’ll disable alerting on HostStatus to reduce noise. Do you see any gaps in this approach?