Forum Discussion

Egis
30 days ago
Solved

Disk volume capacity alert thresholds based on volume size

Hi folks,

Does anyone have alerting thresholds set up based on disk volume capacity, or do you have an idea how to set it up? It could be good practice to alert later (at a higher used-space percentage) on large data drives and earlier on small drives.

For example, we need a warning alert when used space is ≥98% on data drives larger than 1 TB, and ≥90% on every other drive.

I'm thinking about pulling the size of the disk as a property and using dynamic groups where I could set the threshold at the device group level; however, I'm not sure if this is the best approach.

Thanks!

10 Replies

  • The basic cheat of this method is the same one I use to produce percentages from arbitrarily scaled numbers:  x * n - s

    'n' is the number you have.  'x' scales it out.  's' shifts it back to the start of the range you want to end up with.

    If n is 1-50 but you need a graph to show it wider, to line up with another metric that is 1-100:

    2 * 50 = 100 (top of the other graph)

    But since this also raises the 1 at the bottom of the graph to a 2, subtracting 1 brings it back down, giving 1-99... which is pretty close to 1-100 and plays more nicely in a graph together.

    We know we want 10% at smaller sizes and 2% at larger sizes.  We just have to cheat the scale up and down to match the sizes we want based on the volume size.  I'm going to try to figure out how to drive that scale up and down more simply based on a property... that way, we can treat Azure and vSphere disks differently due to the differences in how you have to resize them... and different applications that need extra free space can just be a device-level property adjustment.

    Time to play :)
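
The scale-and-shift arithmetic in this post can be sketched in a few lines of Python. The function name `rescale` is made up for illustration; only the x * n - s formula and the 1-50 example come from the post.

```python
def rescale(n, x, s):
    """Scale n by x, then shift it back down by s (the x * n - s trick)."""
    return x * n - s


# Stretch a 1-50 metric so it lines up with a 1-100 graph.
low = rescale(1, x=2, s=1)    # bottom of the range
high = rescale(50, x=2, s=1)  # top of the range
print(low, high)
```

With x = 2 and s = 1 the range 1-50 maps to 1-99, as described in the post.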

  • It's been brought up a few times in the forums over the years. Your idea should work and might be the easiest, but I personally would look to make a DataSource change to keep it a bit cleaner. I do see a Community DataSource called "WinVolumeUsageConditional-" that seems to implement something along these lines. It might be worth looking at, or at least using for inspiration. Basically, it uses complex datapoints to provide additional datapoints, based on conditions, that you can alert on.

    I personally don't like to hard-code thresholds directly into the DataSource, so I would do something more like LargeDrivesFreeSpaceGB=if(gt(AvailableGB,1024),FreeSpaceGB,unkn()) and SmallDrivesFreeSpaceGB=if(le(AvailableGB,1024),FreeSpaceGB,unkn()) so you can set thresholds differently depending on whether the drive is small or large. Note that I didn't test these; they're just off the top of my head.

    Also keep in mind that thresholds can be real numbers, so you can set a threshold to something like >=98.1531%.

    • Egis

      Thanks for your help, Mike! I was able to set up thresholds based on drive sizes.

    • Egis

      Hi Mike_Moniz,

      This seemed to be a perfect solution... However, if an alert is triggered by the SmallDrivesFreeSpaceGB datapoint (for a drive smaller than 1 TB) and the disk is then extended beyond 1 TB so that the LargeDrivesFreeSpaceGB datapoint applies, the SmallDrivesFreeSpaceGB alert does not clear, even though that datapoint is no longer returning data.

      • Mike_Moniz

        Hmm, while I know that No Data wouldn't reset alerts, I didn't realize that NaN would have the same issue. I wonder if inf() is still usable... I guess you can either manually reset the alert when going through the resize, or perhaps change unkn() to some obviously high number, although that might not be as clean, especially if the datapoint is used in dashboards/graphs.
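
To make the behaviour discussed in this sub-thread concrete, here is a rough Python model of the two complex datapoints. It is not LogicMonitor expression syntax: `split_free_space` and its `placeholder` argument are hypothetical, NaN stands in for unkn(), the 1024 GB cutoff is taken from the expressions in the thread, and it assumes the comparison is meant to be against the drive's total size. Passing a large placeholder models the "obviously high number" idea; whether that actually clears a stuck alert in the product is untested here.

```python
import math

CUTOFF_GB = 1024  # 1 TB boundary used in the thread's expressions


def split_free_space(size_gb, free_gb, placeholder=math.nan):
    """Return (small_drives_free_gb, large_drives_free_gb).

    Exactly one of the two carries the real value; the other gets the
    placeholder (NaN by default, standing in for unkn()).
    """
    if size_gb > CUTOFF_GB:
        return placeholder, free_gb
    return free_gb, placeholder


# A 500 GB drive reports on the "small" datapoint only.
print(split_free_space(500, 20))
# After growing to 2 TB, the "small" datapoint has no value any more.
print(split_free_space(2048, 1500))
# Variant: a huge placeholder keeps the datapoint numeric and far above
# any "free space below N GB" threshold.
print(split_free_space(2048, 1500, placeholder=1e9))
```

The trade-off is the one mentioned in the reply: a numeric placeholder may let the threshold evaluate again, but it will also show up in any graph or dashboard that plots the datapoint.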

  • This is very close:

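    # Candidate volume sizes in GB
    $sizes    = @(128, 256, 512, 1024)
    $expand   = 200    # scales the log2(size) term
    $contract = 1000   # shifts the scaled value back down

    # The free-space percentage shrinks as log2(size) grows
    foreach ($size in $sizes) {
        $newPercent = 2000 / ($expand * [math]::Log($size, 2) - $contract)
        Write-Host "$size`t$newPercent"
    }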

     

  • So IMHO "it depends"... we have SQL log drives that are 20 GB, data drives at 2 TB, and everything in between. In the end I spun up multiple DSs based on drive size:

    < 40 GB and not a boot drive; all boot drives; > 40 GB and < 200 GB and not a boot drive; 200 GB to 1 TB; > 1 TB
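
As a sketch of how those buckets partition drives, here is a hypothetical Python helper. It assumes 1 TB = 1024 GB and guesses at the boundary cases the list leaves open (exactly 40 GB, 200 GB, or 1 TB). The bucket labels are invented for illustration; in the post each bucket is a separate DataSource.

```python
def drive_bucket(size_gb, is_boot):
    """Pick one of the five size buckets for a drive."""
    if is_boot:
        return "boot"               # all boot drives, regardless of size
    if size_gb < 40:
        return "under-40GB"
    if size_gb < 200:
        return "40GB-to-200GB"      # assumption: exactly 40 GB lands here
    if size_gb <= 1024:
        return "200GB-to-1TB"       # assumption: both ends inclusive
    return "over-1TB"


print(drive_bucket(20, is_boot=False))    # a small SQL log drive
print(drive_bucket(2048, is_boot=False))  # a large data drive
```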

  • I've been giving this some thought as well and was debating writing a new DS for it... since the real concern is making sure there's enough free space on a disk... but we're tracking that using the inverse (Space Consumed rather than Space Free) and doing it with a percentage that doesn't scale correctly.  I was going to figure out how to produce a metric threshold that operates as a sliding value, one that would automatically produce a threshold based on both disk size and free space and that wasn't just a static number or a dynamic value based on normal usage... more of a consistent, predictable value.

    I don't have it figured out yet.

  • Darnit... can't edit... From there, we twiddle the expand/contract values so that 1024 stays at 2 where it is, and 128 grows to 10 where we want it... then set a ceiling of 10 and use the resulting value to calculate the new threshold.

    That's the math... I just have to figure out how to apply it.

  • Final numbers... floor of 2 and ceiling of 10:

    8000 / (200 * [math]::Log($size,2) - 1000) - 6
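
Putting the final formula together with the floor and ceiling, a Python sketch might look like this. The function name, the clamping order, and the handling of volumes at or below 32 GB (where 200 * log2(size) - 1000 is zero or negative) are assumptions, not something stated in the thread.

```python
import math


def free_space_threshold_pct(size_gb, floor=2.0, ceiling=10.0):
    """Sliding free-space threshold (%) based on volume size in GB."""
    denominator = 200 * math.log2(size_gb) - 1000
    if denominator <= 0:
        # At 32 GB the denominator is zero and below that it goes negative,
        # so very small volumes simply get the ceiling (assumption).
        return ceiling
    raw = 8000 / denominator - 6
    return max(floor, min(ceiling, raw))


for size in (128, 256, 512, 1024, 2048):
    free_pct = free_space_threshold_pct(size)
    # The equivalent "space used" threshold is 100 minus the free figure.
    print(size, round(free_pct, 2), round(100 - free_pct, 2))
```

With these numbers a 1 TB volume alerts at 98% used, and volumes of roughly 180 GB or smaller hit the ceiling and alert at 90% used.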