Forum Discussion

Kelemvor
2 months ago

Can someone explain the CPU 5 minute load average thing to me?

So,

We have a server currently alerting us because the 5MinLoadPerCore field is > 1.  I'm trying to understand why that is.

I found this page that says if that number is >1, it means there are things queued up waiting for the CPU and there's a backlog.

https://www.logicmonitor.com/blog/what-the-heck-is-cpu-load-on-a-linux-machine-and-why-do-i-care

However, the server in question has the CPUs currently running at around 60%.  I would think that if it were backed up, it should be cranking at 99% trying to catch up.  I would think the 5minload alert and a CPU Usage percentage alert would come as a pair, but they don't.

Just trying to figure out if there's anything that can be done when we get the 5Min alerts or if they're more just informational and can be ignored.  They only come in as a Warning anyway, so if it's just informational, then it's just noise, and maybe we'll just turn them off.

Just looking for other opinions. ;)

Thanks.

  • The Linux load average is the average number of tasks that are either runnable (running on a CPU or waiting for one) or in uninterruptible sleep, which usually means waiting on disk I/O. It is reported as three numbers: the 1-minute, 5-minute, and 15-minute averages.

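    A general rule of thumb is that your load average should not exceed the total number of CPU cores in your system. For example, if your server has 8 cores, a load average of 8 roughly means every core has work to do, and anything higher indicates tasks are queuing. Assuming 5MinLoadPerCore is simply the 5-minute load divided by the core count, the equivalent threshold for that field is 1.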

    Short bursts above this limit, reflected in the 1-minute average, can be normal depending on workload. However, sustained overloads in the 5- or 15-minute averages may point to performance bottlenecks.

    To find the number of logical CPUs (cores, including hardware threads), you can use:

    nproc

    or count the entries in /proc/cpuinfo directly:

    grep -c ^processor /proc/cpuinfo

    Or you can use something like top or htop.

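    Keep in mind that high load averages can also result from I/O bottlenecks, not just CPU saturation. Tasks blocked on I/O count toward the load average without using any CPU time, which is how a server can show a load per core above 1 while its CPUs sit at around 60%.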
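
If you want to sanity-check the number behind the alert, here is a rough sketch that works out a load-per-core figure by hand. It assumes the monitored field is just the 5-minute load average divided by the number of logical CPUs; `load_per_core` is a made-up helper name, not anything from LogicMonitor.

```shell
#!/bin/sh
# Sketch only: load_per_core is a hypothetical helper, not a LogicMonitor function.
# Assumption: "load per core" = load average / number of logical CPUs.
load_per_core() {
    # $1 = load average, $2 = number of logical CPUs
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# On a Linux host, field 2 of /proc/loadavg is the 5-minute average,
# and nproc prints the number of logical CPUs available to the process.
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
    five_min=$(awk '{ print $2 }' /proc/loadavg)
    echo "5-minute load per core: $(load_per_core "$five_min" "$(nproc)")"
fi
```

For example, `load_per_core 6 4` prints `1.50`: on average six tasks wanted to run or were blocked while only four CPUs were available.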

  • I would fire up `iotop` (it usually needs root) and see which processes are doing a lot of I/O when the load average is high.
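
Alongside `iotop`, a quick way to tell whether the load is CPU-bound or I/O-bound is to count how many processes are runnable (state R) versus in uninterruptible sleep (state D) while the alert is active. The snippet below is only a sketch: `count_states` is a hypothetical helper, it takes a single snapshot rather than an average, and it counts processes rather than individual threads, so treat the result as a hint and not a measurement.

```shell
#!/bin/sh
# Sketch only: count_states is a hypothetical helper.
# It reads one process-state string per line (as printed by `ps -eo state=`)
# and counts runnable (R) and uninterruptible-sleep (D) entries.
count_states() {
    awk '
        $1 ~ /^R/ { running++ }
        $1 ~ /^D/ { blocked++ }
        END { printf "runnable=%d uninterruptible=%d\n", running, blocked }
    '
}

# Live snapshot of the current host. The ps process itself shows up as R.
if command -v ps >/dev/null 2>&1; then
    ps -eo state= 2>/dev/null | count_states
fi
```

If the uninterruptible count stays high while CPU usage is moderate, the backlog is most likely on storage or network filesystems rather than on the CPUs, and that is where `iotop` becomes useful.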