load average: 0.00, 0.00, 0.00
How do you get this output?
To get your system’s Load Average, run the command uptime. It will show you the current time, how long your system has been powered on for, the number of users logged in, and finally the system’s load average.
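If you want to read these numbers from a script rather than by eye, a minimal sketch might look like the following. The function name `parse_load_average` and the sample line are illustrative assumptions, not part of any standard tool.

```python
import re


def parse_load_average(uptime_line: str) -> tuple[float, float, float]:
    """Extract the three load averages from one line of `uptime` output."""
    # Accepts "load average: 0.52, 0.58, 0.59" (Linux) as well as
    # "load averages: 0.52 0.58 0.59" (some BSD-style systems).
    match = re.search(
        r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", uptime_line
    )
    if match is None:
        raise ValueError(f"no load average found in: {uptime_line!r}")
    one, five, fifteen = (float(value) for value in match.groups())
    return one, five, fifteen


# Hypothetical sample line, in the shape uptime prints on Linux.
sample = " 14:02:11 up 10 days,  3:42,  2 users,  load average: 0.52, 0.58, 0.59"
print(parse_load_average(sample))  # (0.52, 0.58, 0.59)
```

On a live Unix system, Python's `os.getloadavg()` returns the same three values without any text parsing.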
What does it mean?
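Simply put, on Linux it is the number of processes that are running, waiting for a turn on the CPU, or blocked waiting for something else (blocking processes), averaged over a certain time period. The three numbers cover the last 1, 5, and 15 minutes: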
load average: 1min, 5min, 15min
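The three figures are not plain arithmetic means. Linux keeps them as exponentially damped moving averages, sampled roughly every 5 seconds, so recent activity weighs more than older activity. The sketch below illustrates the idea in floating point; the names `update_load` and `simulate` are assumptions for illustration, and the real kernel uses fixed-point arithmetic.

```python
import math

SAMPLE_INTERVAL = 5.0  # seconds between samples (approximately what Linux uses)


def update_load(previous: float, active: int, window_seconds: float) -> float:
    """Blend the current count of active processes into the running average."""
    decay = math.exp(-SAMPLE_INTERVAL / window_seconds)
    return previous * decay + active * (1.0 - decay)


def simulate(active_counts, window_seconds: float) -> float:
    """Feed a sequence of per-sample active process counts into one average."""
    load = 0.0
    for active in active_counts:
        load = update_load(load, active, window_seconds)
    return load


# Two busy processes for one minute (12 samples), starting from an idle system.
one_minute = simulate([2] * 12, window_seconds=60)
fifteen_minute = simulate([2] * 12, window_seconds=900)
print(f"{one_minute:.2f} {fifteen_minute:.2f}")
```

This is why the 1 minute figure reacts quickly to a burst of work while the 15 minute figure barely moves.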
What is a blocking process?
A blocking process is a process that cannot continue until something it is waiting for completes. Typically, a process is waiting for:
- Disk I/O
- Network I/O (for example, a network-mounted filesystem)
What does a high load average mean?
A load average that stays above the number of CPU cores in the machine typically means that your server is under-specified for what it is being used for, or that something has failed (like an externally mounted disk).
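Whether a load figure is "high" depends on how many CPU cores the machine has, so it can help to look at load per core. The following is a rough sketch; the function names and the default threshold of 1.0 are assumptions chosen for illustration, not fixed rules.

```python
def load_per_core(load_average: float, cpu_count: int) -> float:
    """Divide the load average by the number of CPU cores."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_average / cpu_count


def is_overloaded(load_average: float, cpu_count: int, threshold: float = 1.0) -> bool:
    """True when there is more work queued than cores (threshold is an assumption)."""
    return load_per_core(load_average, cpu_count) > threshold


# Example: a 15 minute load of 8.0 on a 4-core machine.
print(load_per_core(8.0, 4))  # 2.0
print(is_overloaded(8.0, 4))  # True
```

On a live system, `os.getloadavg()` and `os.cpu_count()` can supply the two inputs.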
How do I diagnose a high load average?
Typically, a server with a high load average is slow and unresponsive, and you want to reduce the load to restore responsiveness. But how do you work out what is causing the high load?
Let's start with the simplest question: are we waiting for CPU? Run the Linux command top.
Check the us (user) and sy (system) values on the Cpu(s) line near the top of the output. They represent the percentage of its total time that the CPU is spending processing work. If these numbers together are constantly around 99-100%, then chances are the problem is related to your CPU, most likely that it is underpowered for the workload. Consider upgrading your CPU.
The next thing to look for is whether the CPU is waiting on I/O. Check the wa (I/O wait) value on the same Cpu(s) line. If this number is high (above 80% or so), then you have problems: the CPU is spending a lot of time waiting on I/O. This could mean that you have a failing hard disk, a failing network card, or that your applications are trying to access data on either of them at a rate significantly higher than the throughput they are designed for.
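The percentages that top displays come from counters the kernel exposes in the first line of /proc/stat. As a rough sketch of how the I/O wait share could be computed from two snapshots of that line, consider the following; the function names and the sample counter values are made up for illustration.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line: str) -> dict[str, int]:
    """Parse the aggregate 'cpu' line from /proc/stat into named counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    values = [int(value) for value in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def cpu_percentages(before: str, after: str) -> dict[str, float]:
    """Share of CPU time spent in each state between two snapshots."""
    first, second = parse_cpu_line(before), parse_cpu_line(after)
    deltas = {name: second[name] - first[name] for name in first}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("snapshots must be taken at different times")
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


# Hypothetical snapshots taken a few seconds apart on a disk-bound machine.
before = "cpu  1000 0 500 8000 500 0 0 0 0 0"
after = "cpu  1100 0 550 8050 1300 0 0 0 0 0"
percentages = cpu_percentages(before, after)
print(f"iowait: {percentages['iowait']:.1f}%")  # iowait: 80.0%
```

In practice you would read the first line of /proc/stat twice, a few seconds apart, and pass both strings in.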
To find out what applications are causing the load, run the command ps faux. This will list every process running on your system, and the state it is in.
You want to look in the STAT column. The common flags that you should be looking for are:
- R – Running
- S – Sleeping
- D – Uninterruptible sleep (usually waiting on I/O)
So, look for any processes with a STAT of D, and you can go from there to diagnose the problem.
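On a busy system the process list can be long, so it may be easier to filter it. Here is a small sketch that picks out the D-state rows; it assumes the text came from `ps -eo pid,stat,args` (a narrower column set than ps faux), and the function name and sample rows are illustrative.

```python
def uninterruptible_processes(ps_output: str) -> list[tuple[int, str, str]]:
    """Return (pid, stat, command) for every row whose STAT starts with D."""
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.strip().split(None, 2)
        if len(parts) < 3:
            continue  # blank or truncated line
        pid, stat, command = parts
        if stat.startswith("D"):
            rows.append((int(pid), stat, command))
    return rows


# Hypothetical output in the shape `ps -eo pid,stat,args` produces.
sample = """\
  PID STAT COMMAND
    1 Ss   /sbin/init
  842 D    [jbd2/sda1-8]
 1520 R+   ps -eo pid,stat,args
 2311 D+   cp /mnt/nfs/big.iso /tmp
"""
for pid, stat, command in uninterruptible_processes(sample):
    print(pid, stat, command)
```

To feed it live data, the text could come from `subprocess.run(["ps", "-eo", "pid,stat,args"], capture_output=True, text=True).stdout`.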
To diagnose further, you can use the following programs:
- strace – to trace what a program is doing
- iostat – to see the throughput of your disks
- bwmon – to see your network throughput