Linux Load Average explained
load average: 0.00, 0.00, 0.00
How do you get this output?
To get your system’s Load Average, run the command uptime. It will show you the current time, how long your system has been powered on for, the number of users logged in, and finally the system’s load average.
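If you want the same three numbers inside a script rather than on the terminal, here is a minimal Python sketch. The function names are my own, and the regular expression assumes the usual English "load average: a, b, c" layout of uptime output, so treat it as an illustration rather than a robust parser.

```python
import os
import re


def parse_load_average(uptime_output: str) -> tuple[float, float, float]:
    """Extract the 1, 5 and 15 minute load averages from `uptime` output."""
    match = re.search(
        r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", uptime_output
    )
    if match is None:
        raise ValueError("no load average found in input")
    one, five, fifteen = (float(value) for value in match.groups())
    return one, five, fifteen


def current_load_average() -> tuple[float, float, float]:
    """Ask the operating system directly, without running uptime."""
    return os.getloadavg()
```

On a Unix-like system, current_load_average() is the simpler route; parse_load_average() is only useful when all you have is captured uptime text, for example from a log.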
What does it mean?
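Simply put, it is the number of blocking processes, averaged over a certain time period. On Linux this count covers processes that are running or queued for a CPU as well as processes in uninterruptible sleep.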
Time periods:
load average: 1min, 5min, 15min
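The three figures are not plain arithmetic means. The Linux kernel maintains each one as an exponentially damped moving average that it updates roughly every 5 seconds. The sketch below imitates that in floating point; the real kernel uses fixed-point arithmetic, and the function names and the rounded 5 second interval are assumptions made for illustration.

```python
import math

# Approximate time between kernel updates, in seconds (simplified).
SAMPLE_INTERVAL = 5.0


def update_load(previous: float, active: int, window_seconds: float) -> float:
    """Blend the previous average with the current count of active processes."""
    decay = math.exp(-SAMPLE_INTERVAL / window_seconds)
    return previous * decay + active * (1.0 - decay)


def simulate(active: int, seconds: int, window_seconds: float,
             start: float = 0.0) -> float:
    """Hold `active` processes constant for `seconds` and return the average."""
    load = start
    for _ in range(int(seconds / SAMPLE_INTERVAL)):
        load = update_load(load, active, window_seconds)
    return load
```

With one process kept busy on an idle machine, simulate(1, 60, 60.0) gives roughly 0.63, not 1.0, which is why the 1min figure lags behind sudden changes and the 15min figure lags even more.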
What is a blocking process?
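A blocking process is a process that is waiting for something before it can continue. Typically, a process is waiting for:
- CPU
- Disk I/O
- Network I/O (mainly network filesystems such as NFS; a process idling on an ordinary socket is in interruptible sleep and does not add to the load)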
What does a high load average mean?
A high load average typically means that your server is under-specified for what it is being used for, or that something has failed (like an externally mounted disk).
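Whether a figure counts as "high" depends on how many CPU cores the machine has: a load of 4 keeps a four-core server roughly fully occupied, while on a single-core server it means processes are queueing. Here is a small sketch of that comparison. The function names and the threshold of 1.0 per core are illustrative assumptions, and because the load also counts processes blocked on I/O, the result is a rough guide only.

```python
import os


def load_per_core(load: float, cores: int) -> float:
    """Scale a load average by the number of CPU cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load / cores


def is_overloaded(load: float, cores: int, threshold: float = 1.0) -> bool:
    """True when the per-core load exceeds the (assumed) threshold."""
    return load_per_core(load, cores) > threshold


def check_this_machine() -> bool:
    """Apply the check to the 1 minute load average of the current host."""
    one_minute, _, _ = os.getloadavg()
    return is_overloaded(one_minute, os.cpu_count() or 1)
```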
How do I diagnose a high load average?
Typically, a server with a high load average is unresponsive and slow, and you want to reduce the load and increase responsiveness. But how do you go about working out what is causing your high load?
Let's start with the simplest question: are we waiting for CPU? Run the Linux command top.
Check the CPU line near the top of the output, in particular the user and system values (typically labelled us and sy). They represent the percentage of its total time that the CPU spends processing. If these numbers are constantly around 99-100%, then chances are the problem is related to your CPU, almost certainly that it is underpowered. Consider upgrading your CPU.
The next thing to look for is whether the CPU is waiting on I/O. Check the I/O wait value on the same line (typically labelled wa). If this number is high (above 80% or so) then you have problems. It means that the CPU is spending a lot of time waiting on I/O. This could mean that you have a failing hard disk, a failing network card, or that your applications are trying to access data on either of them at a rate significantly higher than the throughput they are designed for.
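The percentages that top prints come from the counters on the first line of /proc/stat. If you would rather log them than watch them, here is a rough sketch that turns two readings of that line into percentages. The function names are my own, and it assumes the standard field order (user, nice, system, idle, iowait, irq, softirq, steal) found on current kernels.

```python
import time

# Field order of the aggregate "cpu" line in /proc/stat (guest fields ignored).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line: str) -> dict[str, int]:
    """Turn the aggregate 'cpu' line into a mapping of counter name to ticks."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    values = [int(value) for value in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def cpu_percentages(before: dict[str, int], after: dict[str, int]) -> dict[str, float]:
    """Share of time spent in each state between two readings."""
    deltas = {name: after[name] - before[name] for name in before}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("readings must be taken at different times")
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def sample(interval: float = 1.0) -> dict[str, float]:
    """Read /proc/stat twice, `interval` seconds apart (Linux only)."""
    with open("/proc/stat") as stat_file:
        before = parse_cpu_line(stat_file.readline())
    time.sleep(interval)
    with open("/proc/stat") as stat_file:
        after = parse_cpu_line(stat_file.readline())
    return cpu_percentages(before, after)
```

In the result, the "user" and "system" entries correspond to the CPU-bound case described above, and "iowait" to the I/O-bound case.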
To find out what applications are causing the load, run the command ps faux. This will list every process running on your system, and the state it is in.
You want to look in the STAT column. The common flags that you should be looking for are:
- R - Running
- S - Sleeping
- D - Uninterruptible sleep (waiting for something, usually I/O)
So, look for any processes with a STAT of D, and you can go from there to diagnose the problem.
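Scanning a long ps listing by eye gets tedious, so here is a small sketch that picks out the D processes for you. It assumes the usual ps faux header with PID and STAT columns and the command in the last column; the function names are hypothetical.

```python
import subprocess


def blocked_processes(ps_output: str) -> list[tuple[int, str, str]]:
    """Return (pid, stat, command) for every process whose STAT starts with D."""
    lines = ps_output.strip().splitlines()
    if not lines:
        return []
    header = lines[0].split()
    pid_col = header.index("PID")
    stat_col = header.index("STAT")
    cmd_col = len(header) - 1  # the command is the last column and may contain spaces
    found = []
    for line in lines[1:]:
        fields = line.split(None, cmd_col)
        if len(fields) <= cmd_col:
            continue  # skip lines that do not have every column
        if fields[stat_col].startswith("D"):
            found.append((int(fields[pid_col]), fields[stat_col], fields[cmd_col]))
    return found


def list_blocked() -> list[tuple[int, str, str]]:
    """Run ps faux on this machine and filter its output."""
    result = subprocess.run(["ps", "faux"], capture_output=True, text=True, check=True)
    return blocked_processes(result.stdout)
```

A process that shows up here run after run is a good candidate for a closer look with the tools listed under Further Diagnosis.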
Further Diagnosis
To diagnose further, you can use the following programs:
- strace - to trace what a program is doing
- iostat - to see the throughput of your disks
- bwmon - to see your network throughput
November 28th, 2007 at 5:49 am
Nice little article... really useful. Can you please do a similar article on strace, iostat and bwmon too?
Thanks,
May 3rd, 2008 at 2:29 pm
Excellent Article! Will use some information on my blog too. Waiting for Further Diagnosis Article :)
May 9th, 2008 at 2:22 pm
Thanks a lot for a great article with step by step instructions. Very easy and very useful. It would be really great if you continued with strace, iostat and similar utilities. Thank you again.
June 13th, 2008 at 11:21 am
Nice article and very informative for the basics of Linux.
Thx.
August 5th, 2008 at 7:47 am
Identified that clamscan is currently running for every host, and we need to change the scheduling so that only a single instance of clamscan runs at a time.
The more hosts, and more concurrent instances of clamscan, the greater the load average and slower the response time.
Thanks
PHK Corporation