Load (computing)

From Wikipedia, the free encyclopedia

Jump to: navigation, search

In UNIX computing, the system load is a measure of the amount of work that a computer system is doing. The load average is the average system load over a period of time. It is conventionally given as three numbers that represent the system load during the last one, five, and fifteen minute periods.

Contents

[edit] Unix-style load calculation

All Unix and Unix-like systems generate a metric of three "load average" numbers in the kernel. These can be most easily queried from the Unix shell by running the uptime command:

$ uptime
09:53:15  up 119 days, 19:08,  10 users,  load average: 3.73 7.98 0.50

The w and top commands show the same three load average numbers, as do a range of graphical user interface utilities.

An idle computer has a load number of 0 and each process that is using CPU or waiting for CPU adds to the load number by 1. Most UNIX systems count only processes in the running (on CPU) or runnable (waiting for CPU) states. However, Linux also includes processes in uninterruptible sleep states (usually waiting for disk activity), which can lead to markedly different results if many processes are blocked in I/O due to a busy or stalled I/O system. This, for example, includes processes that are blocked due to an NFS server failure or slow media (e.g., USB 1.x storage devices), leading to an elevated load average, which does not reflect an actual increase in CPU use (but still gives an idea on how long you have to wait).

The load average is calculated as the exponentially damped/weighted moving average of the load number. The three values of load average refer to the past one, five, and fifteen minutes of system operation.

For single-CPU systems that are CPU-bound, one can think of load average as a percentage of system utilization during the respective time period. For systems with multiple CPUs, the number needs to be divided by the number of processors in order to get a percentage.

For example, a load average of "1.73 0.50 7.98" on a single-CPU system can be interpreted as:

  • during the last minute, the CPU was overloaded by 73% (1 CPU with 1.73 runnable processes, so that 0.73 processes were waiting for a turn)
  • during the last 5 minutes, the CPU was underloaded 50% (no processes were waiting for a turn)
  • during the last 15 minutes, the CPU was overloaded 698% (1 CPU with 7.98 runnable processes, so that 6.98 processes were waiting for a turn)

This means that this CPU could have handled all of the work scheduled for the last minute if it were 1.73 times as fast, or if there were two (1.73 rounded up) times as many CPUs, but that over the last five minutes it was twice as fast as necessary to prevent runnable processes from waiting their turn.

Conversely, in a system with four CPUs, a load average of 3.73 would indicate that there were, on average, 3.73 processes ready to run, and each one could be scheduled into a CPU.

On modern UNIX systems, the treatment of threading with respect to load averages varies. Some systems treat threads as processes for the purposes of load average calculation: each thread waiting to run will add 1 to the load. However, other systems, especially systems implementing so-called M:N threading, use different strategies, such as counting the process exactly once for the purpose of load (regardless of the number of threads), or counting only threads currently exposed by the user-thread scheduler to the kernel, which may depend on the level of concurrency set on the process.

On many systems, the load average is generated by sampling the state of the scheduler periodically, rather than recalculating on all pertinent scheduler events. This is done for performance reasons, as scheduler events occur frequently, and scheduler efficiency is very important for system efficiency. As a result, sampling error can lead to the load average inaccurately representing actual system behavior. This can be a particular problem for programs that wake up at a fixed interval that aligns with the load average sampling, in which case the process may be under- or over-represented in the load average numbers.

[edit] CPU Load vs CPU Utilization

A comparative study of different load indices has been done by Domenico et. al. [1] and it was reported that CPU load information based upon the CPU queue length does much better in load balancing compared to CPU utilization. The reason CPU queue length did better is probably because, when a host is heavily loaded, its CPU utilization is likely to be close to 100% and it is unable to reflect the exact load level of the utilization. In contrast, CPU queue lengths can directly reflect the amount of load on a CPU. As an example, both resources with different average queue length, one with 3 and the other with 6 probably have utilizations close to 100% while they are obviously different.

[edit] See also

Other commands for assessing system performance:

  • uptime for load average
  • top for an overall system view
  • iostat for storage I/O statistics
  • netstat for network statistics
  • mpstat for CPU statistics
  • tload load average graph for terminal
  • xload load average graph for X

[edit] External links

[edit] References

[1] Domenico Ferrari and Songnian Zhou, “An Empirical Investigation of Load Indices For Load Balancing Applications” Proc. Performance ’87, the 12th Int’l Symp. On Computer Performance Modeling, Measurement, and Evaluation, North Holland Publishers, Amsterdam. The Nertherlands 1988. pp. 515-528

Personal tools