Help:Ganglia

Ganglia monitors and logs performance statistics for clusters. Ganglia can be used for general health and utilization monitoring of large collections of systems. If you would like to add your cluster to the CECS grid, please let me know.

  • Software homepage: http://ganglia.info/
  • Other related software: rrd, apache, php
  • Example site: UC Berkeley Grid
  • View online documentation
  • Other links: SourceForge: Ganglia project

On campus grids

Ganglia sites

Usage notes

Each grid, subgrid, cluster, and node has its own page with a collection of graphs. You can drill down to the level of detail you need by clicking through successive summary graphs.

At the cluster level, you can tell how loaded machines are at a glance by looking at the color codes of the graphs at the bottom of a cluster summary page. The key for the color codes is next to the pie chart in the top section of the page. Note that red nodes are fully loaded or overloaded, and light blue nodes are idle. Please check ganglia before selecting which nodes to schedule your job on.

If you are looking for a summary graph to stick on your sidebar, dashboard, or home page, find your favorite summary graph and right-click it to copy the image address (see the example URL below). If you would like a more customized graph (like these), let me know what you are looking for and on which cluster.
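Summary graphs are served as plain images, so they can usually be embedded by URL. A hypothetical example, assuming the standard ganglia-web graph.php front end; the hostname, cluster, node, and parameters below are placeholders and vary by version:

  http://ganglia.example.edu/ganglia/graph.php?c=MyCluster&h=node01.example.edu&g=load_report&r=hour&z=medium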

Reading graphs to debug critical system performance

Ganglia is an excellent tool for watching the aggregate system resource history of jobs on nodes. Ganglia does not track individual jobs, but it does track entire nodes, so if your job is the only job on a node, you can directly watch your job in ganglia.

The most common problems with jobs that can be spotted in ganglia are poor cpu use and memory overallocation.

cpu use

Poor cpu use will show as cpus that are idle even when jobs are supposed to be running. Some parallel jobs have very dynamic loads (some threads may finish sooner than others), so this may be unavoidable, depending on your algorithms. Ganglia can also show if your job is not being distributed correctly, for instance, placing all the threads on one node, overloading it, and leaving other nodes idle.

memory leak

Figure: ganglia memory report graph showing a memory leak (Memleak.png)

A memory leak occurs when a program allocates memory dynamically in a loop but fails to free it once it is no longer needed. A memory leak can occur over any length of time, and typically shows up in the ganglia memory report graph as a constant-slope ramp. To look for a memory leak with ganglia, select the memory report and check the per-node graphs, changing the time period to increasingly longer intervals. An unchecked memory leak will eventually run the system out of virtual memory and crash the job, and possibly the compute node. Adding virtual memory to the node will not fix a memory leak, but it may increase the time before the node crashes, which may help you debug the job or at least kill it first.

memory over commit


Figure: ganglia memory report graph (Ganglia mem.png)

In the worst case, overallocation of memory will cause nodes to crash and your job to be aborted. In the less severe case, it causes thrashing, and your job's performance may be severely impacted, possibly making it run at one tenth of the speed it should, or worse.

This situation is a little more complicated to spot in ganglia. The best way is to examine the memory report and cpu report graphs. Use the metric pull-down menu at the top to switch between them.

First, examine the memory report graphs and read the colors as follows:

  • White below the bottom red line is unused memory
  • Green is free memory
  • Purple is swapped-out memory

A node is OK as long as there is white below the bottom red line or any green.

The top red line (which has been added to ganglia on some clusters) is the maximum virtual memory in the system. When the purple meets the top red line, the node crashes.


Figure: ganglia cpu report graph (Ganglia cpu.png)

Then look at the cpu report graphs.

There should be little or no red in the graphs. Red indicates high-overhead system activity, such as disk or network I/O. Lots of red means heavy I/O. Occasional spikes of red are typical of jobs saving data just before completion or while checkpointing, which is fine. Large amounts of solid, continuous red, especially when there is more red than blue or yellow, combined with the shortage shown in the memory graph, is a sign that the node is completely out of memory and thrashing, severely hurting job performance.

If your systems are crashing due to persistent memory overcommit, ask to have cgroups set up to limit jobs so that the job crashes instead of the system. This allows faster recovery, and may also help speed up debugging of the job.
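As a rough illustration only (not necessarily how it is configured on these clusters), a per-job memory cap of this sort can be imposed with the cgroups v1 libcgroup tools; the group name, limit, and job command below are placeholders:

  # create a cgroup with a hard memory cap (hypothetical group name and limit)
  cgcreate -g memory:/jobgroup
  cgset -r memory.limit_in_bytes=8G jobgroup
  # run the job inside the group; if it exceeds the cap it is swapped or OOM-killed
  # instead of taking the whole node down
  cgexec -g memory:jobgroup ./my_job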

Installation notes

  • use gmetric build scripts to collect additional statistics (see the example after this list)
  • Each grid must have at least one gmetad which logs statistics
  • Each cluster must have at least one "master" gmond designated to collect statistics from that cluster.
  • Each host must have a gmond to collect statistics and report them back to each master gmond for the local cluster.
  • The PHP ganglia visualizer must be able to talk to a gmetad, and access the directory where gmetad collects its rrd files.
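For example, a wrapper script or cron job can push a custom statistic into ganglia with gmetric. A minimal sketch; the metric name, value, and units below are made up:

  gmetric --name=scratch_disk_free --value=123 --type=uint32 --units=GB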

Here are a few of the config files and critical parameters that must be correct for proper operation.

gmetad.conf

  • gridname
  • authority — lists website for this host if not in /ganglia/
  • data_source — (multiple) lists other clusters or grids to include in this one
  • trusted_hosts — list other gmetad's in this grid/cluster (parent, child, etc.)
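A skeletal gmetad.conf showing these parameters; the grid name, URL, hostnames, and addresses are placeholders, and defaults differ between ganglia versions:

  # name of this grid as displayed in the web front end
  gridname "Example Grid"
  # website for this host, if the front end is not at /ganglia/
  authority "http://monitor.example.edu/example-grid/"
  # one data_source line per cluster (or child grid); list one or more hosts to poll
  data_source "Example Cluster" node01.example.edu:8649 node02.example.edu:8649
  # other gmetad's in this grid (parent, child, etc.) allowed to poll this one
  trusted_hosts 10.0.0.1 10.0.0.2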

gmond.conf

  • cluster name
  • udp_send_channel — master gmond to report to. List exactly one gmond per section; multiple sections are possible. All nodes in a cluster should report to a single master gmond.
  • udp_recv_channel — if using ACLs, make sure all subservient gmond's (and parent gmetad?) have access
  • tcp_accept_channel — ACLs must allow the parent gmetad
  • collection_group — lists statistics this gmond will collect and report
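A skeletal gmond.conf for a cluster using unicast, assuming ganglia 3.x syntax; the cluster name, master hostname, and network addresses below are placeholders:

  cluster {
    name = "Example Cluster"
  }
  # every node sends its metrics to the master gmond for the cluster
  udp_send_channel {
    host = master.example.edu
    port = 8649
  }
  # the master gmond also receives UDP from the other nodes;
  # with an ACL, every client node must be allowed here
  udp_recv_channel {
    port = 8649
    acl {
      default = "deny"
      access {
        ip = 10.0.0.0
        mask = 24
        action = "allow"
      }
    }
  }
  # and answers TCP polls from the parent gmetad
  tcp_accept_channel {
    port = 8649
  }
  # statistics this gmond collects and reports
  collection_group {
    collect_every = 60
    time_threshold = 300
    metric {
      name = "load_one"
      value_threshold = "1.0"
    }
  }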

Troubleshooting configuration

  1. On each master gmetad, make sure UDP ACL includes all clients
  2. On each master gmetad, make sure all uplink masters are trusted
  3. On each master gmond, make sure all clients are in the UDP allow ACL
  4. On each master gmond, make sure all uplink servers are in the TCP allow ACL
  5. On each client, make sure master gmond is listed as a send channel
  6. After making these changes, restart the respective daemons (/etc/init.d/gmond restart or /etc/init.d/gmetad restart); a quick connectivity check is shown after this list
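One quick way to check that a gmond or gmetad is reachable and that its ACLs let you in is to connect to its TCP port and see whether it returns its XML metric dump. The hostnames below are placeholders; 8649 and 8651 are the usual default gmond and gmetad ports:

  # a gmond that accepts the connection dumps its metrics as XML
  telnet master-gmond.example.edu 8649
  # likewise for a gmetad
  telnet gmetad.example.edu 8651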

Some variations will exist with newer versions of ganglia, which have better defaults. Also, using multicast simplifies the client / master gmond configuration.