Help:MPI


MPI stands for Message Passing Interface.

See also a simple qsub and MPI example for a C programming example.

MPI is not just software, but an application programming interface, protocol, and library. MPI has been implemented for many operating systems and parallel architectures by many different vendors, so more than one version is likely to be available on any given cluster.

For use examples on the clusters, please see Qsub and MPI example.

Software availability[edit]

NOTE: You must choose a version of MPI compatible with the compiler you are using, and you must use the correct version of mpirun to match.
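
For example, to confirm which MPI installation is first in your PATH and which compiler its wrappers call, something like the following should work (the -showme flag belongs to LAM's wrapper compilers, -show to MPICH's):

 which mpirun mpicc      # show which MPI installation is first in your PATH
 mpicc -showme           # LAM: print the underlying compiler and flags the wrapper uses
 mpicc -show             # MPICH: same information, different flag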

Roman cluster
MPICH version 1 or 2 is on most cluster nodes
LAM MPI is on all upgraded nodes
MMAE student cluster
Sun MPI was at one time on the Sun machines, but is currently not set up. Ask if you wish to use it.
i2 / Euler cluster
MPICH version 1 is installed, but does not seem to work between nodes
LAM MPI is installed and compiled to use the Intel Fortran compiler. To use it, add /opt/lam/bin to the beginning of your PATH. NOTE: you must add it to the START of your PATH, and you must do it near the top of your .bashrc, before the conditional for interactive shells (see the sketch after this list).
i2 / deli cluster (Hilbert)
LAM MPI, MPICH, and LAM MPI compiled for the Intel compilers
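
For the Euler cluster PATH note above, a minimal .bashrc sketch might look like the following (the interactive-shell test shown here is only one common form; keep whatever conditional your .bashrc already has):

 # near the top of ~/.bashrc, before the interactive-shell test
 export PATH=/opt/lam/bin:$PATH

 # ...the interactive-shell conditional comes later, for example:
 if [ -z "$PS1" ]; then
     return
 fi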

Cluster use[edit]

Please be considerate of other users. Check the system load with the uptime command, or look at Ganglia on the head node's web server, and check the load on the machines you plan to use. Ask if you are having trouble finding this.
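
For example, to spot-check a few nodes before starting a job (the node names here are placeholders; use the actual node names on your cluster):

 uptime                                          # load averages on the node you are logged into
 for h in node01 node02; do ssh $h uptime; done  # spot-check the load on other nodes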


Using LAM MPI[edit]

The cluster machines running newer versions of Red Hat, Fedora Core, and CentOS have LAM MPI installed, and should use the same commands as above, with some additions.

Some of the LAM MPI commands are:

hcc hcp hf77 laminfo lamnodes lamshrink lamtrace mpiCC mpic++ mpicc mpiexec mpif77 mpimsg mpirun mpitask tkill tping
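
The wrapper compilers (mpicc, mpiCC, mpif77) add the MPI include and library flags for you. For example, to build the a.out used in the examples below from a C source file (hello.c is just a placeholder name):

 mpicc -o a.out hello.c      # C program
 mpif77 -o a.out hello.f     # Fortran 77 equivalent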

To start LAM MPI, type:

lamboot hostfile

Where hostfile lists the hosts to use (an example hostfile is sketched after the commands below). To use more CPUs per host, either list a host multiple times or add cpu=2 after its hostname. If ssh asks for a password AND rsh works on the cluster, you can instead try

export LAMRSH=rsh
lamboot hostfile

or

LAMRSH=rsh lamboot hostfile
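
As an illustration, a hostfile for the lamboot commands above might look like the following (the node names and CPU counts are placeholders for your cluster's actual machines):

 # one host per line; cpu=N uses N CPUs on that host
 node01 cpu=2
 node02 cpu=2
 # or list a host once per CPU instead
 node03
 node03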

Once lamboot has successfully run, to run a.out, try any of these:

mpirun n0-3 a.out
run on 4 nodes, 0 to 3
mpirun -np 4 a.out
run on the first 4 nodes
mpirun -np 8 n1-4 a.out
run on 4 nodes (1-4) but use 8 processors

To check which nodes LAM MPI is running on, type

 lamnodes

To clean up anything you may have left running, without shutting down your LAM subcluster, type

 lamclean

To stop LAM MPI, type

 lamhalt

SPECIAL NOTE: The first node in the hostfile will be the master node, not the node you ran lamboot from!
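
Putting the pieces together, a typical LAM session looks something like this (hostfile and a.out are the same placeholders used in the examples above):

 lamboot hostfile      # start the LAM run-time on the hosts listed in hostfile
 lamnodes              # verify which nodes joined
 mpirun -np 4 a.out    # run the program on the first 4 nodes
 lamclean              # clean up leftover processes between runs, if needed
 lamhalt               # shut the LAM run-time down when finished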

LAM documentation is available at http://www.lam-mpi.org/, and some reference documentation is in the following man pages:

lam
overview and introduction to LAM
introu
LAM user commands