
From CECS wiki

Caffe is one of several deep learning frameworks made with expression, speed, and modularity in mind.

Software homepage
Software availability: available on multiple clusters for CPU and GPU use
Other related software: cuda, cudnn
Command to type to run: module load caffe ; caffe.bin

Using caffe

Sample SGE script:

#$ -cwd
#$ -l gpu=1
module load cuda cudnn opencv caffe-deps
caffe.bin train --solver=solver.prototxt

Sample SGE script with checkpoint and restart support (UNTESTED; please tell us if this works!):

#$ -cwd
#$ -l gpu=1
#$ -ckpt caffe_ckpt -c 36000
module load cuda cudnn opencv caffe-deps
caffe.bin train --solver=solver.prototxt

It may be necessary to add code to the script so that caffe resumes from its latest snapshot (caffe.bin train accepts --snapshot=<file>.solverstate) instead of starting over.
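One way such code could look is sketched below. It assumes the solver writes snapshots named <prefix>_iter_<N>.solverstate (Caffe's usual naming) and that snapshot_prefix in solver.prototxt is snapshots/net, which is a hypothetical value. The helper names latest_solverstate and caffe_train_cmd are made up for illustration. Like the script above, this is untested on the cluster.

```shell
# Sketch only: pick the newest Caffe snapshot and build the train command.
# ASSUMPTION: snapshots are named <prefix>_iter_<N>.solverstate.

# Print the .solverstate file with the highest iteration number, or nothing.
latest_solverstate() {
    local prefix="$1" best="" best_iter=-1 f iter
    for f in "${prefix}"_iter_*.solverstate; do
        [ -e "$f" ] || continue            # glob matched nothing
        iter="${f##*_iter_}"               # drop everything up to "_iter_"
        iter="${iter%.solverstate}"        # drop the extension
        case "$iter" in
            ''|*[!0-9]*) continue ;;       # skip non-numeric iteration fields
        esac
        if [ "$iter" -gt "$best_iter" ]; then
            best_iter="$iter"
            best="$f"
        fi
    done
    if [ -n "$best" ]; then
        printf '%s\n' "$best"
    fi
}

# Print the caffe command line, adding --snapshot only if a snapshot exists.
caffe_train_cmd() {
    local solver="$1" prefix="$2" snap
    snap="$(latest_solverstate "$prefix")"
    if [ -n "$snap" ]; then
        printf 'caffe.bin train --solver=%s --snapshot=%s\n' "$solver" "$snap"
    else
        printf 'caffe.bin train --solver=%s\n' "$solver"
    fi
}

# In the job script, the last line could then become (hypothetical prefix):
#   $(caffe_train_cmd solver.prototxt snapshots/net)
```

Iteration numbers are compared numerically, so iteration 10000 is preferred over iteration 2000 even though it sorts earlier as text.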

Meaning of checkpoint options:

-r y
job is restartable
-ckpt caffe_ckpt
use the caffe_ckpt checkpoint environment to trigger checkpoint and migration
-c 36000
checkpoint every 36000 seconds (10 hours)
your script can check for this environment variable to see if the job was restarted automatically
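The variable name is not given above. The Grid Engine qsub man page documents a variable called RESTARTED, set to 1 when a job has been restarted and 0 otherwise. Assuming that is the variable meant, a job script could test it roughly as follows; was_restarted is a hypothetical helper name.

```shell
# Sketch only: detect an automatic restart inside the job script.
# ASSUMPTION: the scheduler exports RESTARTED=1 for restarted jobs,
# as described in the Grid Engine qsub man page.
was_restarted() {
    [ "${RESTARTED:-0}" = "1" ]
}

if was_restarted; then
    echo "job was restarted automatically; resume from the latest snapshot"
else
    echo "first run; start training from scratch"
fi
```

An unset variable is treated the same as 0, so the script also behaves sensibly when run by hand outside the scheduler.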

Compiling caffe on rocks

Caffe is already compiled on the cluster as a module. However, if you want to modify caffe and compile your modified version, these directions may help. These directions apply to all rocks clusters here. If your cluster is missing the caffe-deps module, please ask for it to be installed.

All compilation must be done on the head node.

To compile caffe on the local systems, this is the recommended configuration:

  1. module load cuda cudnn caffe-deps opencv opt-python
  2. cp Makefile.config.example Makefile.config
  3. Edit the following values in Makefile.config (change value or uncomment as appropriate):
BLAS := open
PYTHON_INCLUDE := /opt/python/include/python2.7 \
               /opt/python/lib/python2.7/dist-packages/numpy/core/include
PYTHON_LIB := /opt/python/lib
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib64/atlas /usr/lib64 /usr/lib /share/apps/caffe-deps/lib

Change CUDA_DIR to the path shown by module show cuda for the version of cuda you are using.
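For those who prefer to script these edits, the sketch below rewrites an uncommented NAME := value line in Makefile.config. It assumes the CUDA variable is CUDA_DIR, the name used in Caffe's stock Makefile.config.example. set_make_var is a hypothetical helper, and /path/from/module/show is a placeholder for the path that module show cuda reports.

```shell
# Sketch only: replace the value of an uncommented "NAME := value" line.
# Commented-out alternatives (lines starting with '#') are left untouched.
# '|' is used as the sed delimiter so '/' in paths needs no escaping;
# the value must not contain '|', '&' or backslashes.
set_make_var() {
    local file="$1" name="$2" value="$3"
    sed "s|^${name} *:=.*|${name} := ${value}|" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}

# Example use (placeholder path; copy the real one from "module show cuda"):
#   set_make_var Makefile.config BLAS open
#   set_make_var Makefile.config CUDA_DIR /path/from/module/show
```

The helper only handles single-line settings, so multi-line values such as PYTHON_INCLUDE are still easier to edit by hand.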

If you want to use python layers, add
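The exact line is not shown here. In the stock Makefile.config.example shipped with Caffe, Python layer support is controlled by a commented-out WITH_PYTHON_LAYER := 1 line. Assuming that is the setting meant, it could be uncommented with a sketch like this; enable_python_layer is a hypothetical helper name.

```shell
# Sketch only: uncomment "# WITH_PYTHON_LAYER := 1" in Makefile.config.
# ASSUMPTION: the file still contains the commented-out line from
# Caffe's Makefile.config.example; other lines are left as they are.
enable_python_layer() {
    local file="$1"
    sed 's|^# *WITH_PYTHON_LAYER *:= *1|WITH_PYTHON_LAYER := 1|' "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}

# Example use:
#   enable_python_layer Makefile.config
```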


You may also need to change the following to add cudnn, caffe-deps


Then use make to build caffe.

Use make distribute to install caffe in the distribute directory.