Help:Slurm

From CECS wiki

The Slurm Workload Manager (formerly known as the Simple Linux Utility for Resource Management, or SLURM) is a free and open-source job scheduler for Linux.

Software homepage
https://slurm.schedmd.com
Software availability
newer clusters
Other related software
replaces sge
Commands to run
sinfo sbatch srun scancel squeue
View online documentation
https://slurm.schedmd.com/quickstart.html
https://slurm.schedmd.com/tutorials.html
https://arcc.ist.ucf.edu/index.php/help/tutorials/job-submission-on-stokes-with-slurm
https://srcc.stanford.edu/sge-slurm-conversion
SLURM Command Option Summary (cheat sheet)
https://slurm.schedmd.com/pdfs/summary.pdf

Examples

batch example

Sample Job script:

#!/bin/bash
sleep 100

Make the script executable:

chmod +x testscript

Submit the script:

sbatch testscript

example slurm script to use gpu

save the following as (for example) testjob.slurm

#!/bin/bash
#SBATCH -p gpu 
#SBATCH --gres=gpu:1 
#SBATCH -c 4
echo CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES
python3 tens.py

then make it executable:

$ chmod +x testjob.slurm

Submit this job like this:

$ module load cuda cudnn
$ sbatch testjob.slurm

Note: you can also specify gpu type, for example: (pick one)

#SBATCH --gres=gpu:pascal:1 
#SBATCH --gres=gpu:kepler:1

Note the use of CUDA_VISIBLE_DEVICES: printing its value may help with debugging. You can use it to match the gpus your job used against the system performance graphs.

check slurm status and resource availability

sinfo
get basic node status and partition list
sinfo -O nodelist,partition,gres,features
get list of available generic resources

Example:

$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
normal*      up   infinite      4   idle c0-[1-4]
gpu          up   infinite      1   idle c0-0
$ sinfo -O nodelist,partition,gres,features
NODELIST            PARTITION           GRES                AVAIL_FEATURES
c0-1                normal*             (null)              zswap
c0-0                gpu                 gpu:kepler:2        gpu
$ sinfo -O partition,nodelist,cpusstate,gres

In this example, there are two kepler gpus (request one with --gres=gpu:1) available in the gpu partition (-p gpu).

As a simple example:

sbatch -p gpu --gres=gpu:1 --wrap="nvidia-smi"

or

srun -p gpu --gres=gpu:1  nvidia-smi

To check on running jobs:

 squeue

To cancel a job:

 scancel jobid

where jobid can be found in the output from squeue

To view information about completed jobs:

 sacct 

To view information about old jobs (for instance, jobs since Jan 2):

 sacct --start=0102

To view all information about a specific job (replace ### with the job number):

 sacct -j ### -o all -P | strans | less

The strans command (coupled with the sacct -P option) transposes the table to make it more readable. You can also list just the fields you want to see with sacct -o field,list, or try spformat, which separates out common fields and resizes column widths. (Also try spformat -fK to split the output into multiple small tables.)
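For example, to show only a handful of commonly useful fields (these are standard sacct field names; replace ### with the job number):

 sacct -j ### -o jobid,jobname,state,exitcode,elapsed,maxrss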

For detailed information about a job that is either currently running or waiting to run, supply its jobid:

 scontrol show job 123

interactive example

Note: interactive jobs are discouraged. Resources requested by interactive sessions are unavailable to other users until the session closes. Abuse of this will cause limits to be placed on interactive sessions.

srun --pty -I

or if you want an interactive bash shell on a node:

srun --pty -I bash

Slurm options

These can be put on the sbatch or srun command line, or added to your batch script on lines prefixed with #SBATCH.

These are a few of the more interesting options; for a complete list, check the sbatch man page.

Please note: slurm options are somewhat sensitive to order!

  • The partition option should be first ( -p gpu )
  • QOS options should be next
  • GPU requests should be after QOS
  • In srun, the command to run should be last with all slurm options before it.

Options that select resource allocation permissions (such as partition and qos) need to appear early in the option list, as in the sketch below.
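
As a rough sketch, a batch script header following this ordering might look like the following (the qos name and yourprogram are placeholders; your cluster may not use qos at all):

#!/bin/bash
#SBATCH -p gpu              # partition first
#SBATCH --qos=normal        # qos next (placeholder name)
#SBATCH --gres=gpu:1        # gpu request after qos
#SBATCH -c 4                # remaining options (cpus per task)
yourprogram input.dat       # placeholder for the program you want to run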

common options

-c #
allocate # cpus per task (default=1)
-C features
request a node with special features; run sinfo -O features for a list of available features.
--exclusive
request an exclusive node rather than sharing the node with other jobs
--mem=#
memory needed per node
--mem-per-cpu=#
provide a minimum amount of memory per cpu (most clusters default to 8G)
--cpus-per-task=#
request # cpus for the job task (long form of -c)
--nodes=#
allocate at least # nodes for the job (but see below)
-p partition
run in a specific partition instead of the default partition (list partitions with sinfo; some clusters have a gpu partition)
--wrap="command"
wrap a command in a shell script instead of specifying the script
-C 'feature|feature...'
request a node with any one of several alternative features (OR syntax); see sinfo -O features for a cluster-specific list of available features.
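
As a sketch combining several of these options on the sbatch command line (yourprogram and the resource numbers are placeholders; the partition and feature names are taken from the sinfo example above):

sbatch -p normal -c 8 --mem=16G -C zswap --wrap="yourprogram input.dat"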

gpu use

Use these slurm options to request gpus. As with other options, they can either be put on the sbatch command line or in your script.

--gres=gpu:1
request one gpu
--gres=gpu:pascal:1
request one pascal gpu (see sinfo -O gres)
-p gpu
select the gpu partition (needed on clusters with both gpu nodes and cpu only nodes, check with sinfo )

Deprecated slurm options

We do not recommend the following options because the defaults work well:

--cpus-per-gpu
DO NOT USE THIS OPTION. IT DOESN'T WORK IN THE CURRENT VERSION OF SLURM!
--output=filename_pattern
The default is 'slurm-%j.out' (or 'slurm-%A_%a.out' for array jobs), which includes the jobid in the output filename; this makes it easy to match errors to failed jobs and saves a separate output log for each run. If you do change this option, make sure that it results in a unique name for each job in a writable directory.
--error=filename_pattern
If this option is used, errors from jobs will be saved in a separate file. Usually it is easier to use the default, which saves errors with the job output.
--mail-type / --mail-user
Because some users abused mail notification with large array jobs, causing an excessive burden on the campus mail system, these options are disabled on most clusters. If you need external notification of job completion, talk to us and something can be arranged. Use them carefully on clusters that still have them enabled.
--nodes=#
This option is disabled on some clusters because it requires multiple node support within the code and some users inadvertently use it on jobs that only support single nodes. If you know your code works on multiple nodes and supports slurm host lists, contact us and it can be enabled for your account.
--nodelist
Please do not force slurm to use specific nodes. Slurm automatically picks the best node by default. Use of this option without compelling justification will cause it to be disabled. Please use feature lists (-C), or other resource requests instead. (Ask for the features you need and we'll add them.)
--exclude=nodelist
This option can be used if your job crashes on specific nodes; however slurm usually takes nodes offline when this occurs. If there is a problem with a node, please let us know ASAP so that it can be fixed rather than just excluding it in every job. If this feature is abused, it will be disabled.

Array jobs

Array jobs are a batch queue feature for embarrassingly parallel workloads, where the same program needs to run many times with different inputs. For instance, if you need to run the same program on 100 different input files, you could create a script called myjob.sh containing:

#!/bin/bash
myjob testcase-$SLURM_ARRAY_TASK_ID

and submit it like this:

 sbatch -a 1-100 myjob.sh

The program myjob would be run repeatedly, with $SLURM_ARRAY_TASK_ID set to 1, 2, 3, ... 100.

If you wanted to skip numbers, you could do something like this:

sbatch -a 4-20:3 myjob.sh

which would run jobs with $SLURM_ARRAY_TASK_ID set to 4, 7, 10, 13, 16, 19

If you have a large list of files to process that are not numbered sequentially, you can save the list of filenames to a file and index into it by task number:

% ls datadir > filelist.txt
% wc -l filelist.txt
24532 filelist.txt

(This list has 24532 files in it)

% sbatch -a 1-24532 processfiles.sh

In your job, you can extract the name like this:

#!/bin/bash
# pick out line number $SLURM_ARRAY_TASK_ID from the file list
taskfile=$(sed -n "${SLURM_ARRAY_TASK_ID}p" filelist.txt)
yourprogram "$taskfile"


If some of these jobs failed and you want to rerun them, you can use -a and list the tasks (and subranges of tasks) separated with commas.
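
For example, to rerun only tasks 7, 23, and 100 through 110 (task numbers chosen only for illustration):

 sbatch -a 7,23,100-110 processfiles.sh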

Environment variables

For a complete list, check the sbatch man page.

SLURM_JOB_ID
The ID of the job allocation.
SLURM_ARRAY_TASK_ID
the current task's array index
SLURM_RESTART_COUNT
number of times this job has been restarted and requeued; use this to detect if you need to do something to restore a previous saved state
SLURM_CPUS_PER_TASK
number of cpus allocated per task; only set if --cpus-per-task (-c) was specified
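
A common use for these variables (a sketch, assuming your program is multithreaded and honors OMP_NUM_THREADS; yourprogram is a placeholder) is to match the program's thread count to the number of cpus slurm allocated:

#!/bin/bash
#SBATCH -c 8
# SLURM_CPUS_PER_TASK is only set when -c/--cpus-per-task is given, so fall back to 1
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
yourprogram input.dat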

Diagnostics and error codes

Jobs may fail at different stages for various reasons.

If you get an error during job submission, either you have a syntax error in the parameters or you have asked for resources that will never be available. Ask for help if this is not obvious to you.

If your job is submitted successfully but then disappears from the queue, it probably finished (successfully or unsuccessfully) very quickly. Slurm records basic accounting data and directs application output and error messages to one or two log files (depending on job options). If the job fails quickly and no log files are written, the most likely reason is that you started it in a directory where the log file cannot be written, or you are over your disk quota.

You can use the sacct command to check the status and post mortem statistics of a job.

The State column indicates slurm errors; sometimes the Reason column gives more details. The ExitCode column shows an application-specific numeric error code.

$ sacct -j 123456
JobID           JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
123456            myjob        gpu     group3          8 OUT_OF_ME+    0:125 
123456.batch      batch                group3          8 OUT_OF_ME+    0:125 
123456.exte+     extern                group3          8  COMPLETED      0:0 

or use one of the wrapper scripts for sacct (sacct-diag, sacct-mem):

$ sacct-diag -j 123456
NodeList=c2-1
Start=2024-04-20T09:10:01
End=2024-04-20T11:47:08
Elapsed=02:37:07

*******************************************************************************
User JobID ExitCode State Reason JobName
User    JobID         ExitC State         Reas JobName
ssd     123456        0:125 OUT_OF_MEMORY None myjob
        123456.batch  0:125 OUT_OF_MEMORY      batch 
        123456.extern 0:0   COMPLETED          extern
$ sacct-mem -j 123456
JobID        User          State All  ReqMem MaxVMSi NodeLi               Start 
------------ -------- ---------- --- ------- ------- ------ ------------------- 
123456            ssd OUT_OF_ME+   8  61.72G         c2-1   2024-04-20T09:10:01 
123456.batch          OUT_OF_ME+   8          61.14G c2-1   2024-04-20T09:10:01 
123456.exte+           COMPLETED   8               0 c2-1   2024-04-20T09:10:01 

Note that the MaxVMSize column is sampled, so it may not actually include the highest value.


Slurm displays the exit code as two numbers separated by a colon, exit:signal. The exit value is generally application specific, but a few common exit code meanings are listed below. (These may or may not be relevant, since the application controls this.)

value    meaning
0        success
nonzero  failure; slurm will mark the job as FAILED
1        general failure
2        incorrect use of a shell builtin (bash)
125      out of memory
126      command cannot execute (bash)
127      command not found (bash)
128      invalid argument to exit (bash)

Any other exit code (and some of the above) may be application specific. This is the value passed to exit() by the application.
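
For example (a sketch; yourprogram is a placeholder), if the last command in your batch script exits with a nonzero value, that value becomes the job's exit code:

#!/bin/bash
yourprogram input.dat || exit 2   # job exits with code 2 if yourprogram fails

sacct would then typically report the job as FAILED with an ExitCode of 2:0.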

Some common signals that may cause an application to exit are listed here. For a complete list of signals, see man -s 7 signal.

signal   meaning
2        SIGINT; equivalent of ctrl-c
6        SIGABRT; the application detected a critical error and called abort()
8        SIGFPE; floating point error
7, 11    SIGBUS / SIGSEGV; memory access error (your code is buggy)
9, 15    SIGKILL / SIGTERM; slurm probably killed this job; canceled by user or time expired?
53       failed to write output file (check quota and directory permissions)
125      out of memory