Job submission

Batch scripts

To create a resource allocation and launch tasks you need to submit a batch script. A batch script should contain job specifications such as the partition, number of nodes, number of cores, and walltime needed.

To submit a job, use the sbatch command.

sbatch job_script.sub
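
A minimal batch script might look like the following sketch; the partition name and the executable are placeholders that should be replaced with values valid on your system, and the #SBATCH directives are explained in the next section.

#!/bin/bash
#SBATCH --job-name=my_job          # job name shown by the scheduler
#SBATCH --partition=cpu            # partition/queue to submit to (placeholder)
#SBATCH --nodes=1                  # number of nodes
#SBATCH --ntasks-per-node=1        # number of cores (tasks) per node
#SBATCH --time=01:00:00            # walltime limit (HH:MM:SS)
./my_exec                          # the program to run (placeholder)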

SLURM job specifications

Submitting jobs is probably the most important part of using the scheduling system. Because you need to be able to express the requirements of your job so that it can be properly scheduled, it is also the most complicated part. For SLURM, the lines specifying the job requirements should begin with #SBATCH.

The following table lists some of the most commonly used job specifications:

Description Job Specification
Job Name --job-name=<job_name> or -J <job_name>
Partition/Queue --partition=<queue_name> or -p <queue_name>
Account/Project --account=<account_name> or -A <account_name>
Number of nodes --nodes=<number_of_nodes> or -N <number_of_nodes>
Number of cores (tasks) per node --ntasks-per-node=<number_of_tasks>
Walltime Limit --time=<timelimit> or -t <timelimit>
Number of GPUs per node --gres=gpu:<number_of_gpus> or --gpus-per-node=<number_of_gpus>
Number of GPUs per job --gpus=<number_of_gpus_per_job>
Memory requirements per node --mem=<memory_in_MB>
Memory requirements per core --mem-per-cpu=<memory_in_MB>
Memory requirements per GPU --mem-per-gpu=<memory_in_MB>
Standard Output File --output=<filename> or -o <filename>
Standard Error File --error=<filename> or -e <filename>
Combine stdout/stderr use -o without -e
Email Address --mail-user=<email_address>
Email Type --mail-type=NONE, BEGIN, END, FAIL, REQUEUE, ALL
Exclusive Job not sharing resources --exclusive
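
As an illustration, several of these specifications combined at the top of a job script might look as follows; the account name and email address are placeholders.

#SBATCH --job-name=test_job              # job name
#SBATCH --partition=cpu                  # partition/queue
#SBATCH --account=my_project             # account/project (placeholder)
#SBATCH --nodes=2                        # number of nodes
#SBATCH --ntasks-per-node=20             # cores (tasks) per node
#SBATCH --time=02:00:00                  # walltime limit
#SBATCH --output=job.%j.out              # standard output file (%j expands to the job ID)
#SBATCH --error=job.%j.err               # standard error file
#SBATCH --mail-user=user@example.com     # email address (placeholder)
#SBATCH --mail-type=END,FAIL             # email on job end or failure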

SLURM Environment Variables

SLURM provides environment variables for most of the values used in the #SBATCH directives.

Environment Variable Description
$SLURM_JOBID Job ID
$SLURM_JOB_NAME Job Name
$SLURM_SUBMIT_DIR Submit directory
$SLURM_SUBMIT_HOST Submit host
$SLURM_JOB_NODELIST Node list
$SLURM_JOB_NUM_NODES Number of nodes allocated to job
$SLURM_CPUS_ON_NODE Number of cores per node
$SLURM_NTASKS_PER_NODE Number of tasks requested per node
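
For example, these variables can be used inside a job script to record where and on which nodes the job ran; the lines below are only a sketch of a script body.

cd $SLURM_SUBMIT_DIR                     # move to the directory the job was submitted from
echo "Job $SLURM_JOBID ($SLURM_JOB_NAME) submitted from $SLURM_SUBMIT_HOST"
echo "Running on $SLURM_JOB_NUM_NODES node(s): $SLURM_JOB_NODELIST"
echo "Cores per node: $SLURM_CPUS_ON_NODE"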

Interactive access

Users also have the option to get interactive access on the compute nodes using the salloc command.

salloc -N <number_of_nodes> --ntasks-per-node=<number_of_cores_per_node> -t <timelimit>

When you want to exit an interactive job before it has reached its time limit, just type exit.
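
A typical interactive session might look like the sketch below; the allocation number shown in the example output is illustrative.

salloc -N1 --ntasks-per-node=4 -t 01:00:00
# salloc: Granted job allocation 123456   (example output)
# ... run your commands interactively ...
exit                                      # release the allocation before the time limit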

Submitting CPU jobs

When running under the CPU partition you need to specify the number of tasks per node (number of cores per node). If not specified, the default value is 1. In your job script or salloc command you need the option --ntasks-per-node=<number_of_cores_per_node>.

For example, to submit a job using two cores on one node in your job script you should have

#SBATCH --nodes=1 
#SBATCH --ntasks-per-node=2

The equivalent command for interactive access is

salloc -N1 --ntasks-per-node=2

To submit a job using four whole Cyclone nodes (40 cores per node), in your job script you should have

#SBATCH --nodes=4 
#SBATCH --ntasks-per-node=40

The equivalent command for interactive access is

salloc -N4 --ntasks-per-node=40
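
Putting these options together, a complete batch script for such a 4-node CPU run might look like the following sketch; the partition name cpu and the executable are assumptions.

#!/bin/bash
#SBATCH --job-name=cpu_job
#SBATCH --partition=cpu            # CPU partition (assumed name)
#SBATCH --nodes=4                  # four whole Cyclone nodes
#SBATCH --ntasks-per-node=40       # 40 cores per node
#SBATCH --time=01:00:00
srun ./my_exec                     # launch one task per allocated core (placeholder executable)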

Submitting GPU jobs

When running under the GPU partition you need to specify the number of GPUs per node or per job.

In your job script or salloc command you need one of the following options:

  • --gres=gpu:<number_of_gpus_per_node>
  • --gpus-per-node=<number_of_gpus_per_node>
  • --gpus=<number_of_gpus_per_job>

For example, to submit a job using two GPUs on one node in your job script you should have

#SBATCH --nodes=1 
#SBATCH --gres=gpu:2

The equivalent command for interactive access is

salloc -N1 --gres=gpu:2

To submit a job using all GPUs on two nodes on Cyclone (4 GPUs per node), in your job script you should have

#SBATCH --nodes=2
#SBATCH --gpus=8

or

#SBATCH --nodes=2
#SBATCH --gpus-per-node=4

The equivalent command for interactive access is

salloc -N2 --gpus=8

or

salloc -N2 --gpus-per-node=4
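
A complete batch script for such a run could look like the sketch below; the partition name gpu and the executable are assumptions.

#!/bin/bash
#SBATCH --job-name=gpu_job
#SBATCH --partition=gpu            # GPU partition (assumed name)
#SBATCH --nodes=2                  # two Cyclone nodes
#SBATCH --gpus-per-node=4          # all 4 GPUs on each node
#SBATCH --ntasks-per-node=4        # e.g. one task per GPU
#SBATCH --time=01:00:00
srun ./my_exec                     # placeholder GPU-enabled executable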

Memory allocation

There are default memory amounts per core and per GPU. A user can request less or more memory per core or per GPU, but if no amount is explicitly requested, the default amounts are given to the job.

The table below shows the default memory amounts per partition.

Partition Default memory per core (MB) Default memory per GPU (MB)
cpu 4800 N/A
gpu 4800 48000
milan 2000 N/A
p100 2000 62000
nehalem 4000 N/A
a100 10000 124000
skylake 2000 N/A

The above defaults can be changed using the --mem-per-cpu=<size[units]> or --mem-per-gpu=<size[units]> option, respectively, in your salloc command or in your job script.

For example, a CPU job on a Cyclone node needing 10 cores and 2000 MB of memory per core can be allocated as below:

salloc -N1 --ntasks-per-node=10 --mem-per-cpu=2000
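
The equivalent directives in a job script would be:

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=10
#SBATCH --mem-per-cpu=2000         # 2000 MB of memory per core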

Time limit

The default time limit for a job is 1 hour. If --time is not specified, the submitted job will be killed after one hour. The maximum time a job can run is 24 hours.
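
For example, to request a 4-hour walltime you can add the following to your job script (or pass -t 04:00:00 to salloc):

#SBATCH --time=04:00:00            # walltime limit in HH:MM:SS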

X11 enabled jobs

X11 is now enabled through SLURM. Users can get interactive access on a compute node with the possibility of running programs with a graphical user interface (GUI) directly on the compute node. To achieve this, the --x11 flag needs to be given, for example:

salloc -N1 --x11

Note that for X11 to work on the compute nodes, users need to log in to the cluster with X11 forwarding enabled. For example, for Cyclone:

ssh -X username@cyclone.hpcf.cyi.ac.cy
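
Putting the two steps together, a session running a GUI program might look like the following sketch; xclock is used only as an illustrative X11 application.

ssh -X username@cyclone.hpcf.cyi.ac.cy   # log in with X11 forwarding enabled
salloc -N1 --x11                         # interactive job with X11 support
xclock                                   # example GUI program displayed back on your machine
exit                                     # release the allocation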

MPI jobs with srun

Instead of using the mpirun or mpiexec commands you can launch MPI jobs using the SLURM srun command.

  • When using OpenMPI you can simply use the srun command instead of the mpirun command. For example, to run an executable called my_exec on 40 cores with mpirun you would type:

    mpirun -np 40 ./my_exec

    Using srun you should type:

    srun -n 40 ./my_exec

  • When using IntelMPI you first need to set the I_MPI_PMI_LIBRARY environment variable and then use srun (a complete batch script using this approach is sketched after this list):

    export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so
    srun -n 40 ./my_exec

    To use mpirun when you have already set the I_MPI_PMI_LIBRARY variable, you first need to unset it:

    unset I_MPI_PMI_LIBRARY
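
For reference, a complete batch script that launches an MPI executable with srun might look like the sketch below; the partition name is a placeholder and the I_MPI_PMI_LIBRARY line is only needed for IntelMPI.

#!/bin/bash
#SBATCH --job-name=mpi_job
#SBATCH --partition=cpu                        # placeholder partition name
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=40
#SBATCH --time=01:00:00

# Only needed for IntelMPI; omit this line when using OpenMPI.
export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so

srun -n 40 ./my_exec                           # launch 40 MPI tasks (placeholder executable)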