Job submission

Batch scripts

To create a resource allocation and launch tasks you need to submit a batch script. A batch script should contain job specifications such as the partition, number of nodes, number of cores, and walltime needed.

To submit a job, use the sbatch command.

sbatch job_script.sub
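
A minimal batch script might look like the following sketch; the partition name and the executable are placeholders that should be replaced with values valid on your system, and the #SBATCH directives are explained in the next section.

#!/bin/bash
#SBATCH --job-name=my_job          # job name shown by the scheduler
#SBATCH --partition=cpu            # partition/queue to submit to (placeholder)
#SBATCH --nodes=1                  # number of nodes
#SBATCH --ntasks-per-node=1        # number of cores (tasks) per node
#SBATCH --time=01:00:00            # walltime limit (HH:MM:SS)
./my_exec                          # the program to run (placeholder)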

SLURM job specifications

Submitting jobs is probably the most important part of using the scheduling system. Because you need to be able to express the requirements of your job so that it can be properly scheduled, it is also the most complicated part. For SLURM, the lines specifying the job requirements should begin with #SBATCH.

The following table lists some of the most commonly used job specifications:

Description Job Specification
Job Name --job-name=<job_name> or -J <job_name>
Partition/Queue --partition=<queue_name> or -p <queue_name>
Account/Project --account=<account_name> or -A <account_name>
Number of nodes --nodes=<number_of_nodes> or -N <number_of_nodes>
Number of cores (tasks) per node --ntasks-per-node=<number_of_tasks>
Walltime Limit --time=<timelimit> or -t <timelimit>
Number of GPUs per node --gres=gpu:<number_of_gpus> or --gpus-per-node=<number_of_gpus>
Number of GPUs per job --gpus=<number_of_gpus_per_job>
Memory requirements per node --mem=<memory_in_MB>
Memory requirements per core --mem-per-cpu=<memory_in_MB>
Memory requirements per GPU --mem-per-gpu=<memory_in_MB>
Standard Output File --output=<filename> or -o <filename>
Standard Error File --error=<filename> or -e <filename>
Combine stdout/stderr use -o without -e
Email Address --mail-user=<email_address>
Email Type --mail-type=NONE, BEGIN, END, FAIL, REQUEUE, ALL
Exclusive Job not sharing resources --exclusive
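
As an illustration, several of these specifications combined at the top of a job script might look as follows; the account name and email address are placeholders.

#SBATCH --job-name=test_job              # job name
#SBATCH --partition=cpu                  # partition/queue
#SBATCH --account=my_project             # account/project (placeholder)
#SBATCH --nodes=2                        # number of nodes
#SBATCH --ntasks-per-node=20             # cores (tasks) per node
#SBATCH --time=02:00:00                  # walltime limit
#SBATCH --output=job.%j.out              # standard output file (%j expands to the job ID)
#SBATCH --error=job.%j.err               # standard error file
#SBATCH --mail-user=user@example.com     # email address (placeholder)
#SBATCH --mail-type=END,FAIL             # email on job end or failure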

SLURM Environment Variables

SLURM provides environment variables for most of the values used in the #SBATCH directives.

Environment Variable Description
$SLURM_JOBID Job ID
$SLURM_JOB_NAME Job Name
$SLURM_SUBMIT_DIR Submit directory
$SLURM_SUBMIT_HOST Submit host
$SLURM_JOB_NODELIST Node list
$SLURM_JOB_NUM_NODES Number of nodes allocated to job
$SLURM_CPUS_ON_NODE Number of cores per node
$SLURM_NTASKS_PER_NODE Number of tasks requested per node
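
For example, these variables can be used inside a job script to record where and on which nodes the job ran; the lines below are only a sketch of a script body.

cd $SLURM_SUBMIT_DIR                     # move to the directory the job was submitted from
echo "Job $SLURM_JOBID ($SLURM_JOB_NAME) submitted from $SLURM_SUBMIT_HOST"
echo "Running on $SLURM_JOB_NUM_NODES node(s): $SLURM_JOB_NODELIST"
echo "Cores per node: $SLURM_CPUS_ON_NODE"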

Interactive access

Users also have the option to get interactive access on the compute nodes using the salloc command.

salloc -N <number_of_nodes> --ntasks-per-node=<number_of_cores_per_node> -t <timelimit>

When you want to exit an interactive job before it has reached its time limit, just type exit.
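
A typical interactive session might look like the sketch below; the allocation number shown in the example output is illustrative.

salloc -N1 --ntasks-per-node=4 -t 01:00:00
# salloc: Granted job allocation 123456   (example output)
# ... run your commands interactively ...
exit                                      # release the allocation before the time limit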

Submitting CPU jobs

When running under the CPU partition you need to specify the number of tasks per node (number of cores per node). If not specified, the default value is 1. In your job script or salloc command you need the option --ntasks-per-node=<number_of_cores_per_node>.

For example, to submit a job using two cores on one node in your job script you should have

#SBATCH --nodes=1 
#SBATCH --ntasks-per-node=2

The equivalent command for interactive access is

salloc -N1 --ntasks-per-node=2

To submit a job using four whole Cyclone nodes (40 cores per node), in your job script you should have

#SBATCH --nodes=4 
#SBATCH --ntasks-per-node=40

The equivalent command for interactive access is

salloc -N4 --ntasks-per-node=40
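
Putting these options together, a complete batch script for such a 4-node CPU run might look like the following sketch; the partition name cpu and the executable are assumptions.

#!/bin/bash
#SBATCH --job-name=cpu_job
#SBATCH --partition=cpu            # CPU partition (assumed name)
#SBATCH --nodes=4                  # four whole Cyclone nodes
#SBATCH --ntasks-per-node=40       # 40 cores per node
#SBATCH --time=01:00:00
srun ./my_exec                     # launch one task per allocated core (placeholder executable)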

Submitting GPU jobs

When running under the GPU partition you need to specify the number of GPUs per node or per job.

In your job script or salloc command you need one of the following options:

  • --gres=gpu:<number_of_gpus_per_node>
  • --gpus-per-node=<number_of_gpus_per_node>
  • --gpus=<number_of_gpus_per_job>

For example, to submit a job using two GPUs on one node in your job script you should have

#SBATCH --nodes=1 
#SBATCH --gres=gpu:2

The equivalent command for interactive access is

salloc -N1 --gres=gpu:2

To submit a job using all GPUs on two nodes on Cyclone (4 GPUs per node), in your job script you should have

#SBATCH --nodes=2
#SBATCH --gpus=8

or

#SBATCH --nodes=2
#SBATCH --gpus-per-node=4

The equivalent command for interactive access is

salloc -N2 --gpus=8

or

salloc -N2 --gpus-per-node=4
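
A complete batch script for such a run could look like the sketch below; the partition name gpu and the executable are assumptions.

#!/bin/bash
#SBATCH --job-name=gpu_job
#SBATCH --partition=gpu            # GPU partition (assumed name)
#SBATCH --nodes=2                  # two Cyclone nodes
#SBATCH --gpus-per-node=4          # all 4 GPUs on each node
#SBATCH --ntasks-per-node=4        # e.g. one task per GPU
#SBATCH --time=01:00:00
srun ./my_exec                     # placeholder GPU-enabled executable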

Memory allocation

There are default memory amounts per core and per GPU. A user can request less or more memory per core or per GPU, but if no amount is explicitly requested, the default amounts are given to the job.

The table below shows the default memory amounts per partition.

Partition Default memory per core (MB) Default memory per GPU (MB)
cpu 4800 N/A
gpu 4800 48000
milan 2000 N/A
p100 2000 62000
nehalem 4000 N/A
a100 10000 124000
skylake 2000 N/A

The above defaults can be changed using the --mem-per-cpu=<size[units]> or --mem-per-gpu=<size[units]> option, respectively, in your salloc command or in your job script.

For example, a CPU job on a Cyclone node needing 10 cores and 2000 MB of memory per core can be allocated as below:

salloc -N1 --ntasks-per-node=10 --mem-per-cpu=2000
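
The equivalent directives in a job script would be:

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=10
#SBATCH --mem-per-cpu=2000         # 2000 MB of memory per core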

Time limit

The default time limit for a job is 1 hour. If --time is not specified, the submitted job will be killed after one hour. The maximum time a job can run is 24 hours.
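
For example, to request a 4-hour walltime you can add the following to your job script (or pass -t 04:00:00 to salloc):

#SBATCH --time=04:00:00            # walltime limit in HH:MM:SS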

X11 enabled jobs

X11 is now enabled through SLURM. Users can get interactive access on a compute node with the possibility of running programs with a graphical user interface (GUI) directly on the compute node. To achieve this, the --x11 flag needs to be given, for example:

salloc -N1 --x11

Note that for X11 to work on the compute nodes, users need to log in to the cluster with X11 forwarding enabled. For example, for Cyclone:

ssh -X username@cyclone.hpcf.cyi.ac.cy
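
Putting the two steps together, a session running a GUI program might look like the following sketch; xclock is used only as an illustrative X11 application.

ssh -X username@cyclone.hpcf.cyi.ac.cy   # log in with X11 forwarding enabled
salloc -N1 --x11                         # interactive job with X11 support
xclock                                   # example GUI program displayed back on your machine
exit                                     # release the allocation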

MPI jobs with srun

Instead of using the mpirun or mpiexec commands you can launch MPI jobs using the SLURM srun command.

  • When using OpenMPI you can simply use the srun command instead of the mpirun command. For example, to run an executable called my_exec on 40 cores with mpirun you would type:

    mpirun -np 40 ./my_exec

    Using srun you should type:

    srun -n 40 ./my_exec

  • When using IntelMPI you first need to set the I_MPI_PMI_LIBRARY environment variable and then use srun (a complete batch script using this approach is sketched after this list):

    export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so
    srun -n 40 ./my_exec

    To use mpirun when you have already set the I_MPI_PMI_LIBRARY variable, you first need to unset it:

    unset I_MPI_PMI_LIBRARY
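
For reference, a complete batch script that launches an MPI executable with srun might look like the sketch below; the partition name is a placeholder and the I_MPI_PMI_LIBRARY line is only needed for IntelMPI.

#!/bin/bash
#SBATCH --job-name=mpi_job
#SBATCH --partition=cpu                        # placeholder partition name
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=40
#SBATCH --time=01:00:00

# Only needed for IntelMPI; omit this line when using OpenMPI.
export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so

srun -n 40 ./my_exec                           # launch 40 MPI tasks (placeholder executable)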