SCRP uses the Slurm Workload Manager to allocate compute node resources.

Partitions

Compute nodes are grouped into two partitions, which you can think of as queues:

Partition   Nodes                  Resources per Node
scrp*       scrp-node-[1-6,8,9]    CPU: 16-64 cores; RAM: 64-512GB; GPU: 0-4
large       scrp-node-[6,7]        CPU: 64 cores; RAM: 512GB-1TB
a100        scrp-node-10           CPU: 32 cores; RAM: 512GB; GPU: A100 x 2

*Default partition.

The scrp partition is accessible to all users, while the large partition is only accessible to faculty members and research postgraduate students.

Resource Limits

To check what resources you can use, type in a terminal:

qos

The QoS (for “Quality of Service”) field tells you the resource limits applicable to you:

QoS        Logical CPUs   GPUs   Max. Job Duration
c4g1       4              1      1 Day
c16g1      16             1      5 Days
c32g4      32/128*        4      5 Days
c32g8      32/128*        8      5 Days
c16-long   16/16*         1      30 Days

*For partitions scrp and large, respectively.

The default job duration is one day regardless of the QoS you use.
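
If you prefer to query Slurm directly, sacctmgr can also list the QoS definitions. The command below is only a sketch: which fields carry the CPU/GPU limits depends on how each QoS is configured, so adjust the format list as needed.

sacctmgr show qos format=Name,MaxTRES,MaxWall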

Compute Node Status

To check the status of the compute nodes, type:

scrp-info

Use Slurm’s sinfo if you want to see a different set of information.
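
For example, a node-oriented view showing each node's partition, CPU count, memory, GPUs, and state can be produced with sinfo's standard format options (the field selection below is only an illustration):

sinfo -N -o "%N %P %c %m %G %T"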

Run a Job Immediately

Predefined Shortcuts

SCRP has two sets of predefined shortcuts to provide quick access:

  • compute-[1-4,8,16,32,64,128] requests the specified number of logical CPUs from a compute node. Each CPU comes with 2GB of RAM.
  • gpu-[1-4] requests 1-4 RTX 3060 GPUs from a compute node. Four logical CPUs and 24GB of RAM are allocated per GPU.

For example, to launch Stata with 16 logical CPUs and 32GB RAM, you can simply type:

compute-16 stata-mp
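
Similarly, to start an interactive Python session with one RTX 3060 GPU, which also allocates four logical CPUs and 24GB of RAM:

gpu-1 python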

compute

For more control, you can use the compute command:

compute [options] [command]

Note:

  • Pseudo terminal mode is enabled. X11 forwarding is also enabled when a valid display is present.
  • Jobs requesting more than 32 cores are routed to the large partition, while jobs requesting A100 GPUs are routed to the a100 partition.
  • Multithreading is disabled except for compute-128.
  • CPU core count and memory scale automatically with the GPU type unless you specify them explicitly (see the example after this list):
    • RTX 3060: four CPU cores and 24GB RAM per GPU
    • RTX 3090: eight CPU cores and 48GB RAM per GPU
    • A100: eight CPU cores and 160GB RAM per GPU
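
For example, requesting an RTX 3090 without -c or --mem should give you eight CPU cores and 48GB of RAM along with the GPU:

compute --gpus-per-task=rtx3090 python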

Options:

  • -c requests a specific number of logical CPUs. Defaults to four CPU cores.
  • --mem requests a specific amount of memory.
  • --gpus-per-task requests GPUs in one of the following formats:
    • number
    • model
    • model:number
  • -t sets the maximum running time of the job in one of the following formats:
    • minutes
    • minutes:seconds
    • hours:minutes:seconds
    • days-hours
    • days-hours:minutes:seconds
  • -p requests a specific partition. Defaults to ‘scrp’.
  • -q requests a specific quality of service.
  • -w requests specific nodes.
  • -y turns on multithreading. Only applicable to the large memory node.
  • -z prints the generated srun command.
  • command The command to run. Defaults to starting a bash shell.

Examples:

  1. To request 16 CPUs and 40G of memory:

    compute -c 16 --mem=40G stata-mp
    
  2. To request 8 CPUs, 40G of memory and one RTX 3090 GPU:

    compute -c 8 --mem=40G --gpus-per-task=rtx3090 python
    
  3. Faculty members and research postgraduate students can run jobs longer than five days by requesting the QoS c16-long. The maximum job duration is 30 days.

    compute -t 30-0 -c 16 --mem=40G -q c16-long stata-mp
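
  4. To request an A100 GPU, pass the model name to --gpus-per-task (assuming the model is exposed as a100). CPU cores and memory then default to the A100 values listed above, and the job is routed to the a100 partition:

    compute --gpus-per-task=a100 python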
    

srun

Finally, for maximum flexibility, you can use Slurm’s srun command. All the shortcuts above utilize srun underneath.

srun [options] command

Common Options:

  • --pty pseudo terminal mode. Required for any interactive software.
  • --x11 x11 forwarding. Required for software with GUI.
  • -c requests a specific number of logical CPUs.
  • --hint=[no]multithread specifies whether simultaneous multithreading (SMT) is needed. Only works on the large memory node. On that node, setting --hint=nomultithread is recommended unless you are certain that your code benefits from SMT.
  • --ntasks specifies the number of parallel tasks to run. Specifying this option is generally not necessary unless you are running an MPI job or setting --hint=nomultithread. For the latter, setting --ntasks=1 prevents Slurm from erroneously starting multiple identical tasks.
  • --mem requests a specific amount of memory.
  • --gpus-per-task requests GPUs in one of the following formats:
    • number
    • model
    • model:number
  • -t sets the maximum running time of the job in one of the following formats:
    • minutes
    • minutes:seconds
    • hours:minutes:seconds
    • days-hours
    • days-hours:minutes:seconds
  • -p requests a specific partition.
  • -q requests a specific quality of service.

See Slurm’s srun documentation for additional options.

Examples:

  1. To request two logical CPUs, 10GB of RAM and one NVIDIA RTX 3090 GPU for 10 hours of interactive use, type:

    srun --pty -t 600 -c 2 --mem 10G --gpus-per-task=rtx3090:1 bash
    
  2. If you need 1TB of memory, you will have to request the large memory node. Since that node is in the separate large partition, you also need to specify the -p option:

    srun --pty -c 16 --hint=nomultithread --ntasks=1 --mem 1000G -p large stata-mp
    

    We are specifying the options --hint=nomultithread --ntasks=1 to disable simultaneous multithreading, so that -c 16 actually gives us 16 physical CPU cores. These options are only necessary on the large memory node where SMT is enabled.

  3. Faculty members and research postgraduate students can run jobs longer than five days by requesting the QoS c16-long. The maximum job duration is 30 days.

    srun --pty -t 30-0 -c 16 -q c16-long Rscript my_code.r
    

If Slurm is unable to allocate the resources you request, you will be put on a queue until resources are available or you terminate the command.

Run a Job in Batch Mode

sbatch allows you to submit a job without having to wait for it to execute.

For example, to run Stata in batch mode on a single compute node, we will need the following batch script:

#!/bin/bash
#SBATCH --job-name=my_sid
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2

stata-mp -b do file_path

Then submit your job:

sbatch [file_path_to_batch_script]
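
The resource options shown earlier (-t/--time, --mem, -p, -q, and so on) can also be placed inside the script as #SBATCH directives. Below is a sketch with illustrative values; my_job_%j.log is a hypothetical log file name, where %j expands to the job ID:

#!/bin/bash
#SBATCH --job-name=my_sid
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --mem=40G
#SBATCH --time=2-0
#SBATCH --qos=c16g1
#SBATCH --output=my_job_%j.log

stata-mp -b do file_path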

You can run multiple jobs concurrently in one sbatch request by launching each with srun inside the batch script. You will need to specify --ntasks and possibly --cpus-per-task, send each srun to the background with &, and finish with wait. For example:

#!/bin/bash
#SBATCH --job-name=my_sid
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=1

srun --ntasks=1 stata-mp -b do file_path_1 &
srun --ntasks=1 Rscript file_path_2 &
wait

Without srun and the trailing &, the two commands would run sequentially, so R would only run after Stata has completed. The final wait keeps the batch job alive until both steps finish.

Job Status

Use scrp-queue to check the status of your job:

# Your running jobs
scrp-queue

# Completed jobs
scrp-queue -t COMPLETED

If you want to see a different set of information, you can use Slurm’s squeue instead.
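
For example, to list only your own jobs with squeue:

squeue -u $USER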

Cancel a Job

You can use scancel to cancel a job before it completes. First find the ID of your running job with squeue, then:

scancel job_id
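
If you need to cancel all of your jobs at once, scancel also accepts a user filter:

scancel -u $USER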

Further Information

See Slurm’s sbatch documentation for additional options.