SCRP nodes have three R installations.

Basics

RStudio via JupyterHub is the recommended access method.

  1. Navigate to one of the following URLs in a browser:

  2. Choose one of the following under Notebook:
    • RStudio (...) launches RStudio running the R version specified in parentheses.
    • R (...) creates a new R Jupyter notebook.

Installing Additional Packages

You can install additional packages with the following command in R:

install.packages("package_name")

Installed packages are placed under your home directory and are immediately available on all nodes.
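If you would rather keep packages somewhere other than the default user library, R honors the R_LIBS_USER environment variable. A minimal sketch, assuming a custom directory under your home (the path is illustrative):

```shell
# Point R's user library at a custom directory and create it;
# install.packages() will then install there by default.
export R_LIBS_USER="$HOME/R/scrp-library"
mkdir -p "$R_LIBS_USER"
```

Add the export line to your ~/.bashrc if you want the setting to persist across sessions.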

Advanced Usage

To launch R in console mode on a login node, type in a terminal:

R

To run R in batch mode:

Rscript file_path

To launch MRO (Microsoft R Open) in console mode:

MRO

To run MRO in batch mode:

MROscript file_path
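As a concrete batch-mode example, here is a toy script and the corresponding Rscript call. The file name analysis.R is just a placeholder, and the run is guarded so the snippet does nothing on a machine without R:

```shell
# Create a toy R script to run in batch mode.
cat > analysis.R <<'EOF'
x <- 1:10
cat("sum:", sum(x), "\n")
EOF

# Run it non-interactively; prints "sum: 55" when R is available.
if command -v Rscript >/dev/null 2>&1; then
    Rscript analysis.R
fi
```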

Running on a Compute Node - Short Duration

You should run your job on a compute node if you need more processing power. R installations on SCRP are linked to either OpenBLAS or the Intel Math Kernel Library (MKL), both of which can utilize multiple CPU cores. You can speed up your analysis further by coding with parallelization in mind; many guides on parallel R are available online.

RStudio and Jupyter Notebooks

To run RStudio or Jupyter notebooks on a compute node, follow the instructions here.

R Console and R Scripts

To run R on a compute node in a terminal, simply prepend compute:

# Interactive console mode
compute R

# Batch mode
compute Rscript file_path

Both launch R on a compute node with four logical CPUs and 8GB of RAM, for a duration of 24 hours.

You can request more logical CPUs with the -c option, more memory with the --mem option, and more time with the -t option. For example, to request 16 CPUs and 40G of memory for three days:

compute -c 16 --mem=40G -t 3-0 R

See compute for a full list of options, or srun and sbatch for maximum flexibility.

Running on a Compute Node - Long Duration

All of the above options will terminate R when you close the terminal. There are two options if you do not want this to happen:

  • Use sbatch. First create a script, hypothetically named my_job.sh:

    #!/bin/bash
    #SBATCH --job-name=my_sid
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=2
    Rscript file_path
    

    The #SBATCH comments specify various options. In this example, we are requesting two logical CPUs for a single task.

    Now submit your job:

    sbatch my_job.sh
    

    Subject to available resources, your code will run even if you disconnect from the cluster. The maximum job duration is 5 days.

  • Use Linux screen. Do note that we reserve the right to terminate processes that have been running for more than 24 hours on the login nodes.
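The sbatch route accepts the same resource options shown earlier with compute. A sketch of a fuller script, assuming the 16-CPU/40G/three-day request from above (the job name and log file name are illustrative):

```shell
#!/bin/bash
#SBATCH --job-name=my_sid
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --mem=40G
#SBATCH --time=3-0
#SBATCH --output=my_job_%j.log   # %j expands to the Slurm job ID
Rscript file_path
```

Submit it with sbatch my_job.sh as before, keeping in mind the 5-day maximum job duration.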