R
SCRP nodes have two R installations:
- R 4.4.1, linked with OpenBLAS 0.3.21 and managed through CRAN (list of installed packages)
- R 4.3.3, linked with Intel Math Kernel Library 2023.1 and managed through Conda (list of installed packages)
Option 1: JupyterHub
RStudio via JupyterHub is the recommended access method.
- Navigate to one of the following URLs in a browser:
- Choose one of the following under Notebook:
- RStudio (...): launches RStudio running the R version specified in parentheses.
- R (...): creates a new R Jupyter notebook.
Option 2: Remote Desktop
- Connect to a login node through remote desktop.
- Launch from the pull-down menu on the top-right corner, Applications > Statistics > RStudio.
Option 3: SSH
All the instructions below assume you have connected to a login node through SSH. See Account and Access for details.
Launch RStudio on a login node:
# R 4.4 with OpenBLAS and CRAN
rstudio
# R 4.3 with MKL and Conda
rstudio-conda
Launch R in console mode on a login node:
# R 4.4 with OpenBLAS and CRAN
R
# R 4.3 with MKL and Conda
R-conda
Run R in batch mode:
# R 4.4 with OpenBLAS and CRAN
Rscript file_path
# R 4.3 with MKL and Conda
Rscript-conda file_path
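Batch mode is also handy for scripts that take command-line arguments. A minimal sketch, where the script name analysis.R and its contents are purely illustrative:

```shell
# Write a small R script that reads a command-line argument (hypothetical example)
cat > analysis.R <<'EOF'
args <- commandArgs(trailingOnly = TRUE)
cat("Processing:", args[1], "\n")
EOF

# Run it in batch mode, passing an argument; guarded in case R is unavailable here
if command -v Rscript >/dev/null; then
    Rscript analysis.R input.csv
fi
```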
Installing Additional Packages
The two versions of R use different package management mechanisms. Regardless of which version you use, installed packages are placed under your home directory and are immediately available on all nodes.
R 4.4 with CRAN
You can install additional packages with the following command in R:
install.packages("package_name")
R 4.3 with Conda
You can install additional packages with the following command in a terminal:
conda activate r
mamba install -c conda-forge package_name
Conda R packages are usually named r- followed by the original R package name. You can search for Conda packages on Anaconda's website.
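As an illustration of the naming convention (the data.table package is chosen here purely as an example):

```shell
# Derive the conda package name from an R package name: prefix with "r-"
pkg="data.table"          # hypothetical example R package
conda_pkg="r-${pkg}"
echo "$conda_pkg"         # prints r-data.table

# Inside the activated "r" environment, one would then run (not executed here):
# mamba install -c conda-forge r-data.table
```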
Running on a Compute Node - Short Duration
You should run your job on a compute node if you need more processing power. R installations on SCRP are linked to either OpenBLAS or the Intel Math Kernel Library (MKL), both of which can utilize multiple CPU cores. You can speed up your analysis further by coding with parallelization in mind; many guides on how to do so are available online.
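As a minimal sketch of explicit parallelization using R's built-in parallel package (the script name and contents are illustrative):

```shell
# Write a small R script that squares numbers across multiple cores (illustrative)
cat > par_demo.R <<'EOF'
library(parallel)
n_cores <- max(1L, detectCores() - 1L)              # leave one core free
results <- mclapply(1:8, function(i) i^2, mc.cores = n_cores)
print(unlist(results))
EOF

# Run it, guarded in case R is unavailable on this machine
if command -v Rscript >/dev/null; then
    Rscript par_demo.R
fi
```

On the cluster, such a script benefits directly from requesting more cores, e.g. compute -c 16 Rscript par_demo.R.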
Jupyter
To run RStudio or notebooks through Jupyter on a compute node, follow the instructions here.
Remote Desktop
To run RStudio on a compute node in remote desktop, launch Applications > Slurm (x cores) > RStudio, where x is the desired number of cores.
SSH
To run R on a compute node in a terminal, simply prepend compute:
# RStudio
compute rstudio
# Interactive console mode
compute R
# Batch mode
compute Rscript file_path
The above commands will launch R on a compute node with four logical CPUs and 8GB of RAM, for a duration of 24 hours.
You can request more logical CPUs with the -c option, more memory with the --mem option, and more time with the -t option.
For example, to request 16 CPUs and 40G of memory for three days:
compute -c 16 --mem=40G -t 3-0 R
See compute for a full list of options, or srun and sbatch for maximum flexibility.
Running on a Compute Node - Long Duration
All of the above methods terminate R when you close the terminal. There are two options if you do not want this to happen:
- Use sbatch. First create a script, hypothetically named my_job.sh:

#!/bin/bash
#SBATCH --job-name=my_sid
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
Rscript file_path

The #SBATCH comments specify various options. In this example, we are requesting two logical CPUs for a single task. Now submit your job:
sbatch my_job.sh
Subject to available resources, your code will run even if you disconnect from the cluster. The maximum job duration is 5 days.
- Use Linux screen. Start a session with screen -S session_name, run your job inside it, detach with Ctrl-A then D, and reattach later with screen -r session_name. Do note that we reserve the right to terminate processes that have been running for more than 24 hours on the login nodes.
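Returning to the sbatch option: a slightly fuller job script might also request memory, a time limit, and a log file. A sketch, where the resource values and file names are illustrative (the #SBATCH flags themselves are standard Slurm options):

```shell
# Generate a fuller job script (resource values are illustrative)
cat > my_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=my_sid
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --time=2-0
#SBATCH --output=my_job_%j.log
Rscript file_path
EOF

# Submit on the cluster with: sbatch my_job.sh
```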