SCRP nodes have three R installations.

Basics

RStudio via JupyterHub is the recommended access method.

  1. Navigate to one of the following URLs in a browser:

  2. Choose one of the following under Notebook:
    • RStudio (...) launches RStudio running the R version specified in parentheses.
    • R (...) creates a new R Jupyter notebook.

Installing Additional Packages

You can install additional packages with the following command in R:

install.packages("package_name")

Installed packages are placed under your home directory and are immediately available on all nodes.
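If you would rather keep packages somewhere other than the default user library, R honors the R_LIBS_USER environment variable. A minimal sketch, assuming a custom directory under your home (the path is illustrative):

```shell
# Point R's user library at a custom directory and create it;
# install.packages() will then install there by default.
export R_LIBS_USER="$HOME/R/scrp-library"
mkdir -p "$R_LIBS_USER"
```

Add the export line to your ~/.bashrc if you want the setting to persist across sessions.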

Advanced Usage

To launch R in console mode on a login node, type in a terminal:

R

To run R in batch mode:

Rscript file_path

To launch MRO (Microsoft R Open) in console mode:

MRO

To run MRO in batch mode:

MROscript file_path
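As a concrete batch-mode example, here is a toy script and the corresponding Rscript call. The file name analysis.R is just a placeholder, and the run is guarded so the snippet does nothing on a machine without R:

```shell
# Create a toy R script to run in batch mode.
cat > analysis.R <<'EOF'
x <- 1:10
cat("sum:", sum(x), "\n")
EOF

# Run it non-interactively; prints "sum: 55" when R is available.
if command -v Rscript >/dev/null 2>&1; then
    Rscript analysis.R
fi
```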

Running on a Compute Node - Short Duration

You should run your job on a compute node if you need more processing power. R installations on SCRP are linked to either OpenBLAS or the Intel Math Kernel Library (MKL), both of which can utilize multiple CPU cores. You can speed up your analysis further by coding with parallelization in mind; many guides on parallel R are available online.

RStudio and Jupyter Notebooks

To run RStudio or Jupyter notebooks on a compute node, follow the instructions here.

R Console and R Scripts

To run R on a compute node in a terminal, simply prepend compute:

# Interactive console mode
compute R

# Batch mode
compute Rscript file_path

Both launch R on a compute node with four logical CPUs and 8GB of RAM, for a duration of 24 hours.

You can request more logical CPUs with the -c option, more memory with the --mem option, and more time with the -t option. For example, to request 16 CPUs and 40G of memory for three days:

compute -c 16 --mem=40G -t 3-0 R

See compute for a full list of options, or srun and sbatch for maximum flexibility.

Running on a Compute Node - Long Duration

All of the above options will terminate R when you close the terminal. There are two options if you do not want this to happen:

  • Use sbatch. First create a script, hypothetically named my_job.sh:

    #!/bin/bash
    #SBATCH --job-name=my_sid
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=2
    Rscript file_path
    

    The #SBATCH comments specify various options. In this example, we are requesting two logical CPUs for a single task.

    Now submit your job:

    sbatch my_job.sh
    

    Subject to available resources, your code will run even if you disconnect from the cluster. The maximum job duration is 5 days.

  • Use Linux screen. Do note that we reserve the right to terminate processes that have been running for more than 24 hours on the login nodes.
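The sbatch route accepts the same resource options shown earlier with compute. A sketch of a fuller script, assuming the 16-CPU/40G/three-day request from above (the job name and log file name are illustrative):

```shell
#!/bin/bash
#SBATCH --job-name=my_sid
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --mem=40G
#SBATCH --time=3-0
#SBATCH --output=my_job_%j.log   # %j expands to the Slurm job ID
Rscript file_path
```

Submit it with sbatch my_job.sh as before, keeping in mind the 5-day maximum job duration.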