Using R on HPC Clusters

This training tutorial teaches a basic workflow for using R on an HPC cluster. It focuses on parallel computing as a means of speeding up R scripts on a cluster computer. Many R packages offer some form of parallel computing, yet they rely on a much smaller set of underlying approaches: multithreading in compiled code, the Unix fork, and MPI. The tutorial takes a narrow path, focusing on packages that engage these underlying approaches directly yet are easy to use at a high level.

Objectives 

  1. Learn how to work with GitHub in RStudio. Create a GitHub (or ORNL GitLab) account, create a repository, and practice working with it from RStudio. Many tutorials are available on the web, for example by RStudio.
  2. Learn a few basic UNIX commands for listing files, creating a directory, removing files, etc. Many resources are available, for example the Unix Shell Crash Course.
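As a rough illustration of the basic UNIX commands mentioned above, the following sketch creates a directory, lists files, and removes a file. It runs inside a throwaway temporary directory, and the names `project`, `notes.txt`, and `old.txt` are made up for the example.

```shell
# Work in a throwaway directory so nothing important is touched.
workdir=$(mktemp -d)
cd "$workdir"

pwd                                       # print the current directory
mkdir project                             # create a directory
touch project/notes.txt project/old.txt   # create two empty files
ls -l project                             # list files with details
rm project/old.txt                        # remove a single file
ls project                                # only notes.txt remains
```

Once finished, the temporary directory can be deleted with `rm -r "$workdir"`. Note that `rm` does not ask for confirmation by default, so it is worth double-checking the path first.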

Tutorial workflow:

We will run R as batch jobs on the clusters. The workflow will be:

Edit your code in RStudio -> push the code to GitHub/GitLab -> pull the code to the cluster and submit it as a batch job -> look at your output and return to editing.

This has the advantage of editing in a familiar environment and running in a common teaching environment. Other workflows are possible if you already know the tools.
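The cluster side of this workflow might look roughly like the sketch below. It assumes the cluster uses the Slurm scheduler (suggested by the login node name) and that the repository contains an R script called `hello.R`. The file names and resource values are hypothetical placeholders, and site-specific settings such as account, partition, or module loading are deliberately left out.

```shell
# On your laptop (shown as comments; RStudio's Git pane does the same):
#   git add hello.R && git commit -m "Update script" && git push
# On the cluster, inside your clone of the repository:
#   git pull

# Write a small Slurm batch script. Names and resource values are
# illustrative placeholders, not CADES-specific settings.
cat > hello_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello_r
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:05:00
#SBATCH --output=hello_r.out

# Run the R script non-interactively (hello.R is a hypothetical file name).
Rscript hello.R
EOF

# Check the batch script for shell syntax errors without running it.
bash -n hello_job.sh && echo "syntax ok"

# Submit only where Slurm is available (for example, a cluster login node).
if command -v sbatch >/dev/null 2>&1; then
  echo "sbatch found: submit with 'sbatch hello_job.sh'"
else
  echo "sbatch not found: run this step on the cluster"
fi
```

After the job finishes, the output would appear in `hello_r.out`, which you can inspect before going back to RStudio for the next edit.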

Topics covered:

Day 1: Wednesday, July 13, 9:00am-12:00pm

Parallel hardware and software overview and ways to use multiple cores on a single node: mclapply (fork), multithreaded BLAS

Click here to join the Wednesday meeting

Day 2: Monday, July 18, 9:00am-12:00pm

Hardware review and using multiple nodes: MPI at high level via pbdMPI, matrix methods via kazaam and pbdDMAT

Click here to join the Monday meeting

Prerequisites:

  • Users who have access to CADES may do the exercises on CADES. Participants without CADES access may attend the lectures but will not be able to do the exercises on CADES. ORNL staff who do not yet have access may obtain it through their UCAMS ID by activating it in the XCAMS system and then joining the Birthright condo:
    • To activate your UCAMS ID as your XCAMS CADES user ID, visit https://xcams.ornl.gov and click on “I need an account”.
    • Then, after accepting the user agreement, look in the tan box on the right of the screen under UCAMS, click on “activate your XCAMS account”, and complete the required steps.
    • Join the Birthright condo: navigate to https://xcams.ornl.gov/xcams/groups/cades-birthright. This link pre-fills a request for CADES Birthright Cloud and SHPC Condo resources. Access will be automatically approved.
    • Once you are a member of CADES, log in:
      • On Windows, use PuTTY or PowerShell with or-slurm-login.ornl.gov and your CADES user ID and password.
      • On Linux or Mac, use ssh from the command line or Terminal, then enter your CADES password:   ssh <CADES-user-ID>@or-slurm-login.ornl.gov
  • Users must have R and RStudio installed on their laptops.
  • Participants must have git installed on their laptops.
    • https://www.linode.com/docs/guides/how-to-install-git-on-linux-mac-and-windows/
  • Participants must be able to ssh to a remote machine
