BMIR/Khatri lab Cluster

9 nodes total (shared with the Shah lab); 4 of them are dedicated to the Khatri lab.
Each node has:
- Intel® Xeon® CPU E5-2680 v3 @ 2.50GHz
- 24 cores, hyperthreaded
- 384 GB of RAM
- 600 GB of local scratch disk ($LOCAL_SCRATCH)

All jobs are submitted via khatrilab-dev1.stanford.edu; there is no need to log into any particular machine. The cluster runs the Simple Linux Utility for Resource Management (SLURM):
- Scheduler
- Compatible with the Sherlock cluster
- Full control over CPU and memory usage
- Job array support (see the sketch after the sbatch example below)

We use the Environment Modules system:

> module avail
> module load R/3.2.0
> module list
> module unload R/3.2.0
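
After loading a module, the corresponding binaries should be on your PATH. A quick check (assuming R/3.2.0 as above):

> module load R/3.2.0
> which R
> R --version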

srun

Example:

srun --partition=khatrilab hostname
srun --partition=khatrilab -N 3 hostname
srun --partition=khatrilab -N 3 -n 10 hostname
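
Roughly, hostname is printed once per task: the first command runs a single task on one node, the second runs one task on each of 3 nodes, and the third runs 10 tasks spread across 3 nodes. A hedged illustration of the second command (the node names here are made up):

> srun --partition=khatrilab -N 3 hostname
bmir-ct-1-1
bmir-ct-1-2
bmir-ct-1-3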

sbatch

Example: a full sbatch script to load and run R

#!/bin/bash
#SBATCH --mail-type=ALL
#SBATCH --mail-user=alexskr@stanford.edu
#SBATCH --time=0-01:05 # Runtime in D-HH:MM
#SBATCH --job-name=sample_R_job
#SBATCH --nodes=1 # Ensure that all cores are reserved on one machine
#SBATCH -n 20 # number of cores to reserve; default is 1
#SBATCH --mem=4086 # memory in megabytes; default is 8 GB
#SBATCH --exclusive # request exclusive access to the node for this job
#SBATCH --partition=khatrilab # Partition allocated for the lab
#SBATCH --error=log/job.%J.err
#SBATCH --output=result/job.%J.out
# First, load the R module
module load R/3.2.0
# Run the R script
Rscript rscript_parallel.R
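
Save the script (for example as r_job.sbatch; the filename is arbitrary) and submit it from khatrilab-dev1 with:

sbatch r_job.sbatch

The feature list above mentions job array support. A minimal, hedged sketch of an array submission (the array size, script name, and use of the array index as an argument are placeholders):

#!/bin/bash
#SBATCH --job-name=sample_array_job
#SBATCH --partition=khatrilab
#SBATCH --array=1-10 # run 10 independent tasks, indices 1..10
#SBATCH --nodes=1
#SBATCH -n 1 # one core per array task
#SBATCH --mem=4086
#SBATCH --error=log/job.%A_%a.err # %A = job ID, %a = array index
#SBATCH --output=result/job.%A_%a.out
module load R/3.2.0
# each task can read its own index from $SLURM_ARRAY_TASK_ID
Rscript array_task.R $SLURM_ARRAY_TASK_ID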

Other commands

scancel - cancel a job
squeue - show queued and running jobs
sacct - report accounting data for past and running jobs
sinfo - show partition and node status
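
A few common invocations (the job ID 684 is just an example, taken from the interactive session below):

squeue -u $USER # your own queued and running jobs
squeue --partition=khatrilab # everything on the lab partition
scancel 684 # cancel job 684
sacct -j 684 # accounting details for job 684
sinfo --partition=khatrilab # state of the lab's nodes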

Interactive Sessions

[alexskr@khatrilab-dev1 ~]$ salloc -N 1 -n 20 --mem=8066 --partition=khatrilab
salloc: Granted job allocation 684
[alexskr@khatrilab-dev1 ~]$ srun --pty bash
[alexskr@bmir-ct-1-1 ~]$
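
When finished, exit the compute-node shell and then the salloc shell to release the allocation (the exact salloc message may vary by SLURM version):

[alexskr@bmir-ct-1-1 ~]$ exit
[alexskr@khatrilab-dev1 ~]$ exit
salloc: Relinquishing job allocation 684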