Our TANGO HPC Cluster comprises nodes housed at two physical locations (data centres), allowing us to provide flexible compute resources to our researchers.

Scratch space provides the fastest and most accessible storage for high-performance computing jobs. It is intended for temporary storage only and should not be used as a filing or backup space. To provide the best job performance, each node has its own scratch space: TANGO nodes use VSAN, while our big-memory nodes (i.e. those with more than 256 GB RAM) use the Lustre filesystem. This means that moving files to and from scratch should be handled inside the  tango-scratch.sub  SLURM job submission script.
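If you want to confirm how much free space a node's scratch area currently has before staging a large data set, one quick check (a sketch only, assuming  /scratch  is mounted at that path on the compute nodes, as the job template below implies, and that short interactive commands may be run through  srun ) is:

srun --partition=tango --ntasks=1 --time=00:01:00 df -h /scratch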

You will find the SLURM submission script template in your home directory under a folder called  .templates :

ls -lar .templates/

-rw-r--r-- 1 Owner-Acct Group-Name  526 Aug 14 10:57 tango.sub

-rw-r--r-- 1 Owner-Acct Group-Name 1011 Sep 14 13:05 tango-scratch.sub

Copy  tango-scratch.sub  to the work folder containing the data to be copied to scratch, and modify it as required using a command-line text editor (e.g.  nano  or  vim ):

cd ~/WorkDir/WorkSubDir/

cp ~/.templates/tango-scratch.sub myExperiment-scratch.sub

nano myExperiment-scratch.sub
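After you have filled in the placeholders described below, submit the edited script with  sbatch  and, if you wish, check its place in the queue with  squeue  (file name as in the example above):

sbatch myExperiment-scratch.sub

squeue -u $USER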

Template jobscript:

#!/bin/bash


### Job Name

#SBATCH --job-name=MyJobName


### Set email type for job

### Accepted options: NONE, BEGIN, END, FAIL, ALL

#SBATCH --mail-type=ALL


### email address for user

#SBATCH --mail-user=Your-email-Address


### Queue name that job is submitted to

#SBATCH --partition=tango


### Request resources (tasks, memory, walltime)

#SBATCH --ntasks=1

#SBATCH --mem=Xgb

#SBATCH --time=HH:MM:SS


echo Running on host `hostname`

echo Time is `date`


# Copy job directory to scratch

mkdir -p /scratch/$USER/job-$SLURM_JOB_ID

rsync -avH --exclude=slurm-\*.out $SLURM_SUBMIT_DIR/ /scratch/$USER/job-$SLURM_JOB_ID/


# Go to the scratch directory to run the job from there

cd /scratch/$USER/job-$SLURM_JOB_ID/


#Load module(s) if required

module load application_module


# Run the executable

MyProgram+Arguments


# Copy job directory back to original directory and clean up scratch directory

rsync -avH --exclude=slurm-\*.out /scratch/$USER/job-$SLURM_JOB_ID/ $SLURM_SUBMIT_DIR/

cd $SLURM_SUBMIT_DIR/

rm -rf /scratch/$USER/job-$SLURM_JOB_ID
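The template above copies results back and then removes the scratch copy unconditionally. If you would prefer the scratch directory to be deleted only when the copy back has succeeded, one possible variation of the final steps (a sketch only, not part of the supplied template) is:

# Return to the submit directory, then remove the scratch copy only if the copy back succeeds
cd $SLURM_SUBMIT_DIR/

rsync -avH --exclude=slurm-\*.out /scratch/$USER/job-$SLURM_JOB_ID/ $SLURM_SUBMIT_DIR/ \
  && rm -rf /scratch/$USER/job-$SLURM_JOB_ID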


Edit the placeholder jobscript entries as required for your specific job (a filled-in example follows the list below):

  • All lines beginning with  #SBATCH  are interpreted as SLURM directives and passed directly to the queuing system;
  •  MyJobName  should be a concise but identifiable alphanumeric name for the job (starting with a letter, NOT a number);
  •  ntasks=X  requests the number of CPUs required for the job;
  •  mem=Xgb  states that the program will use at most X GB of memory;
  •  time=HH:MM:SS  states the maximum walltime (real elapsed time) your job will require, as "hours:minutes:seconds". Please contact the Service Desk if you need more than 200 hours for your job;
  •  module load  is required if the module(s) your job needs (e.g. an application or compiler) are not loaded automatically in the job's shell environment. Replace  application_module  with the required module name(s);
  •  MyProgram+Arguments  is the name of the program you want to run, together with all of the command-line arguments it needs. It may also include redirection of input and output streams; and
  • Output and error messages are combined into a single file,  slurm-XXXXX.out , placed in the directory from which the job was submitted (XXXXX is the numerical job ID allocated when you submit the job with sbatch).
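As an illustration only, here is the top of the template with the placeholders filled in using hypothetical values: the job name, email address, module name and program below are placeholders rather than recommendations, and this example requests 4 CPUs, at most 16 GB of memory and at most 24 hours of walltime. The rsync staging and clean-up sections of the template stay exactly as supplied and are omitted here.

#!/bin/bash
#SBATCH --job-name=MyExperiment01
#SBATCH --mail-type=ALL
#SBATCH --mail-user=first.last@example.edu
#SBATCH --partition=tango
#SBATCH --ntasks=4
#SBATCH --mem=16gb
#SBATCH --time=24:00:00

...

# Hypothetical module and program; output is redirected to a file
module load my_application
./my_program input.dat > output.log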

If parallel jobs (i.e. MPI jobs) use multiple nodes, they must be confined to a single physical data centre ("dc") so that all nodes use the same local scratch space. In such cases, the data centre must also be specified when submitting the job, using the constraint  dc:pl  or  dc:ep .

Example:

sbatch -C "dc:pl" my_job.sub 

will connect to the Lustre scratch,

while:

sbatch -C "dc:ep" my_job.sub

connects to VSAN.
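If you are unsure which nodes carry which data-centre label, SLURM can report the features each node advertises. One quick check (a sketch, assuming the  dc:pl  and  dc:ep  labels above are exposed as node features, as the  -C  constraint implies) is:

sinfo -o "%20N %15P %f"

The last column lists each node's features, so you can see which nodes will match a given  -C  constraint.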

