Contents

  1. Access and accounts
  2. TANGO system software and logon
  3. Packages available
  4. Submission scripts
  5. Running jobs
  6. Data storage and backups
  7. Contacts and help

1. Access and accounts

Through eRSA, time on the cluster is available to researchers from any of the South Australian universities. Researchers at these universities who wish to use any of eRSA’s facilities should complete the membership form.

Anyone else who is interested in using eResearch SA’s facilities should consult the Conditions of Use to determine how best to gain access to the machine.


2. TANGO system software and logon

The TANGO head node runs the CentOS 7.3 operating system, and the cluster uses the Slurm Workload Manager.

To connect to TANGO, you need to use the Unix/Linux command line. Try this cheat sheet to get you started.

Windows

Use a program called PuTTY:

  1. Download the PuTTY program - choose the MSI "Windows installer", or just putty.exe
  2. Once you have PuTTY installed, follow this guide on connecting to TANGO

Linux and Mac

ssh tango.ersa.edu.au
ssh USERNAME@tango.ersa.edu.au


Note:  USERNAME  is your eRSA HPC username. The first form works if your local username matches your eRSA HPC username; otherwise, use the second form.
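For example, a user with the username auser would log in and land at the TANGO head node prompt (the prompts below are illustrative):

ssh auser@tango.ersa.edu.au

[auser@tango-head-01 ~]$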


3. Packages available

Modules (libraries and application software)

As with previous eRSA HPC systems, TANGO uses environment modules to configure the user environment and provide easy access to the software packages installed on the system. Researchers who have used Tizard will find the process much the same.

TANGO uses the "Lmod" module system to load and unload software. Please refer to the user guide here.

In the command line, to see what modules are available to be loaded (i.e. which applications are available on the cluster), type:

module avail

You can also see which modules you currently have loaded by typing:

module list
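To load a module, making that application's executables and libraries available in your current session, use "module load" followed by the module name; for example:

module load gaussian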

Similarly, you can unload modules using "module unload"; for example:

module unload gaussian

will unload the Gaussian module, removing all references to the Gaussian executable and its associated runtime libraries.

If you do not see a module listed for the application that you wish to run, please contact the eRSA Service Desk.

Compilers and parallel programming libraries

The following compilers are available on TANGO, and easily accessible once you have loaded the correct module (refer to earlier section for information on modules):

  • Intel Compiler Suite
  • GNU Compiler (GCC, GFortran)
  • Java, Python, Perl, Ruby
  • OpenMPI - library for MPI message passing for use in parallel programming over Infiniband and Ethernet

Please see the guide to Compiling Programs on TANGO for further details.
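As a minimal sketch of the workflow (the module names below are examples only and the exact names and versions on TANGO may differ - check module avail), you might load a compiler and OpenMPI and build an MPI program like this:

# Module names are examples only - check "module avail" for the exact names
module load gcc
module load openmpi

# Compile an MPI C program using the OpenMPI wrapper compiler
mpicc -O2 -o hello_mpi hello_mpi.c

The resulting executable would then normally be launched from a batch jobscript (see the next section).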


4. Submission scripts

The Slurm Workload Manager is used for queuing batch jobs on TANGO. A batch job is sent to the system (submitted) with the sbatch command, and comments at the start of the submission script that match a special pattern ( #SBATCH ) are read as Slurm options.

There are two aspects to a batch jobscript:

  1. A set of #SBATCH directives describing the resources required and other information about the job; and
  2. The script itself, consisting of the commands that set up and perform the computations without additional user interaction.

You will find two SLURM submission script templates in your home directory under a folder called  .templates :

ls -lar .templates/

-rw-r--r-- 1 Owner-Acct Group-Name  526 Aug 14 10:57 tango.sub

-rw-r--r-- 1 Owner-Acct Group-Name 1011 Sep 14 13:05 tango-scratch.sub

For running batch jobs that use scratch space with the  tango-scratch.sub  jobscript, refer to the Scratch on TANGO user guide.

Copy  tango.sub  to your work folder (the folder containing your job's data) and modify it as required using a command line text editor (e.g. nano or vim):

cd ~/WorkDir/WorkSubDir/

cp ~/.templates/tango.sub myscript.sub

nano myscript.sub

Template jobscript:

#!/bin/bash


### Job Name

#SBATCH --job-name=MyJobName


### Set email type for job

### Accepted options: NONE, BEGIN, END, FAIL, ALL

#SBATCH --mail-type=ALL


### email address for user

#SBATCH --mail-user=Your-email-Address


### Queue name that job is submitted to

#SBATCH --partition=tango


### Request nodes

#SBATCH --ntasks=1

#SBATCH --mem=Xgb

#SBATCH --time=HH:MM:SS


echo Running on host `hostname`

echo Time is `date`


#Load module(s) if required

module load application_module


# Run the executable

MyProgram+Arguments


Edit the placeholder jobscript entries as required for your specific job (a filled-in example follows the list below):

  • All lines beginning with  #SBATCH  are interpreted as SLURM commands directly to the queuing system;
  •  MyJobName  should be a concise but identifiable alphanumeric name for the job (starting with a letter, NOT a number);
  •  ntasks=X  requests the number of CPUs required for a job;
  •  mem=Xgb  states that the program will use at most X GB of memory;
  •  time=HH:MM:SS  states the maximum walltime (elapsed real time) your job will require, in hours:minutes:seconds. Please contact the Service Desk if you need more than 200 hours for your job; 
  •  module load   is required if you don’t automatically load the required module(s) (e.g. application or compiler) in this shell’s environment. Edit the module(s) name(s) at  application_module ; and
  •  MyProgram+Arguments  is the name of the program you want to run and all of the command line arguments you need. It may also include redirection of input and output streams.
  • Output and error messages will be joined into a file  slurm-XXXXX.out  which is placed in the directory from which the job was submitted (XXXXX will be the numerical Job ID which is allocated when you submit the job with sbatch).
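
For illustration, here is a filled-in version of the template for a hypothetical single-CPU R job. The email address, input script name (myanalysis.R) and requested resources are placeholders, and R/3.4.0 is used only as an example module:

#!/bin/bash

### Job Name
#SBATCH --job-name=TestSub

### Set email type for job
#SBATCH --mail-type=END

### email address for user (placeholder - use your own)
#SBATCH --mail-user=auser@example.edu.au

### Queue name that job is submitted to
#SBATCH --partition=tango

### Request nodes
#SBATCH --ntasks=1
#SBATCH --mem=4gb
#SBATCH --time=02:00:00

echo Running on host `hostname`
echo Time is `date`

# Load the example module (substitute the module your job needs)
module load R/3.4.0

# Run the hypothetical R script in batch mode
Rscript myanalysis.R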


5. Running jobs

Jobs on TANGO may be run in either batch mode or interactive mode.

Batch jobs

Batch jobs are run on TANGO by submitting a jobscript to Slurm.

Jobs are submitted to the queue by issuing the command:

sbatch myscript

where myscript contains relevant Slurm commands and shell script commands.
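
On submission, sbatch prints the numerical Job ID allocated to the job, for example (the ID shown is illustrative):

sbatch myscript

Submitted batch job 1234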

Interactive jobs

Interactive jobs are typically used to step through code whilst debugging. In such cases, using only a small subset of your data reduces resource requirements and provides feedback more quickly.

There are two methods of running interactive sessions:

  1. If you are happy with the default resource allocation of 1 CPU with 4 GB of RAM and a walltime of 1 hour, you can open a bash shell on a compute node using srun:

[auser@tango-head-01 ~]$ srun --pty bash
[auser@tango-14 ~]$ module load R/3.4.0
[auser@tango-14 ~]$ R
> (R environment loaded)

  2. If you need more resources, you must request a resource allocation using salloc and then use srun to launch a Python environment (or any other executable, e.g. bash):

[auser@tango-head-01 ~]$ salloc --nodes=2 --core-spec=8 --mem-per-cpu=32000
salloc: Granted job allocation 1396
[auser@tango-head-01 ~]$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1397 tango tcsh auser R 1:26 2 tango-[03-04]
[auser@tango-head-01 ~]$ module load python/6.3.0/2.7.13
[auser@tango-head-01 ~]$ srun --jobid=1397 --pty python
Python 2.7.13 (default, May 1 2017, 12:50:43)
[GCC 6.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> hostname = socket.gethostname()
>>> print hostname
tango-03

See srun --help and salloc --help for further details.

Checking a job’s status in the queue

Once a job has been submitted to the queue, it will print out a numerical Job ID. This number is helpful for checking the job’s status with the squeue command. Here is some sample output:

squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1234 tango TestSub auser R 2:42 1 tango-03

Deleting a queued job

To delete a queued or running job, type:

scancel JobNumber

Note: You will only be able to delete your own jobs.

How much memory and virtual memory will I need?

  • Please refer to your software documentation on how much memory it will require. The amount of memory you need may depend on how many CPU cores your software uses.
  • You may like to run a smaller test job and check the mem and vmem usage in the output file that is generated (see the example below).
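
If Slurm job accounting is enabled on TANGO (an assumption - ask the Service Desk if the command below returns nothing), you can also query a completed job's peak memory use with sacct; for example, for the Job ID 1234 used above:

sacct -j 1234 --format=JobID,JobName,Elapsed,MaxRSS,MaxVMSize,State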

6. Data storage and backups

Temporary storage during computation

For working space during execution, it is recommended that you use the /scratch directory, which is shared across nodes.

More information on accessing the /scratch directory can be found in the Scratch on TANGO user guide.
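
The tango-scratch.sub template already takes care of scratch staging for you. Purely as an illustration of the idea (the directory layout under /scratch below is an assumption, not necessarily TANGO's convention - follow the Scratch on TANGO user guide for the actual procedure), the body of a jobscript might stage data like this:

# Create a job-specific working directory on the shared scratch filesystem
# (the layout is an assumption - see the Scratch on TANGO user guide)
SCRATCHDIR=/scratch/$USER/$SLURM_JOB_ID
mkdir -p $SCRATCHDIR

# Copy input data to scratch, run the program there, then copy results back
cp ~/WorkDir/WorkSubDir/input.dat $SCRATCHDIR/
cd $SCRATCHDIR
MyProgram input.dat > output.dat
cp output.dat ~/WorkDir/WorkSubDir/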

Long term storage

Please see the storage FAQ for details.


7. Contacts and help

For more information on eRSA’s facilities, systems support, assistance with parallel programming and performance optimisation and to report any problems, contact the eRSA Service Desk.

When reporting problems, please give as much information as you can to help us in diagnosis, for example:

  • When the problem occurred
  • What commands or programs you were trying to execute at the time
  • A copy of any error messages
  • A pointer to the program you were trying to run or compile
  • What compiler or Makefile you were using

