Running on the Aurora Cluster

In general, running on Aurora follows the rules stated on the Lunarc manual pages. All jobs must be run under the SLURM batch system.

There are only a few important things one needs to know:

  • HEP nodes run in a special partition (a.k.a. queue) called hep. Whenever the documentation asks you to specify a queue, use hep
  • A set of nodes is selected by choosing a project and a reservation, one for each division. The HEP nodes do not require a special reservation flag to be accessed. The project names and reservation flags are listed in the table below:

Your division          SLURM Partition   Project String   Reservation String   Call srun with
Nuclear Physics        hep               HEP2016-1-3      not needed           srun -p hep -A HEP2016-1-3 <scriptname>
Particle Physics       hep               HEP2016-1-4      not needed           srun -p hep -A HEP2016-1-4 <scriptname>
Theoretical Physics    hep               HEP2016-1-5      not needed           srun -p hep -A HEP2016-1-5 <scriptname>
Mathematical Physics   lu                lu2016-2-10      lu2016-2-10          srun -p lu -A lu2016-2-10 --reservation=lu2016-2-10 <scriptname>
  • Home folders are backed up by Lunarc.
  • HEP storage is a bit different from the others, and there are a few additional rules for using it. Please read the storage page.

Batch Scripts Examples

hep partition

A typical direct submission to hep nodes looks like:

srun -p hep -A HEP2016-1-4 myscript.sh

Here is an example of a typical SLURM submission script, slurmexample.sh, written in bash. It prints the hostname of the node where the job is executed and the PID of the bash process running the script:

#!/bin/bash
#
#SBATCH -A HEP2016-1-4
#SBATCH -p hep 
#
hostname
srun echo $BASHPID

The script is submitted to the SLURM batch queue with the command:

sbatch slurmexample.sh

The results will be found in the folder where the above command is run, in a file named after the SLURM job ID.
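If you want more control over where the results end up, standard SBATCH directives can set the output files explicitly. Here is a sketch (the job name and file name pattern are illustrative choices, not required by Aurora):

```shell
#!/bin/bash
#
#SBATCH -A HEP2016-1-4
#SBATCH -p hep
#SBATCH -J myjob           # job name (example)
#SBATCH -o myjob-%j.out    # stdout file; %j expands to the SLURM job ID
#SBATCH -e myjob-%j.err    # stderr file
#
hostname
```

Submitted with sbatch as above, this writes stdout and stderr to separate files named after the job ID instead of the default slurm-<jobid>.out.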

lu partition

A typical direct submission to lu nodes looks like:

srun -p lu -A lu2016-2-10 --reservation=lu2016-2-10 myscript.sh

Here is an example of a typical SLURM submission script, slurmexample.sh, written in bash. It prints the hostname of the node where the job is executed and the PID of the bash process running the script:

#!/bin/bash
#
#SBATCH -A lu2016-2-10
#SBATCH -p lu
#SBATCH --reservation=lu2016-2-10
#
hostname
srun echo $BASHPID

The script is submitted to the SLURM batch queue with the command:

sbatch slurmexample.sh

The results will be found in the folder where the above command is run, in a file named after the SLURM job ID.

Interactive access to nodes for code testing

As said before, it is not possible to run your test code on the Aurora frontend. But since everybody likes to test their code before submitting it to the batch system, SLURM provides a nice way of using a node allocation as an interactive session, just like we used to do on pptest-iridium and nptest-iridium.

The interactive session is activated using the interactive command and a few options:

interactive -t 00:60:00 -p hep -A HEP2016-1-4
interactive -t 00:60:00 -p lu -A lu2016-2-10 --reservation=lu2016-2-10

where -t 00:60:00 is the time, in hours:minutes:seconds, you want the interactive session to last. You can request as much time as you like, but mind that whatever you're running will be killed after the specified time.

SLURM will select a free node for you and open a bash terminal. From that moment on you can do pretty much the same as you did on the Iridium testing nodes.

The interactive session is terminated when you issue the exit command.
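A typical interactive test session might look like the following sketch (the compiler invocation and source file are just examples of the kind of testing you might do):

```shell
interactive -t 00:30:00 -p hep -A HEP2016-1-4
# ... wait for the allocation; a shell opens on the compute node ...
hostname                     # confirm you are on a node, not the frontend
gcc -O2 -o mytest mytest.c   # build and test your code as on Iridium
./mytest
exit                         # release the node when done
```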

Loading libraries and tools on Aurora

Aurora allows the user to load specific versions of compilers and tools using the excellent module system. This system configures binary tools and library paths for you in a very flexible way.

Read more about it in Lunarc documentation here: http://lunarc-documentation.readthedocs.io/en/latest/aurora_modules/
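A typical module workflow might look like the following (the package name and version are examples; use the search commands to see what is actually installed):

```shell
module spider GCC       # search all available versions of a package
module avail            # list modules loadable in the current context
module load GCC/5.4.0   # load a specific version (example version)
module list             # show what is currently loaded
```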

If you are in need of a module that is not installed, please check this list:

https://github.com/hpcugent/easybuild-easyconfigs/tree/master/easybuild/easyconfigs

If the software you need is in the list but NOT on Aurora, report it to Florido, and he will coordinate with Lunarc to provide it.

If the module exists neither in the system nor in the above list, you will have to build and configure it yourself. Read more about it in the Custom software paragraph.

Special software for Particle Physics users

These features are only available on the hep partition. Other users of that partition may use them if they want.

CVMFS

Particle Physics users might want to use CVMFS to configure their own research software. This system is now available on Aurora nodes and is the recommended way to do CERN-related analysis on the cluster. It will make your code and init scripts the same on almost every cluster that gives access to particle physicists across SNIC and at any cluster at CERN, so learning how to do this on Aurora is good experience.

CVMFS is a networked filesystem that hosts CERN software. The mount path is /cvmfs/. To initialize the environment, run the following lines:

export ATLAS_LOCAL_ROOT_BASE="/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase"
source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh

:!: At the moment there are some problems with enabling the CVMFS scripts at every login. If you add the above lines to your .bash_profile file in your home folder, interactive access to the cluster will NOT work, and even submission may be broken. The reason for this is not yet known and will be investigated. It is therefore NOT RECOMMENDED to add those lines to your .bash_profile. Run them after you log in to an interactive session, or add them in the prologue of your batch job.
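Adding them in the prologue of a batch job could look like this sketch (the project string is an example and the payload is a placeholder):

```shell
#!/bin/bash
#
#SBATCH -A HEP2016-1-4
#SBATCH -p hep
#
# CVMFS initialization, as described above
export ATLAS_LOCAL_ROOT_BASE="/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase"
source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh

# ... your analysis payload goes here ...
```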

Singularity

To be documented

Custom software

Custom software can be installed in the locations listed below. It is up to the user community to develop scripts to configure the environment.

Once the software is built and configurable, we can consider creating our own modules to enable it. Ask Florido for help with such development; these modules will then be shared on Aurora using the same mechanism as other software (module spider <yoursoftware>).

Division               Folder path
Nuclear Physics        /projects/hep/nobackup/software/np
Particle Physics       /projects/hep/nobackup/software/pp
Theoretical Physics    /projects/hep/nobackup/software/tp
Mathematical Physics   Please use your home folder for now. We are negotiating a 10GB project space.
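As a sketch of what such an environment-configuration script could look like, a division could keep a small file next to the installed software that users source before running it. The tool name, version, and script name below are hypothetical:

```shell
#!/bin/bash
# env-mytool.sh -- hypothetical environment setup for a custom build
# installed under the Particle Physics software folder.
SOFT_ROOT="/projects/hep/nobackup/software/pp/mytool/1.0"   # example path

# Prepend the tool's binaries and libraries to the search paths.
export PATH="${SOFT_ROOT}/bin:${PATH}"
export LD_LIBRARY_PATH="${SOFT_ROOT}/lib:${LD_LIBRARY_PATH:-}"
```

Users would then run source env-mytool.sh (or add that line to their batch script prologue) before launching the tool.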
aurora_cluster/running_on_aurora.1481019547.txt.gz · Last modified: 2016/12/06 11:19 by florido