Running on the Aurora Cluster

In general, running on Aurora follows the rules stated in the Lunarc manual pages. All jobs must be run under the SLURM batch system.

There are only a few important things one needs to know:

  • A set of nodes is selected by choosing a partition, a project and, where required, a reservation; each division has its own combination. The HEP nodes do not require a special reservation flag to be accessed. The partitions, project names and reservation flags are listed in the table below, and a sketch for checking which projects and partitions your account can use follows this list:
Your division | SLURM Partition | Project String | Reservation String | Call srun/sbatch with | Nodes
Nuclear Physics | hep | HEP2016-1-3 | not needed | srun -p hep -A HEP2016-1-3 <scriptname> | au[193-216]
Particle Physics | hep | HEP2016-1-4 | not needed | srun -p hep -A HEP2016-1-4 <scriptname> | au[193-216]
Theoretical Physics | hep | HEP2016-1-5 | not needed | srun -p hep -A HEP2016-1-5 <scriptname> | au[193-216]
Particle Physics - LDMX | lu | lu2021-2-100 | not needed | srun -p lu -A lu2021-2-100 <scriptname> | any available on the lu partition
Mathematical Physics | lu | lu2021-2-125 | lu2021-2-125 | srun -p lu -A lu2021-2-125 --reservation=lu2021-2-125 <scriptname> | mn[01-10],mn[15-20]
Mathematical Physics (Skylake machines only) | lu | lu2021-2-125 | lu2021-2-125 | srun -C skylake -p lu -A lu2021-2-125 --reservation=lu2021-2-125 <scriptname> | mn[15-20]
  • Home folders are backed up by Lunarc.
  • HEP storage is a bit different from the others and there are a few additional rules for using it. Please read the storage page.
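
If you are unsure which projects and partitions your account can actually use, a few standard SLURM commands will tell you. The lines below are only a sketch and assume nothing beyond a normal Aurora login; the exact output depends on your own memberships:

# list the projects (accounts) and partitions your user is associated with
sacctmgr show association user=$USER format=account,partition
# show the state of the nodes in a given partition, e.g. hep
sinfo -p hep
# list the reservations currently defined on the cluster
scontrol show reservation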

Batch Scripts Examples

hep partition (Nuclear, Particle and Theoretical Physics)

A typical direct submission to hep nodes looks like:

srun -p hep -A HEP2016-1-4 myscript.sh

Here is an example of a typical SLURM submission script, slurmexample.sh, written in bash. It prints the hostname of the node where the job is executed and the PID of the bash process running the script. The #SBATCH lines at the top form the job's prologue:

#!/bin/bash
#
#SBATCH -A HEP2016-1-4   # project to charge the job to
#SBATCH -p hep           # partition to run on
#
hostname                 # print the name of the node the job runs on
srun echo $BASHPID       # print the PID of the bash process running this script

The script is submitted to the SLURM batch queue with the command:

sbatch slurmexample.sh

The output will be found in the folder where the above command is run, in a file named after the SLURM job ID (slurm-<jobid>.out by default).
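
If you need more control over the job, additional #SBATCH directives can be added to the same prologue. The following is only a sketch: the job name, wall time, task count and output file name are placeholder values to adapt to your own job:

#!/bin/bash
#
#SBATCH -A HEP2016-1-4      # project to charge the job to
#SBATCH -p hep              # partition to run on
#SBATCH -J myanalysis       # job name shown in the queue (placeholder)
#SBATCH -t 02:00:00         # wall-time limit, hours:minutes:seconds
#SBATCH -n 1                # number of tasks
#SBATCH -o slurm-%j.out     # output file, %j is replaced by the job ID
#
hostname
srun echo $BASHPID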

lu partition (Mathematical Physics)

A typical direct submission to lu nodes looks like:

srun -p lu -A lu2021-2-125 --reservation=lu2021-2-125 myscript.sh

Here is an example of a typical SLURM submission script, slurmexample.sh, written in bash. It prints the hostname of the node where the job is executed and the PID of the bash process running the script. The #SBATCH lines at the top form the job's prologue:

#!/bin/bash
#
#SBATCH -A lu2021-2-125               # project to charge the job to
#SBATCH -p lu                         # partition to run on
#SBATCH --reservation=lu2021-2-125    # reservation dedicated to the division
#
hostname                 # print the name of the node the job runs on
srun echo $BASHPID       # print the PID of the bash process running this script

The script is submitted to the SLURM batch queue with the command:

sbatch slurmexample.sh

The output will be found in the folder where the above command is run, in a file named after the SLURM job ID (slurm-<jobid>.out by default).

Since 2018/09/27 there are new nodes, mn[15-20], based on the Intel Skylake architecture. One can select just these CPUs by using the -C flag:

sbatch -C skylake slurmexample.sh

For best performance one should recompile the code for these machines, i.e. tell the compiler that Skylake optimizations are required. How to do this varies between compilers; see https://en.wikichip.org/wiki/intel/microarchitectures/skylake_(server)#Compiler_support
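
As a hint, with recent GCC versions the Skylake server target can be selected with -march=skylake-avx512; the source file name below is only a placeholder, and other compilers use different flags (see the page linked above):

# GCC 6 or newer: optimize for the Skylake (server) nodes
gcc -O3 -march=skylake-avx512 -o myprog myprog.c
# Intel compilers use a different flag for the same target, e.g.
#   icc -O3 -xCORE-AVX512 -o myprog myprog.c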

For a discussion on the benefits in matrix calculus see: https://cfwebprod.sandia.gov/cfdocs/CompResearch/docs/bench2018.pdf

Interactive access to nodes for code testing

As mentioned before, it is not possible to run test code on the Aurora frontend. But since everybody likes to test their code before submitting it to the batch system, SLURM provides a convenient way of using a node allocation as an interactive session, much like we used to do on pptest-iridium and nptest-iridium.

The interactive session is activated using the interactive command and a few options:

interactive -t 00:60:00 -p hep -A HEP2016-1-4
interactive -t 00:60:00 -p lu -A lu2021-2-125 --reservation=lu2021-2-125

where -t 00:60:00 is the time (hours:minutes:seconds) you want the interactive session to last. You can request as much time as you like, but keep in mind that whatever you are running will be killed once the specified time expires.

SLURM will allocate a free node for you and open a bash shell on it. From that moment on you can do pretty much the same as you did on the Iridium test nodes.

The interactive session is terminated when you issue the exit command.
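
A typical test session on the hep partition could look like the sketch below; the module and the test program are only examples of what you might run:

interactive -t 01:00:00 -p hep -A HEP2016-1-4
# ...SLURM allocates a node and opens a shell on it...
hostname                      # confirm you are on a compute node, not the frontend
module load GCC/4.9.3-2.25    # load whatever your test needs
./mytest                      # run your test program (placeholder)
exit                          # end the session and release the node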

Loading libraries and tools on Aurora

Aurora allows the user to load specific versions of compilers and tools using the excellent module system. This system configures binary tools and library paths for you in a very flexible way.

Read more about it in Lunarc documentation here: http://lunarc-documentation.readthedocs.io/en/latest/aurora_modules/
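
For quick reference, the most common module commands are sketched below (the GCC module is just an example; check module avail for what is actually installed):

module spider gcc             # search all available versions matching "gcc"
module avail                  # list modules that can be loaded right now
module load GCC/4.9.3-2.25    # load a specific version
module list                   # show currently loaded modules
module purge                  # unload everything and start from a clean environment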

If you are in need of a module that is not installed, please check this list:

https://github.com/hpcugent/easybuild-easyconfigs/tree/master/easybuild/easyconfigs

If the software you need is in the list but NOT on Aurora, report it to Florido and he will coordinate with Lunarc to provide it.

If the module exists neither on the system nor in the above list, you will have to build and configure it yourself. Read more about this in the Custom software section.

Special software for Particle Physics users

These features are only available on the hep partition. Other users of that partition are welcome to use them as well.

CVMFS

Particle Physics users might want to use CVMFS to configure their own research software. This system is now available on the Aurora nodes and is the recommended way to do CERN-related analysis on the cluster. It makes your code and init scripts the same on almost every cluster that gives particle physicists access within SNIC, as well as on any cluster at CERN, so learning how to do this on Aurora is good experience.

CVMFS is a network filesystem that hosts CERN software. It is mounted under /cvmfs/. To initialize the environment, run the following lines:

export ATLAS_LOCAL_ROOT_BASE="/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase"
source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh

Warning: At the moment there are problems with having the CVMFS setup scripts enabled at every login. If you add the above lines to your .bash_profile file in your home folder, interactive access to the cluster will NOT work, and even job submission may break. The reason is not yet known and will be investigated. It is therefore NOT RECOMMENDED to add those lines to your .bash_profile. Run them after you log in to an interactive session, or add them to the prologue of your batch job, as in the sketch below.
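
For batch jobs this means putting the two lines above right at the start of the job script, as in the following sketch (the project, partition and analysis command are placeholders):

#!/bin/bash
#
#SBATCH -A HEP2016-1-4
#SBATCH -p hep
#
# set up the CVMFS/ATLAS environment inside the job, not in .bash_profile
export ATLAS_LOCAL_ROOT_BASE="/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase"
source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh
#
./run_my_analysis.sh          # your actual analysis command (placeholder)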

Singularity

The old Singularity version (1.0) will be removed in week 8 of 2017 because of a security issue. Please update your scripts to enable Singularity using these commands:

module load GCC/4.9.3-2.25
module load Singularity/2.2.1

It is recommended to always specify the version number when loading the module, to prevent problems caused by version mismatches. If you omit it and the default module changes, you will end up running an unwanted version.
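
Once the module is loaded, containers are run with the usual Singularity commands; the image path and command below are only placeholders:

module load GCC/4.9.3-2.25
module load Singularity/2.2.1
# run a command inside a container image
singularity exec /path/to/myimage.img ./myanalysis
# or open an interactive shell inside the image
singularity shell /path/to/myimage.img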

Custom software

Custom software can be installed in the locations listed below. It is up to the user community to develop scripts to configure the environment.

Once the software is built and configurable, we can consider creating our own modules to enable it. Ask Florido for help with such development; these modules will then be shared on Aurora using the same mechanism as other software (module spider <yoursoftware>). A minimal build-and-install sketch follows the table below.

Division | Folder path
Nuclear Physics | /projects/hep/nobackup/software/np
Particle Physics | /projects/hep/nobackup/software/pp
Theoretical Physics | /projects/hep/nobackup/software/tp
Mathematical Physics | Please use your home folder for now. We are negotiating a 10 GB project space.
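
As an example, a classic autotools package could be installed into one of the folders above roughly as follows; the package name and division folder are placeholders, and the exact steps depend on the software:

# build a (hypothetical) tool and install it under the Particle Physics folder
tar xzf mytool-1.0.tar.gz
cd mytool-1.0
./configure --prefix=/projects/hep/nobackup/software/pp/mytool/1.0
make
make install
# make the installed tool visible in the current shell
export PATH=/projects/hep/nobackup/software/pp/mytool/1.0/bin:$PATH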