In general, running on Aurora follows the rules stated in the Lunarc manual pages. All jobs must be run under the SLURM batch system.
There are only a few important things one needs to know:
Your division | SLURM partition | Project string | Reservation string | Call srun/sbatch with | Nodes
---|---|---|---|---|---
Nuclear Physics | hep | HEP2016-1-3 | not needed | srun -p hep -A HEP2016-1-3 <scriptname> | au[193-216]
Particle Physics | hep | HEP2016-1-4 | not needed | srun -p hep -A HEP2016-1-4 <scriptname> | 
Theoretical Physics | hep | HEP2016-1-5 | not needed | srun -p hep -A HEP2016-1-5 <scriptname> | 
Particle Physics - LDMX | lu | lu2021-2-100 | not needed | srun -p lu -A lu2021-2-100 <scriptname> | any available on the lu partition
Mathematical Physics | lu | lu2021-2-125 | lu2021-2-125 | srun -p lu -A lu2021-2-125 --reservation=lu2021-2-125 <scriptname> | mn[01-10],mn[15-20]
Mathematical Physics, select only Skylake machines | lu | lu2021-2-125 | lu2021-2-125 | srun -C skylake -p lu -A lu2021-2-125 --reservation=lu2021-2-125 <scriptname> | mn[15-20]
A typical direct submission to the hep nodes looks like:

```bash
srun -p hep -A HEP2016-1-4 myscript.sh
```
Here is an example of a typical SLURM submission script, slurmexample.sh, written in bash. It prints the hostname of the node where the job is executed and the PID of the bash process running the script:

```bash
#!/bin/bash
#
#SBATCH -A hep2016-1-4
#SBATCH -p hep
#
hostname
srun echo $BASHPID
```
The script is submitted to the SLURM batch queue with the command:

```bash
sbatch slurmexample.sh
```

The results will be found in the folder where the above command is run, in a file named after the SLURM job ID.
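For example, with SLURM's default naming the output ends up in a file called slurm-<jobid>.out in the submission folder; a quick way to follow up on the job might be (the job ID 123456 below is only a placeholder):

```bash
squeue -u $USER        # check whether the job is still queued or running
cat slurm-123456.out   # inspect the output once the job has finished
```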
A typical direct submission to the lu nodes looks like:

```bash
srun -p lu -A lu2016-2-10 --reservation=lu2016-2-10 myscript.sh
```
Here is an example of a typical SLURM submission script, slurmexample.sh, written in bash. It prints the hostname of the node where the job is executed and the PID of the bash process running the script:

```bash
#!/bin/bash
#
#SBATCH -A lu2016-2-10
#SBATCH -p lu
#SBATCH --reservation=lu2016-2-10
#
hostname
srun echo $BASHPID
```
The script is submitted to the SLURM batch queue with the command:

```bash
sbatch slurmexample.sh
```

The results will be found in the folder where the above command is run, in a file named after the SLURM job ID.
Since 2018-09-27 there are new nodes, mn[15-20], based on the Skylake architecture. One can select just these CPUs by using the -C flag:

```bash
sbatch -C skylake slurmexample.sh
```
For best performance one should recompile the code for these machines, i.e. tell the compiler that Skylake optimization is required. How to do this varies between compilers; see https://en.wikichip.org/wiki/intel/microarchitectures/skylake_(server)#Compiler_support
For a discussion of the benefits in matrix calculus see: https://cfwebprod.sandia.gov/cfdocs/CompResearch/docs/bench2018.pdf
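As an illustration, with GCC the relevant option is the -march flag. A minimal sketch, assuming a reasonably recent GCC and a source file of your own (the file and program names are placeholders):

```bash
# Compile with Skylake (server) optimizations; -march=skylake-avx512 needs GCC 6 or newer.
# Other compilers (Intel, Clang, ...) use different flags; see the links above.
gcc -O3 -march=skylake-avx512 -o myprogram myprogram.c
```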
As said before, it is not possible to run your test code on the Aurora frontend. But since everybody likes to test their code before submitting it to the batch system, SLURM provides a nice way of using a node allocation as an interactive session, just like we were doing on pptest-iridium and nptest-iridium.
The interactive session is started using the interactive command and a few options:

```bash
interactive -t 00:60:00 -p hep -A HEP2016-1-4
interactive -t 00:60:00 -p lu -A lu2016-2-10 --reservation=lu2016-2-10
```

where -t 00:60:00 is the time in hours:minutes:seconds you want the interactive session to last. You can request as much time as you want, but mind that whatever you are running will be killed when the specified time expires.
SLURM will select a free node for you and open a bash terminal. From that moment on you can do pretty much the same as you were doing on the Iridium test nodes. The interactive session is terminated when you issue the exit command.
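A typical test workflow might look like this sketch (the script name is just a placeholder):

```bash
# Request a one-hour interactive session on the hep partition
interactive -t 00:60:00 -p hep -A HEP2016-1-4
# A shell opens on the allocated node; test your code as usual
hostname
./myscript.sh
# Leave the session and release the node
exit
```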
Aurora allows the user to load specific versions of compilers and tools using the excellent module system. This system configures binary tool and library paths for you in a very flexible way. Read more about it in the Lunarc documentation here: http://lunarc-documentation.readthedocs.io/en/latest/aurora_modules/
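A few everyday module commands, using as an example the GCC module version mentioned further down this page:

```bash
module avail                 # list the modules available on Aurora
module spider GCC            # search for a package and its available versions
module load GCC/4.9.3-2.25   # load a specific version
module list                  # show what is currently loaded
```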
If you need a module that is not installed, please check this list:
https://github.com/hpcugent/easybuild-easyconfigs/tree/master/easybuild/easyconfigs
If the software you need is in the list but NOT on Aurora, report to Florido and he will coordinate with Lunarc to provide it.
If the module exists neither on the system nor in the above list, you will have to build and configure it yourself. Read more about this in the Custom software paragraph.
These features are only available on the hep partition. Other users of that partition can use them if they want.
Particle Physics users might want to use CVMFS to configure their own research software. This system is now available on Aurora nodes and is the recommended way to do CERN-related analysis on the cluster. It will make your code and init scripts the same as on almost every cluster that gives particle physicists access within SNIC, and on any cluster at CERN, so learning how to do this on Aurora is good experience.
CVMFS is a networked filesystem that hosts CERN software. The mount path is /cvmfs/.
To initialize the environment, run the following lines of code:

```bash
export ATLAS_LOCAL_ROOT_BASE="/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase"
source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh
```
At the moment there are some problems with enabling the CVMFS scripts at every login. If you add the above lines to your .bash_profile file in your home folder, interactive access to the cluster will NOT work, and maybe even submission will be broken. I don't know the reason for this and it will be investigated. It is therefore NOT RECOMMENDED to add those lines to your .bash_profile. Run them after you log in to an interactive session, or add them in the prologue of your batch job.
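For instance, a batch script with the CVMFS initialization in its prologue might look like this sketch, reusing the hep example from above (the analysis command is just a placeholder):

```bash
#!/bin/bash
#
#SBATCH -A hep2016-1-4
#SBATCH -p hep
#
# Initialize the CVMFS-based ATLAS environment for this job only
export ATLAS_LOCAL_ROOT_BASE="/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase"
source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh
# ... then run your analysis, e.g.:
./my_analysis.sh
```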
The old Singularity version (1.0) is going to be removed in week 8 of 2017 because of a security issue. Please update your scripts to enable Singularity using these commands:

```bash
module load GCC/4.9.3-2.25
module load Singularity/2.2.1
```
It is recommended to always specify the version number when loading the module, to prevent issues with different versions. If you don't, and the default module changes, you will run an unwanted version.
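Once the module is loaded, a container can be used for example like this (the image path is only a placeholder):

```bash
module load GCC/4.9.3-2.25
module load Singularity/2.2.1
# Run a command inside a container image
singularity exec /path/to/myimage.img cat /etc/os-release
```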
Custom software can be installed in the locations listed below. It is up to the user community to develop scripts to configure the environment. Once the software is built and configurable, we can consider creating our own modules to enable it. Ask Florido to help you with such development; these modules will then be shared on Aurora using the same mechanism as other software (module spider <yoursoftware>).
Division | Folder path
---|---
Nuclear Physics | /projects/hep/nobackup/software/np
Particle Physics | /projects/hep/nobackup/software/pp
Theoretical Physics | /projects/hep/nobackup/software/tp
Mathematical Physics | Please use your home folder for now. We are negotiating a 10GB project space.
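As an illustration, a Particle Physics user installing an autotools-based package by hand might do something along these lines (a sketch; the package name, version and layout are hypothetical):

```bash
# Build and install into the Particle Physics software area
cd /projects/hep/nobackup/software/pp
tar xzf mytool-1.0.tar.gz
cd mytool-1.0
./configure --prefix=/projects/hep/nobackup/software/pp/mytool/1.0
make
make install
# Until a proper module exists, add the tool to your environment by hand
export PATH=/projects/hep/nobackup/software/pp/mytool/1.0/bin:$PATH
```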