aurora_cluster:running_on_aurora

Differences

This shows you the differences between two versions of the page.

aurora_cluster:running_on_aurora [2016/12/06 10:05]
florido [Custom software]
aurora_cluster:running_on_aurora [2018/09/27 12:56] (current)
florido [lu partition (Mathematical Physics)]
Line 6: Line 6:
 There are only a few important things one needs to know:
  
-  * HEP nodes are running in a special **partition** (a.k.a. queue) called ''hep''. Whenever in the documentation you're asked to specify a queue, use ''hep''
-  * A set of nodes is selected by choosing a **project** and a **reservation**, one for each division. The HEP nodes do not require a special reservation flag to be accessed. The project names and reservation flags are listed in the table below:
 +  * A set of nodes is selected by choosing a **partition**, a **project** and a **reservation**, one for each division. The HEP nodes do not require a special reservation flag to be accessed. The partitions, project names and reservation flags are listed in the table below:
  
-^ Your division ^ SLURM Partition ^ Project String ^ Reservation String ^ call srun with ^
-| Nuclear Physics | ''hep'' | ''HEP2016-1-3'' | not needed | <code:bash>srun -p hep -A HEP2016-1-3 <scriptname></code> |
-| Particle Physics | ''hep'' | ''HEP2016-1-4'' | not needed | <code:bash>srun -p hep -A HEP2016-1-4 <scriptname></code> |
-| Theoretical Physics | ''hep'' | ''HEP2016-1-5'' | not needed | <code:bash>srun -p hep -A HEP2016-1-5 <scriptname> </code> |
-| Mathematical Physics | ''lu'' | ''lu2016-2-10'' | ''lu2016-2-10'' | <code:bash>srun -p lu -A lu2016-2-10 --reservation=lu2016-2-10 <scriptname> </code> |
 +^ Your division ^ SLURM Partition ^ Project String ^ Reservation String ^ call srun/sbatch with ^ Nodes ^
 +| Nuclear Physics | ''hep'' | ''HEP2016-1-3'' | not needed | <code:bash>srun -p hep -A HEP2016-1-3 <scriptname></code> | ''au[193-216]'' |
 +| Particle Physics | ''hep'' | ''HEP2016-1-4'' | not needed | <code:bash>srun -p hep -A HEP2016-1-4 <scriptname></code> | ::: |
 +| Theoretical Physics | ''hep'' | ''HEP2016-1-5'' | not needed | <code:bash>srun -p hep -A HEP2016-1-5 <scriptname> </code> | ::: |
 +| Mathematical Physics | ''lu'' | ''lu2016-2-10'' | ''lu2016-2-10'' | <code:bash>srun -p lu -A lu2016-2-10 --reservation=lu2016-2-10 <scriptname> </code> | ''mn[01-10],mn[15-20]'' |
 +| Mathematical Physics, **select only skylake machines** | ''lu'' | ''lu2016-2-10'' | ''lu2016-2-10'' | <code:bash>srun -C skylake -p lu -A lu2016-2-10 --reservation=lu2016-2-10 <scriptname> </code> | ''mn[15-20]'' |
  
   * Home folders are backed up by Lunarc.
Line 19: Line 19:
   * HEP storage is a bit different from the others and there are a few additional rules to use it. Please read [[aurora_cluster:storage]]
  
-A typical direct submission to our nodes looks like:
 +  * Basic usage of the batch system is described here:
 +    * Using the SLURM batch system: http://lunarc-documentation.readthedocs.io/en/latest/batch_system/
 +    * Batch system rules. Note: these rules might be slightly different for us since we have our own partition. http://lunarc-documentation.readthedocs.io/en/latest/batch_system_rules/
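
As an illustration, the partition, project and reservation values from the table above can be collected in a small bash helper. This is only a sketch: the function name `aurora_flags` and the division keywords are assumptions made for this example, not something provided by Aurora or SLURM.

```bash
#!/bin/bash
# Hypothetical helper (not part of Aurora or SLURM): print the srun/sbatch
# flags for a division, using the values listed in the table.
aurora_flags() {
    case "$1" in
        nuclear)      echo "-p hep -A HEP2016-1-3" ;;
        particle)     echo "-p hep -A HEP2016-1-4" ;;
        theoretical)  echo "-p hep -A HEP2016-1-5" ;;
        mathematical) echo "-p lu -A lu2016-2-10 --reservation=lu2016-2-10" ;;
        mathematical-skylake)
                      echo "-C skylake -p lu -A lu2016-2-10 --reservation=lu2016-2-10" ;;
        *)            echo "unknown division: $1" >&2; return 1 ;;
    esac
}

# Example use on a login node. The substitution is left unquoted on purpose
# so that the flags are split into separate words:
#   srun $(aurora_flags particle) myscript.sh
```

Keeping the strings in one place means that a change of project name only has to be made once, at the cost of one more file to source in each script.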
 + 
 +===== Batch Scripts Examples ===== 
 + 
 +==== hep partition (Nuclear, Particle and Theoretical Physics) ====
 +
 +A typical direct submission to ''hep'' nodes looks like:
 <code bash>srun -p hep -A HEP2016-1-4 myscript.sh</code>
  
Line 34: Line 42:
 </code>

-to be run with the command:
 +The script is submitted to the SLURM batch queue with the command:
 <code bash>sbatch slurmexample.sh</code>

-Basic usage of the batch system is described here:
-  * Using the SLURM batch system: http://lunarc-documentation.readthedocs.io/en/latest/batch_system/
-  * Batch system rules. Note: these rules might be slightly different for us since we have our own partition. http://lunarc-documentation.readthedocs.io/en/latest/batch_system_rules/
 +The results will be found in the folder where the above command is run, in a file named after the SLURM job ID.
  
 +==== lu partition (Mathematical Physics) ====
 +
 +A typical direct submission to ''lu'' nodes looks like:
 +<code bash>srun -p lu -A lu2016-2-10 --reservation=lu2016-2-10 myscript.sh</code>
 +
 +Here is an example of a typical SLURM submission script ''slurmexample.sh'', written in bash, that prints the hostname of the node where the job is executed and the PID of the bash process running the script. It starts with this prologue:
 +
 +<code bash>
 +#!/bin/bash
 +#
 +# Project, partition and reservation for Mathematical Physics
 +#SBATCH -A lu2016-2-10
 +#SBATCH -p lu
 +#SBATCH --reservation=lu2016-2-10
 +#
 +# Print the name of the node running the job
 +hostname
 +# $BASHPID is expanded by the bash process running this script
 +srun echo $BASHPID;
 +</code>
 +
 +The script is submitted to the SLURM batch queue with the command:
 +<code bash>sbatch slurmexample.sh</code>
 +
 +The results will be found in the folder where the above command is run, in a file named after the SLURM job ID.
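
To locate that output file from a script, one can read the job ID from the line that `sbatch` prints on submission. A minimal sketch, assuming the script does not override the output file name, so that SLURM's default `slurm-<jobid>.out` applies; both function names are hypothetical:

```bash
#!/bin/bash
# Hypothetical helper: extract the numeric job ID from the line that
# sbatch prints on success, e.g. "Submitted batch job 123456".
job_id_from_sbatch() {
    local line="$1"
    # Keep only the last whitespace-separated word and check it is a number.
    local id="${line##* }"
    [[ "$id" =~ ^[0-9]+$ ]] || return 1
    echo "$id"
}

# Hypothetical helper: build SLURM's default output file name for a job ID.
# Assumes no "#SBATCH -o" line in the script changes the name.
output_file_for() {
    echo "slurm-$1.out"
}

# Example use on a login node, once the job has finished:
#   id=$(job_id_from_sbatch "$(sbatch slurmexample.sh)")
#   cat "$(output_file_for "$id")"
```

The file only appears once the job has started writing output, so reading it immediately after submission may find nothing.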
 +
 +Since 2018/09/27 there are new nodes ''mn[15-20]'' using the //skylake// microarchitecture. One can select just these nodes by using the ''-C'' flag:
 +<code bash>sbatch -C skylake slurmexample.sh</code>
 +
 +For best performance one should recompile the code for these machines, meaning one needs to tell the compiler that skylake optimization is required. How to do this varies between compilers. See [[https://en.wikichip.org/wiki/intel/microarchitectures/skylake_(server)#Compiler_support]]
 +
 +For a discussion of the benefits in matrix calculus see: https://cfwebprod.sandia.gov/cfdocs/CompResearch/docs/bench2018.pdf
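
The recompilation step mentioned above could look as follows. This is a hedged sketch: the helper name `skylake_flag` is an assumption, only three compilers are covered, and the flags should be checked against the documentation of the compiler version actually installed. GCC, for instance, only knows `-march=skylake-avx512` from version 6 on, so the GCC 4.9.3 module would not accept it.

```bash
#!/bin/bash
# Hypothetical helper: print the option that asks a compiler to optimise
# for Skylake server CPUs.
skylake_flag() {
    case "$1" in
        gcc|clang) echo "-march=skylake-avx512" ;;  # GCC needs version 6 or newer
        icc)       echo "-xCORE-AVX512" ;;          # Intel compiler
        *)         echo "no known flag for compiler: $1" >&2; return 1 ;;
    esac
}

# Example use (mycode.c is a placeholder file name):
#   gcc -O2 $(skylake_flag gcc) -o mycode mycode.c
```

A binary built this way uses AVX-512 instructions and is not expected to run on the older ''mn[01-10]'' nodes, so such jobs should be submitted with `-C skylake`.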
  
 ===== Interactive access to nodes for code testing =====
Line 50: Line 85:
  
 <code bash>
-interactive -t 60 -p hep -A HEP2016-1-1
 +interactive -t 00:60:00 -p hep -A HEP2016-1-
 +</code>
 +<code bash>
 +interactive -t 00:60:00 -p lu -A lu2016-2-10 --reservation=lu2016-2-10
 </code>
  
-where ''-t 60'' is the time in minutes you want the interactive session to last. You can put as much as you want in the timer. Mind that whatever you're running will be killed after the specified time.
 +where ''-t 00:60:00'' is the time in hours:minutes:seconds you want the interactive session to last. You can request as much time as you want. Mind that whatever you're running will be killed after the specified time.
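
Scripts that still carry the session length in minutes can convert it to the hours:minutes:seconds form. A small sketch; the function name `minutes_to_hms` is an assumption for this example. For 60 minutes it prints 01:00:00, which denotes the same duration as the 00:60:00 used above.

```bash
#!/bin/bash
# Hypothetical helper: convert a whole number of minutes to HH:MM:SS.
minutes_to_hms() {
    local minutes="$1"
    # Reject anything that is not a non-negative integer.
    [[ "$minutes" =~ ^[0-9]+$ ]] || return 1
    # 10# forces base 10 so that inputs such as "08" are not read as octal.
    printf '%02d:%02d:00\n' $((10#$minutes / 60)) $((10#$minutes % 60))
}

# Example use:
#   interactive -t "$(minutes_to_hms 90)" -p lu -A lu2016-2-10 --reservation=lu2016-2-10
```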
  
 //slurm// will select a free node for you and open a bash terminal. From that moment on you can pretty much do the same as you were doing on the Iridium testing nodes.
Line 99: Line 137:
 ==== Singularity ====

-To be documented
 +The old Singularity version (1.0) is going to be removed in week 8 of 2017
 +because of a security issue. Please update your scripts to enable Singularity using these commands:
 +
 +<code:bash>
 +module load GCC/4.9.3-2.25
 +module load Singularity/2.2.1
 +</code>
 +
 +It is recommended to **always use the version number** when loading the
 +module to prevent issues caused by different versions. If you omit the version
 +and the default module changes, you will run an unwanted version.
 +
  
 ===== Custom software =====
aurora_cluster/running_on_aurora.1481018746.txt.gz · Last modified: 2016/12/06 10:05 by florido