If the cluster is busy, requesting an interactive session may take time or even fail. The scheduler will happily schedule resources for a user, but if the user asks for an interactive session with, say, 6 cores and no machine has 6 cores free, the scheduler cannot fulfil the request at that moment.
  
The scheduler treats an interactive job the same as a batch job, queueing it with the FIFO strategy described above. However, there is an exception to FIFO scheduling when a parallel job is waiting for other jobs to finish and release resources: a short job can then be promoted ahead of the queue if it fits into an empty slot that is reserved for later use by the parallel job, so-called //backfill//.

Thus, interactive jobs queued with shorter wall times have a higher probability of starting early.
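As a sketch (assuming the standard SLURM commands ''salloc'' and ''srun''; the cluster may provide its own wrapper for interactive sessions), a shorter wall time makes the request a better backfill candidate:

<code bash>
# Request 6 cores for 30 minutes; the short time limit lets the
# scheduler backfill the session into a slot reserved for a
# waiting parallel job.
salloc -n 6 -t 00:30:00

# Once the allocation is granted, open an interactive shell on it
srun --pty bash
</code>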
  
  
===== Fairness among projects running on the hep partition =====
  
Fairness is maintained among the three projects using the hep partition (HEP 2016/1-3, HEP 2016/1-4, HEP 2016/1-5) by allocating each of them 1/3 of the computing power, calculated as core hours per month. This is the basis for fair share: the priority of jobs from a project is calculated with respect to how much of its target 1/3 the project has used over the last 30 days. A project that has used a large portion of its allocated time (or more than the allocation) will have a lower priority than a project that has used only a small part.
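Assuming the scheduler is SLURM (whose terminology this page uses), the fair-share standing and its effect on job priority can be inspected with the standard SLURM tools:

<code bash>
# Fair-share standing of your associations; a FairShare value close
# to 1 means the target share is under-used, close to 0 over-used.
sshare -u $USER

# Priority components of pending jobs, including the fair-share factor
sprio -l
</code>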
  
:!: If more memory per core is used than the total memory of the node divided by the number of cores of the node, this will be **equivalent to using more cores** in the calculation of usage.
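For example, on a hypothetical node with 20 cores and 64 GB of RAM (3.2 GB per core), a single-core job that asks for much more than 3.2 GB is accounted as several cores:

<code bash>
#SBATCH -n 1
#SBATCH --mem-per-cpu=16000   # 16 GB for a single core
# 16 GB / 3.2 GB per core = 5, so this job is accounted as if it
# used 5 cores, i.e. 5 core hours per hour of run time.
</code>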
===== Suggestion to self-regulate the usage inside a project =====
  
  * The project members should interact on a regular basis to understand what their expected computing needs are;
  * Those negotiated needs should be translated into **expected resource requests**;
  * These requests should be documented somewhere FIXME on the cluster, for example in a file;
  * All users should honour the expected resource requests from that file when submitting;
  * In order to preserve the possibility to use all of the nodes when needed, these requests should be flexible enough to be changed on the fly according to the needs of the members of the project (see the sketch below).
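A minimal sketch of what such a shared file and a matching job script header could look like (file name, members and values are purely hypothetical):

<code bash>
# /projects/hep/expected_requests.txt (hypothetical shared file)
#   member   cores   mem per core   max wall time
#   alice    4       3100MB         24:00:00
#   bob      8       3100MB         12:00:00

# Matching header in alice's sbatch script:
#SBATCH -n 4
#SBATCH --mem-per-cpu=3100
#SBATCH -t 24:00:00
</code>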
  