How to Limit the Number of Slots for Parallel Jobs Depending on the Selected Parallel Environment

Assume you have a 128-slot Grid Engine cluster and two different parallel environments (PEs) named mpi_small and mpi_large.

> qconf -sp mpi_small
pe_name mpi_small
slots 128
...

> qconf -sp mpi_large
pe_name mpi_large
slots 128
...

How can a per-job slot limit be defined depending on the chosen PE, while the overall number of slots for each PE type remains unlimited? Let's say you want to limit mpi_small jobs to a maximum of 32 slots per job and mpi_large jobs to 64 slots per job. How can this be achieved?

PE Selection is Done at Job Submission Time

When the PE is selected at submission time by requesting a specific PE, this can be realized with a job submission verifier (JSV). The JSV then has to adjust the maximum number of requested slots depending on the selected PE.

#!/bin/sh

jsv_on_start() {
   return
}

jsv_on_verify() {
   do_correct=0
   new_max=0

   # PE name requested with "-pe" at submission time
   pe_name=`jsv_get_param "pe_name"`
   case "$pe_name" in
       mpi_small)
          new_max=32
          do_correct=1
          ;;
       mpi_large)
          new_max=64
          do_correct=1
          ;;
   esac

   if [ $do_correct -eq 1 ]; then
      jsv_set_param "pe_max" $new_max
      jsv_log_info "pe_max was changed to $new_max because it requested PE $pe_name"
      jsv_correct "Job was modified"
   else
      jsv_accept "Job accepted"
   fi
   return
}

. ${SGE_ROOT}/util/resources/jsv/jsv_include.sh
jsv_main

This is the result when you submit mpi_small and mpi_large PE jobs:

> qsub -jsv ~/jsv.sh -pe mpi_small 1-128 ...
pe_max was changed to 32 because it requested PE mpi_small
Your job 89 ("job_name") has been submitted

> qsub -jsv ~/jsv.sh -pe mpi_large 1-128 ...
pe_max was changed to 64 because it requested PE mpi_large
Your job 90 ("job_name") has been submitted
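
Specifying -jsv on every qsub call can be avoided by installing the script as a server-side JSV: the jsv_url parameter of the global cluster configuration points to the script, which is then applied to every submission automatically. The script path below is only an example; the parameter is described in the sge_conf man page:

```
> qconf -mconf
...
jsv_url script:/path/to/jsv.sh
...
```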

PE Selection is Done During Scheduling

If wildcard PEs (qsub -pe mpi* ...) should also be allowed at submission time, things get more complicated. A JSV would fail in this case because at submission time it is unknown which PE will later be used to schedule the job.
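
The reason is that the JSV only sees the literal request string: jsv_get_param "pe_name" would return the pattern itself (e.g. mpi*), which matches neither branch of the case statement in the script above. A minimal standalone sketch of this behavior:

```shell
#!/bin/sh
# Sketch of why the JSV above cannot handle wildcard PE requests:
# with "qsub -pe mpi* ..." the JSV sees the literal pattern, not the
# PE that the scheduler will pick later on.
pe_name="mpi*"        # what jsv_get_param "pe_name" would return

case "$pe_name" in
    mpi_small) msg="pe_max limited to 32" ;;
    mpi_large) msg="pe_max limited to 64" ;;
    *)         msg="unknown PE request: $pe_name" ;;
esac
echo "$msg"
```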

Instead, each PE type can be split into multiple PEs, where each PE uses the limit value as its maximum number of available slots.

> qconf -sp mpi_small_1
pe_name mpi_small_1
slots 32
...

> qconf -sp mpi_small_2
pe_name mpi_small_2
slots 32
...

> qconf -sp mpi_small_3
pe_name mpi_small_3
slots 32
...

> qconf -sp mpi_small_4
pe_name mpi_small_4
slots 32
...
Also, the mpi_large PE needs to be replaced by the following PEs:

> qconf -sp mpi_large_1
pe_name mpi_large_1
slots 64
...

> qconf -sp mpi_large_2
pe_name mpi_large_2
slots 64
...
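
Creating all of these split PEs by hand is tedious, so a small helper script can generate them. The script below is only a sketch: PE names, counts, and slot limits follow the example above, the minimal PE definitions omit the remaining attributes (allocation_rule, start_proc_args, ...) that a real setup would carry over, and by default it only prints the qconf commands instead of executing them. Note that the new PEs also still have to be added to the pe_list of the relevant cluster queues (e.g. with qconf -aattr queue pe_list mpi_small_1 <queue>).

```shell
#!/bin/sh
# Sketch: generate the split PE definitions and print (or run) the
# "qconf -Ap" commands that would add them. Set DRY_RUN=0 to execute.

DRY_RUN=${DRY_RUN:-1}

make_pes() {
    base=$1      # PE base name, e.g. mpi_small
    count=$2     # number of split PEs to create
    slots=$3     # per-job slot limit used as the PE slot maximum
    i=1
    while [ "$i" -le "$count" ]; do
        pe_file="/tmp/${base}_${i}.pe"
        # Minimal PE definition; a real one would carry over the
        # remaining attributes of the original PE.
        cat > "$pe_file" <<EOF
pe_name ${base}_${i}
slots   ${slots}
EOF
        if [ "$DRY_RUN" -eq 1 ]; then
            echo "qconf -Ap $pe_file"
        else
            qconf -Ap "$pe_file"
        fi
        i=$((i + 1))
    done
}

make_pes mpi_small 4 32
make_pes mpi_large 2 64
```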

When wildcard PEs are now used, the Grid Engine scheduler will select one of the defined PEs, and no job can get more slots than the maximum defined in that PE.

> qsub -pe mpi* 1-128 ...
Your job 91 ("job_name") has been submitted

Depending on the rest of the cluster setup, this solution might cause some slot fragmentation. Whether or not this solution is helpful for you, add your comments below…

By the way: in contrast to other available Grid Engine versions, Univa Grid Engine fixed some issues that appear in combination with parallel jobs. My colleague Daniel has described at least one of these issues in his blog Grid Engine Unleashed. There was also one complicated bug that appeared when the -masterq switch was used in combination with the -q switch during submission of PE jobs. Let's see whether Daniel or I will be faster to explain this bug in more detail.
