How to Schedule GPU Resources

Assume the following scenario: you have a cluster of machines, some of which have a number of GPU cards attached. There are jobs that require one or more GPU cards for their calculation. Each GPU performs best when it is used by only one job at any point in time. How can a Grid Engine cluster be set up so that each job is told which GPU(s) it should use?

Univa Grid Engine 8.1 provides a solution for this scenario through a new complex type called RSMAP. Let's look at an example:

A new resource named gpu_card can be added to the cluster configuration.

> qconf -mc
#name     shortcut   type  relop requestable consumable default  urgency 
#------------------------------------------------------------------------
gpu_card  gpu        RSMAP <=    YES         YES        0        0
…

Now it is possible to set up the number of available GPUs for each host in the cluster.

> qconf -me execution_host
hostname              execution_host
complex_values        gpu_card=4(gpu1 gpu2 gpu3 gpu4)
...

The complex_values line above tells Univa Grid Engine that execution_host has four GPUs attached, and it also defines a unique identifier for each GPU. The number of GPU resources and the corresponding GPU identifiers can differ from host to host.
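
For example, a second host with only two GPUs could be configured as follows (the host name other_host and its identifiers are just placeholders):

> qconf -me other_host
hostname              other_host
complex_values        gpu_card=2(gpu1 gpu2)
...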

The qhost command will now show the availability of the GPU resources.

> qhost -F gpu
HOSTNAME                ARCH         NCPU NSOC NCOR NTHR  LOAD  ...
--------------------------------------------------------------
global                  -               -    -    -    -     -  ...
execution_host          lx-amd64        1    1    1    1  0.08  ...
    Host Resource(s):      hc:gpu_card=4.000000
...

As with other consumable resources in Univa Grid Engine, it is now possible to request the number of GPU resources required for the job execution. qstat -j shows which resources have been assigned to a job.

> qsub -l gpu=2 …
Your job 207 ("job_name") has been submitted 

> qsub -l gpu=1 …
Your job 208 ("job_name") has been submitted 

> qstat -j 207
resource map          1:    gpu_card=execution_host=(gpu1 gpu2)
...

> qstat -j 208
resource map          1:    gpu_card=execution_host=(gpu3)
...

The GPU identifiers assigned to jobs 207 and 208 are different. Within the job, the assigned GPU identifiers can be retrieved through the environment variable SGE_HGR_gpu_card. Within the job script of job 207 you will find the following variable:

> env | grep SGE_HGR
SGE_HGR_gpu_card=gpu1 gpu2
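
How the granted identifiers are used is up to the job script. As a minimal sketch, assuming the gpu_card identifiers had been defined as the plain device numbers 0-3 instead of gpu1 ... gpu4, and assuming the application honors CUDA_VISIBLE_DEVICES, the granted GPUs could be passed on like this (my_gpu_application is a placeholder):

#!/bin/sh
# SGE_HGR_gpu_card contains the granted identifiers separated by spaces,
# e.g. "0 2"; convert them to the comma-separated list that CUDA expects.
CUDA_VISIBLE_DEVICES=$(echo $SGE_HGR_gpu_card | tr ' ' ',')
export CUDA_VISIBLE_DEVICES

./my_gpu_application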

So that the RSMAP type can also be used for other purposes, RSMAP identifiers are interpreted as plain strings that act as tags for a resource. To ease administration, Univa Grid Engine provides a shortcut for defining RSMAP identifiers: a number range like 7-11 is automatically expanded to the strings 7, 8, 9, 10 and 11. An RSMAP resource like res=98(0 5-100 INF) is therefore a convenient shortcut for defining the resource identifiers 0, 5, 6, 7, ... 100 and INF, which reduces typing effort.
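
Applied to the GPU example above, the host configuration could also be written with a range, assuming plain numeric identifiers are acceptable:

> qconf -me execution_host
hostname              execution_host
complex_values        gpu_card=4(0-3)
...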

It is also possible to define RSMAP identifiers multiple times, for example res=7(0 1 1 2 2 2 3 3 3). The Univa Grid Engine scheduler tries to select and assign resource identifiers from left to right, so the first job that requests three res resources will get the identifiers 0, 1 and 1 if they are available.
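
For illustration, and following the SGE_HGR_<resource> pattern shown above (the values here are assumed), such a job requesting three res resources would then see:

> env | grep SGE_HGR
SGE_HGR_res=0 1 1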

If you have any questions or comments, just enter them below.
