How to Define Application-Specific Resource Quotas

What steps did you need to take in your Grid Engine installation to define application-specific resource limits? Did you have to add many additional consumable complexes? Did you have to add queues, parallel environment objects or projects just to be able to define resource quota rules? If you answered 'yes' to any of these questions and have also observed decreasing cluster throughput, read on.

The open source versions of Grid Engine, as well as SGE/OGE, allow resource quotas to be defined with filters for users, projects, hosts/host groups, queues and parallel environments. There is, however, no filter that selects the type of application a job will execute.

Different application types can only be distinguished if the corresponding jobs have characteristics that existing filters can select. If the jobs do not differ in such characteristics, or if the differences are not known at the time the resource quota rule is defined, things get more complicated.

Depending on the cluster setup, it might be necessary to define additional queues, hosts, projects or parallel environments that would not be required if the underlying Grid Engine version could distinguish the executed application type automatically. These additional objects complicate the overall cluster setup and the job submission process. They also make the Grid Engine scheduler's work more difficult and time-consuming, with the risk of decreasing cluster throughput.

Users of Univa Grid Engine 8.1 can use job classes to submit jobs into the Grid Engine system. One advantage of job classes is that they simplify job submission for users, but job classes can also be seen as objects that define the application type of a job. Users who submit jobs through a job class implicitly specify the application type of the job. There is no need to create synthetic pseudo objects.

Resource quota rules in Univa Grid Engine can select jobs by job class name. Job class filters can be combined with all the other filters that also exist in SGE/OGE and the open source Grid Engine versions, but this is not required. As a result, rules that filter on job classes alone can be used to define limits for application types. Such rules do not require additional objects that increase the overhead of the Grid Engine system, and they remain valid when the cluster setup changes.
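To illustrate why a rule that filters only on the job class does not depend on queues or hosts, here is a minimal Python sketch of how such filter matching can be modelled. It is a hypothetical simplification, not the Univa Grid Engine implementation: the `Job` and `RuleFilter` classes, their fields, and the user, queue and host names are assumptions for illustration, and real rule filters support additional syntax (such as wildcards and negation) that the sketch omits.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Job:
    """Hypothetical, simplified view of a job as seen by a quota filter."""
    user: str
    project: Optional[str]
    queue: str
    host: str
    job_class: Optional[str]  # e.g. "class_a.default"; None if no job class was used


@dataclass(frozen=True)
class RuleFilter:
    """Filters of one hypothetical quota rule; None means 'not specified'."""
    users: Optional[FrozenSet[str]] = None
    projects: Optional[FrozenSet[str]] = None
    queues: Optional[FrozenSet[str]] = None
    hosts: Optional[FrozenSet[str]] = None
    job_classes: Optional[FrozenSet[str]] = None


def _matches(allowed: Optional[FrozenSet[str]], value: Optional[str]) -> bool:
    # An unspecified filter places no restriction on the job.
    return allowed is None or value in allowed


def rule_applies(rule: RuleFilter, job: Job) -> bool:
    """A rule applies only if every specified filter matches the job."""
    return (
        _matches(rule.users, job.user)
        and _matches(rule.projects, job.project)
        and _matches(rule.queues, job.queue)
        and _matches(rule.hosts, job.host)
        and _matches(rule.job_classes, job.job_class)
    )


# A "pure" job class rule: only the job class filter is set.
class_a_rule = RuleFilter(job_classes=frozenset({"class_a.default"}))

job_a = Job("alice", None, "all.q", "node01", "class_a.default")
job_b = Job("bob", "proj1", "all.q", "node02", "class_b.default")

print(rule_applies(class_a_rule, job_a))  # True
print(rule_applies(class_a_rule, job_b))  # False
```

Because the queue and host filters are left unspecified, the rule keeps matching class_a jobs even if queues are renamed or hosts are added to the cluster.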

Example: Assume the following scenario:

  • Multiple different application types should run in a UGE 8.1 cluster.
  • One application type (class_a jobs) has specific hardware and filesystem requirements, so these jobs can only be executed on hosts that are part of the @special_hosts host group. All hosts in this host group have access to a filesystem that the jobs require to read input data and to write compute results.
  • The file server accessible from the @special_hosts hosts can efficiently handle about 600 class_a jobs in parallel. Running more class_a jobs in parallel increases their runtime.

How can the maximum number of class_a jobs in the cluster be limited so that the file server is not overloaded? Creating additional consumable resources, queues, parallel environments or projects should be avoided.

To solve this problem, one approach is to create a job class for class_a jobs (if one does not already exist):

$ qconf -sjc class_a
jc_name class_a
l_hard q=*@@special_hosts
...

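A resource quota like the following then defines the limit. If each class_a job uses a single slot, a limit of 600 slots caps the number of running class_a jobs at 600.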

$ qconf -srqs max_jc_slots
{
   name         max_jc_slots
   description  "slots resource quota for application types"
   enabled      TRUE
   limit        jcs {class_a.default} to slots=600
}

As you can see, this resource quota is independent of any defined queues, hosts or other objects. There is no need to add such objects, or additional consumable resources, to the cluster to enable the scheduler to distinguish class_a jobs from other jobs in the cluster.
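The slot accounting implied by the `limit jcs {class_a.default} to slots=600` rule can be sketched in a few lines of Python. This is a hypothetical model, not the scheduler's actual algorithm: it assumes a running job can be reduced to a (job class, slots) pair and considers only this one quota rule, whereas the real scheduler also checks queue slots, host resources and any other quota rules.

```python
from typing import List, Tuple

# Hypothetical model of "limit jcs {class_a.default} to slots=600".
LIMITED_CLASS = "class_a.default"
SLOT_LIMIT = 600


def used_slots(running: List[Tuple[str, int]], job_class: str) -> int:
    """Sum the slots held by running jobs of the given job class."""
    return sum(slots for jc, slots in running if jc == job_class)


def quota_allows(running: List[Tuple[str, int]], job_class: str, slots: int,
                 limited_class: str = LIMITED_CLASS,
                 limit: int = SLOT_LIMIT) -> bool:
    """Return True if dispatching the job keeps the limited class within its limit."""
    if job_class != limited_class:
        # The rule does not apply to jobs of other job classes.
        return True
    return used_slots(running, limited_class) + slots <= limit


running = [("class_a.default", 1)] * 599 + [("class_b.default", 4)] * 50
print(quota_allows(running, "class_a.default", 1))  # True: 599 + 1 <= 600
running.append(("class_a.default", 1))
print(quota_allows(running, "class_a.default", 1))  # False: 600 + 1 > 600
print(quota_allows(running, "class_b.default", 4))  # True: rule does not apply
```

Jobs of other job classes never count against the limit, which mirrors the point above: the rule needs no knowledge of queues or hosts to keep class_a jobs at or below 600 slots.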

Do you want to give it a try? A trial version of Univa Grid Engine is available here. It includes all features, bug fixes and enhancements developed by Univa, but it is limited to 48 cores.
