Integration of Job Classes into the Existing System

Extensive use of job classes will have a positive impact on cluster throughput in Univa Grid Engine 8.1 clusters. The reason is that job classes have been fully integrated into the core components of the system. For instance, the scheduling component can easily distinguish between the different types of workloads. In addition, the scheduler algorithm that is responsible for finding resources for a job has been improved. Details are explained below.

Resources Available for Job Classes

The profile of a job is defined by its resource requirements and other job attributes. Queues and host objects define the possible execution environments in which jobs can run. When a job is eligible for execution, the scheduler component of the Univa Grid Engine system tries to find the execution environment that best fits all job-specific attributes and the configured policies so that the job can be executed.

This decision-making process can be difficult and time-consuming, especially when certain jobs with special resource requirements should only be allowed to run in a subset of the available execution environments. Job classes can help here because they give the scheduler additional information about which execution environments will or will not fit a job. The need to evaluate all the details about the available resources of an execution environment and about the job's requirements is reduced, or can be eliminated completely, during the decision-making process.

This is achieved by an additional parameter in the queue configuration that provides a direct association between a queue and one or more job classes. This parameter is called jc_list and can be set to the value NONE or to a list of job class or job class variant names. If a list of names is specified, the special keywords ANY_JC and/or NO_JC can be used within the list to filter the jobs that are in principle allowed to run in this queue. The following combinations are useful:

  • ANY_JC: Jobs that were derived from a job class may enter the queue.
  • NO_JC: Only jobs that were not derived from a job class may enter the queue.
  • ANY_JC, NO_JC: Any job, regardless of whether it was derived from a job class or not, may be executed in the queue. This is the default for any queue that is created in a cluster.
  • <list_of_JC_names>: Only jobs that were derived from one of the listed job classes may get scheduled in the queue.
  • NO_JC, <list_of_JC_names>: Only jobs that were not derived from a job class, or that were derived from one of the listed job classes, can be executed here.
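
The combinations above can be read as a simple admission test. The following Python sketch is only an illustration of that reading, not the scheduler's actual implementation: the function name queue_admits and the class names app1 and app2 are assumptions made for the example, names are compared literally (no variant or wildcard handling), and the value NONE is not modeled because its effect is not described here.

```python
from typing import Optional, Sequence

ANY_JC = "ANY_JC"  # keyword: jobs derived from any job class
NO_JC = "NO_JC"    # keyword: jobs not derived from a job class


def queue_admits(jc_list: Sequence[str], job_jc: Optional[str]) -> bool:
    """Hypothetical helper: may a job enter a queue with this jc_list?

    job_jc is the name of the job class the job was derived from,
    or None for a job that was submitted without a job class.
    """
    if job_jc is None:
        # Jobs without a job class need the NO_JC keyword.
        return NO_JC in jc_list
    # Jobs with a job class need ANY_JC or their own class name.
    return ANY_JC in jc_list or job_jc in jc_list


# Print one row per combination; columns are: no job class, app1, app2.
for jc_list in (["ANY_JC"], ["NO_JC"], ["ANY_JC", "NO_JC"],
                ["app1"], ["NO_JC", "app1"]):
    row = [queue_admits(jc_list, jc) for jc in (None, "app1", "app2")]
    print(jc_list, row)
```

Keeping the two keywords as ordinary list entries mirrors how they appear in jc_list, so the same membership test covers every combination.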

Defining Job Class Limits

Resource quota sets can be defined to influence the resource selection in the scheduler. The jcs filter within a resource quota rule may contain a comma-separated list of job class names. This parameter filters for jobs requesting a job class in the list. Any job class not in the list will not be considered for the resource quota rule. If no jcs filter is used, all job classes and jobs with no job class specification match the rule. To exclude a job class from the rule, its name can be prefixed with an exclamation mark (!). '!*' means only jobs with no job class specification.

Example: Resource Quota Set Using a Job Class Filter

   name max_virtual_free_on_lx_hosts_for_app_1_2
   description "quota for virtual_free restriction" 
   enabled true
   limit users {user1,user2} hosts {@lx_host} jcs {app1, app2} to vf=6G 
   limit users {*} hosts {@lx_host} jcs {other_app, !*} to vf=4G

The example above restricts user1 and user2 to 6G of virtual_free memory for all jobs derived from job class app1 or app2 on each Linux host that is part of the @lx_host host group. Jobs of all users that are either not derived from a job class or request the job class named other_app have a limit of 4G.
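
To make the filter semantics concrete, here is a minimal Python sketch of how a jcs filter could be evaluated against a job. It is an illustration under stated assumptions, not the product's matching code: the function name jcs_filter_matches is hypothetical, names are compared literally, and a list that contains only exclusions is treated conservatively (nothing matches unless it is named), since that case is not spelled out above.

```python
from typing import Optional, Sequence


def jcs_filter_matches(jcs: Optional[Sequence[str]],
                       job_jc: Optional[str]) -> bool:
    """Hypothetical helper: does a jcs filter apply to this job?

    jcs is the list inside jcs {...}, or None if the rule has no jcs filter.
    job_jc is the job class requested by the job, or None if there is none.
    """
    if jcs is None:
        # No jcs filter: every job matches, with or without a job class.
        return True
    if job_jc is None:
        # '!*' selects jobs with no job class specification.
        return "!*" in jcs
    if ("!" + job_jc) in jcs:
        # An explicit '!name' entry excludes that job class.
        return False
    # Otherwise the job class must be named in the list.
    return job_jc in jcs


# The two rules from the example above.
rule_6g = ["app1", "app2"]
rule_4g = ["other_app", "!*"]

for jc in (None, "app1", "other_app", "unrelated"):
    print(jc, jcs_filter_matches(rule_6g, jc), jcs_filter_matches(rule_4g, jc))
```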

JSV and Job Class Interaction

During the submission of a job, multiple Job Submission Verifiers (JSVs) can be involved that verify and possibly correct or reject the job. With conventional job submission (without job classes), each JSV sees the job specification as it was given at the command line via switches and passed parameters, or the job parameters that were chosen within the dialogs of the GUI.

When jobs are derived from a job class, the process of evaluation via JSV scripts is the same, but the job parameters that are visible in client JSVs are different. A client JSV only sees the requested job class, via a parameter named jc, and all parameters that were specified at the command line. Parameters that are defined in the job class itself cannot be seen.

Job classes are resolved within the sge_qmaster process as soon as it receives a request to submit a job that is derived from a job class. The following steps are taken (simplified process):

  1. Create a new job structure
  2. Fill job structure with default values
  3. Fill job structure with values defined in the job class
    (This might overwrite default values)
  4. Fill job structure with values defined at the command line
    (This might overwrite default values and values that were defined in the job class)
  5. Trigger server JSV to verify and possibly adjust the job
    (This might overwrite default values, JC values and values specified at the command line)
  6. Check if the job structure violates access specifiers

If the server JSV changes the jc parameter of the job in step 5 then the submission process restarts from step 1 using the new job class for step 3.

Please note that violations of the access specifiers are checked in the last step. As a result, a server JSV is also not allowed to apply modifications to the job that would violate any access specifiers defined in the job class specification.
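
The order of the steps, including the restart when a server JSV changes jc, can be sketched in Python as follows. This is a simplified model and not the sge_qmaster code: jobs are plain dictionaries, and resolve_job, server_jsv, violates_access and max_restarts are hypothetical names introduced for the illustration. The restart limit in particular is an assumption of the sketch to keep the loop finite.

```python
from typing import Callable, Dict, Optional

Job = Dict[str, str]


def resolve_job(
    defaults: Job,
    job_classes: Dict[str, Job],
    cmdline: Job,
    server_jsv: Optional[Callable[[Job], Job]] = None,
    violates_access: Optional[Callable[[Optional[str], Job], bool]] = None,
    max_restarts: int = 10,  # assumption of this sketch, not from the text
) -> Job:
    """Model of the simplified submission steps 1 to 6."""
    jc = cmdline.get("jc")
    cmd_values = {k: v for k, v in cmdline.items() if k != "jc"}
    for _ in range(max_restarts):
        job: Job = {}                      # step 1: new job structure
        job.update(defaults)               # step 2: default values
        if jc is not None:
            job.update(job_classes[jc])    # step 3: job class values
            job["jc"] = jc
        job.update(cmd_values)             # step 4: command line values
        if server_jsv is not None:
            job = server_jsv(dict(job))    # step 5: server JSV may adjust
        if job.get("jc") != jc:
            jc = job.get("jc")             # jc changed: restart from step 1
            continue
        if violates_access is not None and violates_access(jc, job):
            # step 6: access specifiers are checked last
            raise PermissionError("job violates access specifiers")
        return job
    raise RuntimeError("server JSV keeps changing the job class")


# Usage with made-up defaults and a made-up job class named "sleeper".
defaults = {"priority": "0", "queue": "all.q"}
classes = {"sleeper": {"priority": "5", "cmd": "/bin/sleep"}}
print(resolve_job(defaults, classes, {"jc": "sleeper", "priority": "7"}))
```

Because the access check runs after the server JSV, a modification made by the JSV is rejected in the same way as a command line value that violates the job class specification.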
