Managing Multiple Resource Groups

Using multiple resource groups lets you define rules for the most economical use of your computing resources.

Consider the following scenarios:

  • Suppose that one resource group, "local", contains three local machines, and another group, "cloud", contains a machine in the cloud.

    Your priority is to use the local machines first and to use the machine in the cloud only when needed. To do this, you assign the "local" resource group a high priority, say 100, and the "cloud" group a lower priority, say 50.

  • Suppose you define two resource groups, both of which use p2 instances in the AWS cloud.

    • One resource group uses on-demand pricing. For example, a p2.8xlarge instance costs $7.20 per hour.
    • Another resource group also uses p2.8xlarge instances, but on a spot basis.

    To be cost-effective, it makes sense to pay spot prices whenever possible and to fall back to the on-demand option only when necessary. To achieve this, you might set the priority of the first (on-demand) group to 50 and the priority of the second (spot) group to 100.

Criteria for priority job execution

Jobs that you run go into a queue. Before running a job, the MissingLink Scheduler chooses the best resource group by going over all the resource group definitions and sorting them by priority. The Scheduler tries to assign the job to the resource group with the highest priority, as long as that group meets two conditions:

  • The resource group can provide the service that the job requires.
  • The resource group has available resources, meaning its capacity is not yet fully utilized. For example, if the resource group with the highest priority has a defined capacity of five machines and three machines are currently in use, the job can run there.

When the Scheduler does not find a resource group that meets the criteria, the job is queued until capacity allows execution.
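
The selection rule described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the Scheduler's actual implementation: the `ResourceGroup` class, the `services` tags, and the `choose_group` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ResourceGroup:
    # Hypothetical model of a resource group; field names are assumptions.
    name: str
    priority: int                              # higher values are tried first
    capacity: int                              # maximum machines the group may use
    in_use: int = 0                            # machines currently running jobs
    services: frozenset = frozenset({"gpu"})   # hypothetical capability tags

    def can_run(self, required_service: str) -> bool:
        # Both conditions must hold: the service is offered and capacity remains.
        return required_service in self.services and self.in_use < self.capacity


def choose_group(groups: List[ResourceGroup], required_service: str) -> Optional[ResourceGroup]:
    # Walk the groups from highest to lowest priority and return the first match.
    for group in sorted(groups, key=lambda g: g.priority, reverse=True):
        if group.can_run(required_service):
            return group
    return None  # no group qualifies, so the job stays in the queue


groups = [
    ResourceGroup("local", priority=100, capacity=3, in_use=3),  # fully utilized
    ResourceGroup("cloud", priority=50, capacity=1, in_use=0),
]
chosen = choose_group(groups, "gpu")
print(chosen.name if chosen else "queued")
```

In this sketch the "local" group has the higher priority but no free capacity, so the job falls through to the "cloud" group. If both groups were full, `choose_group` would return `None`, which corresponds to the job waiting in the queue.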

You can assign additional resources with a single command!

Suppose you have ten jobs in the job queue and only five machines are available. The five most recent jobs wait until machines become available. You can add capacity so that the queued jobs are handled sooner by adjusting a single parameter:

ml resources group "my_resource_group" --update --set group_capacity 10

This command increases the capacity of "my_resource_group" to 10.
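
The effect of the capacity change on the queue is simple arithmetic, sketched below. The `split_queue` function is a hypothetical helper, and the sketch assumes that each machine runs one job at a time and that all jobs can run in this resource group.

```python
def split_queue(num_jobs: int, group_capacity: int) -> tuple:
    # Assumption: one job per machine, so at most `group_capacity` jobs run at once.
    running = min(num_jobs, group_capacity)
    waiting = num_jobs - running
    return running, waiting


# Ten queued jobs before and after raising the capacity from 5 to 10.
for capacity in (5, 10):
    running, waiting = split_queue(10, capacity)
    print(f"capacity={capacity}: running={running}, waiting={waiting}")
```

With a capacity of 5, five jobs run and five wait; with a capacity of 10, all ten jobs can run at once.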