Limits

Overview

The multi-factor priority plugin in flux-accounting not only performs validation for associations and priority calculation for jobs; it can also administer and enforce limits on a per-association and per-queue basis. Limits are another way to ensure fair behavior of systems that see a lot of simultaneous activity from a large number of users.

Hard Limits vs. Soft Limits

There are two different kinds of limits in flux-accounting: hard limits and soft limits. Hard limits will prevent an association from submitting a job altogether and will report a message as to why the job cannot proceed past validation. Soft limits will allow a job to be submitted but will prevent it from running until a prerequisite has been met.

There is just one hard limit in flux-accounting:

(per-association) max active jobs

The max number of active jobs an association can have at any given time.

The soft limits in flux-accounting are composed of:

(per-association) max running jobs

The max number of running jobs an association can have at any given time.

(per-association) max resources

The max number of resources (total cores + total nodes) an association can have across their running jobs at any given time.

(per-queue) max running jobs

The max number of running jobs an association can have in a given queue at any given time.

(per-queue) max nodes

The max number of nodes an association can have across their running jobs in a given queue at any given time.

Note

For more details on the difference between an active job and a running job, see the virtual states section of RFC 21.

An example

The difference between hard and soft limits might be best described by example. Let's configure an association to have a limit configuration of at most 1 running job and 2 active jobs:

$ flux account add-user --username=buster --bank=giants --max-running-jobs=1 --max-active-jobs=2

buster can submit a job and the priority plugin will generate a priority for this job and pass it on to the scheduler to begin running. If buster submits a second job while the first one is running, this second job will be held until the first job completes. Specifically, a dependency will be added to the job to hold it in the DEPEND state until the first job has transitioned to the INACTIVE state. The name of the dependency can be found in the job's eventlog:

$ flux job eventlog JOBID
dependency-add description="max-running-jobs-user-limit"

After the first job has transitioned to INACTIVE, the dependency will be removed and the job can proceed to have its priority calculated and move on to the scheduler to be run:

dependency-remove description="max-running-jobs-user-limit"
depend
priority priority=50000
alloc annotations={"sched":{"resource_summary":"rank0/core[0-1]"}}

If buster submits a third job while the first job is still running and the second job is waiting in DEPEND, it will be rejected due to the association's max active jobs limit:

$ flux submit my_job
flux-job: user has max active jobs
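
The scenario above can be modeled with a small Python sketch. This is a toy model, not the plugin's implementation: the `Association` class, its `submit` and `finish` methods, and the returned state strings are hypothetical names chosen for illustration. It assumes that a job counts as active from submission until it becomes inactive, and that held jobs are released in submission order.

```python
from dataclasses import dataclass, field


@dataclass
class Association:
    """Toy model of one association's job limits (hypothetical, for illustration)."""
    max_running_jobs: int
    max_active_jobs: int
    running: list = field(default_factory=list)  # job IDs currently running
    held: list = field(default_factory=list)     # job IDs waiting on a soft limit

    def submit(self, job_id):
        # Hard limit: reject the job outright once the active-job cap is reached.
        if len(self.running) + len(self.held) >= self.max_active_jobs:
            return "REJECTED"
        # Soft limit: accept the job but hold it until a running slot frees up.
        if len(self.running) >= self.max_running_jobs:
            self.held.append(job_id)
            return "DEPEND"
        self.running.append(job_id)
        return "RUN"

    def finish(self, job_id):
        # A finished job frees a slot; release held jobs while slots remain.
        self.running.remove(job_id)
        while self.held and len(self.running) < self.max_running_jobs:
            self.running.append(self.held.pop(0))


buster = Association(max_running_jobs=1, max_active_jobs=2)
states = [buster.submit(job) for job in ("job1", "job2", "job3")]
print(states)          # ['RUN', 'DEPEND', 'REJECTED']
buster.finish("job1")
print(buster.running)  # ['job2']
```

The hard limit is checked first because a rejected job never reaches the point where a soft limit could hold it.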

These limits can be configured and modified after an association or a queue has been created in flux-accounting with the edit-user and edit-queue commands. After modifying limits for either an association or a queue, be sure to update the priority plugin with the new data written to the flux-accounting database:

$ flux account-priority-update

How flux-accounting dependencies are removed from a job

When one of an association's running jobs finishes and transitions to the INACTIVE state, the association's held jobs (if any) are checked one at a time to see whether each now meets the requirements to have its dependencies removed and transition to RUN. For each held job, the workflow is: grab the job, its attributes, and its dependencies, then ensure that the job would not:

  • Put the association over the max running jobs limit for the queue the job is submitted in.

  • Put the association over the max nodes limit for the queue the job is submitted in.

  • Put the association over their max running jobs limit regardless of queue.

  • Put the association over their max resources limit regardless of queue.

The associated dependency is removed from the job as each requirement is met. In other words, a job can have a dependency removed from it while still possessing one or more other dependencies.
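
The per-dependency check described above can be sketched in Python. This is only an illustration under stated assumptions: the function name `remaining_dependencies`, the dictionary keys, and the dependency labels are hypothetical placeholders, not the plugin's actual identifiers. Each limit is evaluated independently, so a dependency is dropped as soon as its own requirement is met, and the job is released only when none remain.

```python
def remaining_dependencies(job, usage, limits):
    """Return the dependency labels that must stay on a held job.

    job:    what the held job would add, plus its current dependencies
    usage:  what the association currently has running
    limits: configured per-queue and per-association caps
    All keys and labels are illustrative placeholders.
    """
    # True means the job would fit under that limit if it started now.
    fits = {
        "queue-max-running-jobs":
            usage["queue_running_jobs"] + 1 <= limits["queue_max_running_jobs"],
        "queue-max-nodes":
            usage["queue_nodes"] + job["nodes"] <= limits["queue_max_nodes"],
        "assoc-max-running-jobs":
            usage["assoc_running_jobs"] + 1 <= limits["assoc_max_running_jobs"],
        "assoc-max-resources":
            usage["assoc_resources"] + job["resources"] <= limits["assoc_max_resources"],
    }
    # Keep only the dependencies whose requirement is still unmet.
    return [dep for dep in job["dependencies"] if not fits[dep]]


job = {
    "nodes": 1,
    "resources": 2,
    "dependencies": ["queue-max-running-jobs", "assoc-max-running-jobs"],
}
limits = {
    "queue_max_running_jobs": 1,
    "queue_max_nodes": 4,
    "assoc_max_running_jobs": 1,
    "assoc_max_resources": 8,
}
# The association's only running job is in a different queue.
usage = {
    "queue_running_jobs": 0,
    "queue_nodes": 0,
    "assoc_running_jobs": 1,
    "assoc_resources": 2,
}

still_held_by = remaining_dependencies(job, usage, limits)
print(still_held_by)  # ['assoc-max-running-jobs']
```

In this example the per-queue dependency is removed while the per-association one stays, which is the case where a job loses one dependency but remains held by another.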

FAQ

My job is held with a flux-accounting dependency. How can I figure out why it's held?

The first thing to check is the properties associated with the dependency placed on the job. For example, if the job has a max-run-jobs-queue dependency, look to see:

  1. how many jobs the association already has running in that queue:

$ flux jobs --queue=debug --filter=RUN --user=buster

  2. what the max_running_jobs limit is for that queue:

$ flux account view-queue debug --parsable
queue  | min_nodes_per_job | max_nodes_per_job | max_time_per_job | priority | max_running_jobs | max_nodes_per_assoc
-------+-------------------+-------------------+------------------+----------+------------------+--------------------
debug  | 1                 | 1                 | 60               | 0        | 100              | 1

If more information is required, you may also want to check the limits configured for the association:

$ flux account view-user buster
username | userid | max_running_jobs | max_active_jobs | max_nodes | max_cores | queues
---------+--------+------------------+-----------------+-----------+-----------+--------
buster   | 28     | 5                | 7               | unlimited | unlimited | debug

Administrators may also be interested in looking at what information the priority plugin currently has for each association, queue, and project with flux jobtap query mf_priority.so. This will return what limits and held jobs each association has according to the priority plugin, organized by user ID.

Do attribute changes to a queue or association take effect immediately?

No. If the limits configured for a particular queue or association do not fit your needs, you can change them. However, the new limits must be pushed to the priority plugin with flux account-priority-update before they take effect. When the plugin is updated with the new limits, the held jobs for every association are reanalyzed to see if they now meet the requirements to be released.
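
As a rough illustration of that reanalysis, the Python sketch below re-checks one association's held jobs against a max running jobs limit before and after the limit is raised. The function `release_held_jobs` and its arguments are hypothetical, and the sketch assumes held jobs are considered in submission order; the plugin's actual logic also covers the other limits.

```python
def release_held_jobs(held_jobs, running_count, max_running_jobs):
    """Split held jobs into (released, still_held) under a running-jobs cap.

    Hypothetical helper for illustration; jobs are considered in list order.
    """
    released = []
    still_held = []
    for job_id in held_jobs:
        # Each released job takes one of the remaining running slots.
        if running_count + len(released) < max_running_jobs:
            released.append(job_id)
        else:
            still_held.append(job_id)
    return released, still_held


held = ["job2", "job3", "job4"]

# Old limit of 1 with one job already running: nothing can be released.
before = release_held_jobs(held, running_count=1, max_running_jobs=1)

# Limit raised to 3 and pushed to the plugin: two held jobs now fit.
after = release_held_jobs(held, running_count=1, max_running_jobs=3)

print(before)  # ([], ['job2', 'job3', 'job4'])
print(after)   # (['job2', 'job3'], ['job4'])
```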